use bytes; - what does/should it mean?

From:
Nick Ing-Simmons
Date:
March 12, 2001 04:57
Subject:
use bytes; - what does/should it mean?
Message ID:
200103121256.MAA24666@mikado.tiuk.ti.com

The Camel-III says that "use bytes" makes perl5.6+ behave like 
perl5.005_03 and adding it to legacy byte-processing scripts
makes them safe from new character semantics. 

That isn't what it does right now.

What it does now though is expose whatever the internal representation 
is at the time an op accesses the SvPV. Which makes scripts less safe.
(It is also particularly weird on EBCDIC - which is why I was looking 
at it.)
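To make the "expose the internal representation" point concrete, here is a minimal sketch in C. It is a toy model, not perl's real SV code: the `toy_char` struct and both `toy_store_*` functions are hypothetical names invented for this illustration. It only shows that the same character can occupy a different number of bytes depending on which internal form it happens to be in, which is what an op that reads the raw buffer would observe.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not perl's SV): one character held in either
 * of the two internal forms a scalar can take. */
typedef struct {
    unsigned char buf[4];
    size_t        nbytes;   /* what an op sees if it just reads the buffer */
    int           is_utf8;  /* plays the role of the SvUTF8 flag */
} toy_char;

/* Store cp (0..255) as a single byte, the perl5.005_03 way. */
toy_char toy_store_bytes(unsigned int cp) {
    toy_char c = { {0}, 1, 0 };
    assert(cp <= 0xFF);
    c.buf[0] = (unsigned char)cp;
    return c;
}

/* Store cp (0..0xFFFF) as UTF-8: one, two or three bytes. */
toy_char toy_store_utf8(unsigned int cp) {
    toy_char c = { {0}, 0, 1 };
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        c.buf[0] = (unsigned char)cp;
        c.nbytes = 1;
    } else if (cp < 0x800) {
        c.buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        c.buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        c.nbytes = 2;
    } else {
        c.buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        c.buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        c.buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        c.nbytes = 3;
    }
    return c;
}
```

Storing character 0xE9 with `toy_store_bytes` gives one byte, while `toy_store_utf8` gives the two bytes 0xC3 0xA9. A script that looks at bytes therefore gets a different answer for the same character depending on how the value was stored.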

The Camel-III has other vague descriptions of 'use bytes' which led 
us to what we have now, but my impression overall is that the 
"like perl5.005_03" is what was really meant.

My interpretation of "like perl5.005_03" is that there should be 
no characters with values > 255, as perl5.005_03 would never have 
had them, so the right thing to do is truncate characters to 0..255
(with an optional lexical warning).

However it is quite easy to make things safe for a large majority of cases
by changing the much used DO_UTF8 macro to this:

#define DO_UTF8(sv) (SvUTF8(sv) && !(IN_BYTE && sv_utf8_downgrade(sv,0)))

and then tweaking sv_utf8_downgrade() to be non-fatal and just make
all chars ord(ch) % 256 in "use bytes" mode.
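As a rough sketch of how that would behave, here is a self-contained toy version in C. Everything in it is an assumption made for illustration: `toy_sv`, `toy_downgrade()` and `TOY_DO_UTF8` are hypothetical stand-ins, not the real SV, sv_utf8_downgrade() or DO_UTF8, and the "use bytes" state is passed in as a plain argument rather than read from IN_BYTE. The `truncated` counter only marks where an optional warning could hook in.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SV: characters kept as code points, plus a flag
 * playing the role of SvUTF8. Hypothetical, for illustration only. */
typedef struct {
    unsigned int *chars;
    size_t        len;
    int           utf8;
    size_t        truncated;  /* chars that lost high bits (warning hook) */
} toy_sv;

/* Non-fatal downgrade: every character becomes ord(ch) % 256.
 * Always returns 1 (success), as the tweaked downgrade would. */
int toy_downgrade(toy_sv *sv) {
    size_t i;
    assert(sv != NULL);
    for (i = 0; i < sv->len; i++) {
        if (sv->chars[i] > 255) {
            sv->chars[i] %= 256;
            sv->truncated++;
        }
    }
    sv->utf8 = 0;
    return 1;
}

/* Same shape as the proposed macro, with the "use bytes" state passed in. */
#define TOY_DO_UTF8(sv, in_byte) \
    ((sv)->utf8 && !((in_byte) && toy_downgrade(sv)))
```

Outside "use bytes" (`in_byte` is 0) the macro is true for a flagged value and nothing is touched. Inside "use bytes" the value is downgraded on first access, the macro is false, and every later op sees plain bytes. The downgrade is destructive: a character such as 0x263A becomes 0x3A in the value itself, not just in the view of it.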

If I do that on the bleadperl-ish copy I have to hand I get these fails:

Failed Test    Status Wstat Total Fail  Failed  List of Failed
--------------------------------------------------------------------------------
comp/require.t	              23    1   4.35%  21
io/utf8.t     	              25    5  20.00%  16, 19, 22-24
op/concat.t   	              11    3  27.27%  4-6
op/each.t     	              26    1   3.85%  26
op/length.t   	              13    4  30.77%  7, 9, 11, 13
op/ver.t      	              28    3  10.71%  19, 21, 23
pragma/utf8.t 	              15   12  80.00%  3-14
2 tests and 96 subtests skipped.
Failed 7/282 test scripts, 97.52% okay. 29/21949 subtests failed, 99.87% okay.

It would be easy enough to "fix" the t/*/*.t files to pass in such cases.

(Possibly with the use of for-testing-peer-at-internals assist from Encode.xs
in one or two cases.)

We have asked Larry directly but have had no reply yet.

So at the risk of opening the flood-gates - what does p5p think 
use bytes should do? 

A. Be a belt-and-braces safety cross-check for old byte processing 
   scripts as per Camel?

B. Be a perl-code way of peering into the internals?

-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.

