RFC: Consolidated proposal for octals like \400 in strings. Was:PATCH [perl #59342] chr(0400)

From: karl williamson
Date: December 5, 2008 16:08
Subject: RFC: Consolidated proposal for octals like \400 in strings. Was:PATCH [perl #59342] chr(0400)
Message ID: 4939C264.20104@khwilliamson.com

I've done some more research and thought about this, and have come up 
with the enclosed straw proposal.  I hope I haven't shortchanged 
anyone's previous ideas.

I now know enough about perl internals that I could implement all of 
this as-is (or with modifications) in short order.

1.  There shall be a new pragma "legacy"

Things like "use legacy 'octals'" would let a writer keep the old way of 
doing things for behaviors that we change.  The list of legacy operations 
can be expanded in the future as necessary.

2.  A new syntax \o{...} will be created for octal constants in regular 
expressions, so that a writer may choose to avoid the existing ambiguities.

3. This syntax will be also accepted in any string constant, for 
consistency.

4.  In 5.12, the maximum octal constant accepted as part of strings (as 
opposed to numbers) not using the above syntax will be \377 unless the 
writer uses the new legacy pragma to override it.  A writer in 5.10 can 
use "no legacy" to get this behavior earlier.

5.  In 5.10, the bug in which octals above \377 in regular expressions 
don't do the expected thing at all will be changed to generate an error. 
Since it doesn't work right, we don't have to worry about breaking 
existing code.

6.  The maximum octal constant using \o{...} will not be limited. 
Numbers over \377 will be treated as corresponding Unicode code points. 
  (I don't see a reason not to allow this.)

7.  In 5.10.1 an #if will be inserted into perl so that it won't compile 
unless UCHAR_MAX is 255.

UCHAR_MAX is in the standard limits.h header file.  It gives the largest 
unsigned char that the installation can handle.
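As a rough sketch of what such a guard could look like (the error text and the helper uchar_max_value are made up for illustration; this is not the actual change to the perl source):

```c
#include <limits.h>   /* defines UCHAR_MAX */

/* Refuse to build on any platform where unsigned char holds more than
 * 8 bits of value.  The message wording is an assumption. */
#if UCHAR_MAX != 255
#  error "perl assumes 8-bit chars (UCHAR_MAX must be 255)"
#endif

/* Hypothetical helper, only here so the limit can be inspected at run
 * time; once the #if above has passed it can only return 255. */
unsigned int uchar_max_value(void)
{
    return UCHAR_MAX;
}
```

Because the check runs in the preprocessor, a platform with wider chars fails at build time instead of misbehaving later.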

The C language requires UCHAR_MAX to be at least 255.  In reading the 
code (and I haven't read that much of it), I have found several places 
where it likely doesn't work if UCHAR_MAX is greater than 255.  There 
are several left shifts where the bits are assumed to vanish above the 
8th, and several cases of arrays of size 256 which use an index of an 
unsigned char.
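To make those two hazards concrete, here is a minimal sketch of the patterns and one possible defensive form.  The names (seen, shift_left_8bit, mark_seen, was_seen) are hypothetical and not taken from the perl source.  Note that masking the index makes the code safe but also folds values above 255 onto lower entries, so it is not a full fix for wider chars.

```c
/* Hypothetical 256-entry table, like the arrays described above. */
static unsigned char seen[256];

/* Shift left by one and drop everything above the 8th bit explicitly,
 * instead of relying on unsigned char being exactly 8 bits wide. */
unsigned char shift_left_8bit(unsigned char c)
{
    return (unsigned char)((c << 1) & 0xFF);
}

/* Record a byte value.  The mask keeps the index inside 0..255 even
 * where UCHAR_MAX is larger than 255. */
void mark_seen(unsigned char c)
{
    seen[c & 0xFF] = 1;
}

/* Report whether a byte value was recorded earlier. */
int was_seen(unsigned char c)
{
    return seen[c & 0xFF];
}
```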

I have a friend who has been on the ISO C committee for a long time. 
(BTW, anybody can join the committee if they're willing to pay the steep 
annual fee.)  He told me that companies that tried making larger 
character sizes retracted that because of portability problems of data 
files written by them.  The only non-DSP one that he knows of still in 
use is the Unisys 2200 mainframe, which has 9-bit chars in a 36 bit 
word.  (He thinks the DSP ones have 16 bit address space, so perl 
couldn't run on them.)

By doing this now, we could make certain that no one wants perl on an 
architecture where someone would care about having an octal above \377, 
and where perl probably doesn't work well anyway on some constructs.  I 
don't know whether those problems would show up in our tests.

If someone complained, and their perl had been mostly working, we could 
then remove the #if, and work to fix the bugs, and change our minds 
about what to do about octals.
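
Going back to items 2, 3 and 6, a rough sketch of how the part after "\o" might be parsed is below.  The function name parse_braced_octal and its interface are assumptions made for illustration, not existing perl internals, and overflow of very long digit strings is deliberately not handled.

```c
#include <stddef.h>

/* Hypothetical helper: s points just past "\o" and should look like
 * "{digits}".  On success, stores the value in *cp and returns the
 * number of characters consumed, closing brace included.  Returns 0 on
 * malformed input and leaves *cp untouched.  No overflow check. */
size_t parse_braced_octal(const char *s, unsigned long *cp)
{
    const char *p = s;
    unsigned long value = 0;

    if (*p != '{')
        return 0;
    p++;
    if (*p < '0' || *p > '7')
        return 0;                      /* need at least one octal digit */
    while (*p >= '0' && *p <= '7') {
        value = (value << 3) | (unsigned long)(*p - '0');
        p++;
    }
    if (*p != '}')
        return 0;                      /* missing close brace or bad digit */
    *cp = value;
    return (size_t)(p - s) + 1;
}
```

With this sketch, "{400}" yields 256, which per item 6 would be treated as the corresponding Unicode code point rather than being cut off at \377.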
