I've done some more research and thought about this, and have come up
with the enclosed straw proposal. I hope I haven't shortchanged
anyone's previous ideas.
I now know enough about perl internals that I could implement all of
this as-is (or with modifications) in short order.
1. There shall be a new pragma "legacy"
Things like "use legacy 'octals'" can be used for various behaviors we
change but that we want to allow the old way of doing things to still be
possible. The list of legacy operations can be expanded in the future
as necessary.
2. A new syntax \o{...} will be created for octal constants in regular
expressions, so that a writer may choose to avoid the existing ambiguities.
3. This syntax will be also accepted in any string constant, for
consistency.
4. In 5.12, the maximum octal constant accepted as part of strings (as
opposed to numbers) not using the above syntax will be \377 unless the
writer uses the new legacy pragma to override it. A writer in 5.10 can
use "no legacy" to get this behavior earlier.
5. In 5.10, octals above \377 in regular expressions, which currently
don't do the expected thing at all, will generate an error instead.
Since they don't work right now, we don't have to worry about breaking
existing code.
6. The maximum octal constant using \o{...} will not be limited.
Numbers over \377 will be treated as corresponding Unicode code points.
(I don't see a reason not to allow this.)
7. In 5.10.1 an #if will be inserted into perl so that it won't compile
unless UCHAR_MAX is 255.
UCHAR_MAX is defined in the standard limits.h header file. It gives the
largest value an unsigned char can hold on the installation.
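A minimal sketch of what such a guard could look like, assuming it
lives in some common header. The error text and the helper name
uchar_is_8bit are made up for illustration and are not from the perl
source:

```c
#include <limits.h>

/* Hypothetical compile-time guard; the message text is illustrative. */
#if UCHAR_MAX != 255
#  error "this perl assumes 8-bit chars (UCHAR_MAX == 255)"
#endif

/* Run-time view of the same fact, e.g. for a configure-style probe.
 * Returns 1 when unsigned char is exactly 8 bits wide, else 0. */
int uchar_is_8bit(void)
{
    return UCHAR_MAX == 255 && CHAR_BIT == 8;
}
```

Removing the guard later would only mean deleting the #if block.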
The C language requires UCHAR_MAX to be at least 255. In reading the
code (and I haven't read that much of it), I have found several places
where it likely doesn't work if UCHAR_MAX is greater than 255. There
are several left shifts where the bits are assumed to vanish above the
8th, and several cases of arrays of size 256 which use an index of an
unsigned char.
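To illustrate the two kinds of assumption described above, here are
some hypothetical fragments (not excerpts from the perl source). They
show a shift that relies on the char width and a 256-entry table
indexed by an unsigned char, next to forms that do not depend on
UCHAR_MAX being 255:

```c
#include <limits.h>

/* Relies on the cast to drop bits above the char width. With 8-bit
 * chars the result never exceeds 0xFF; with 9-bit chars a ninth bit
 * would survive. n is assumed to be a small shift count (0..8). */
unsigned char shift_in_char(unsigned char c, int n)
{
    return (unsigned char)(c << n);
}

/* Masks explicitly, so the result is the same whatever the char width. */
unsigned int shift_masked(unsigned int c, int n)
{
    return (c << n) & 0xFFu;
}

/* A table sized for 8-bit chars. Indexing it with an unsigned char is
 * only safe as-is when UCHAR_MAX is 255. */
static unsigned char seen[256];

int mark_seen(unsigned char c)
{
#if UCHAR_MAX > 255
    if (c > 255)
        return -1;      /* would overrun the table with wider chars */
#endif
    seen[c] = 1;
    return 0;
}
```

Sizing the table as UCHAR_MAX + 1 would be another way to avoid the
overrun, at the cost of a larger table on wide-char machines.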
I have a friend who has been on the ISO C committee for a long time.
(BTW, anybody can join the committee if they're willing to pay the steep
annual fee.) He told me that companies that tried larger character
sizes backed away from them because the data files they wrote were not
portable. The only non-DSP one that he knows of still in use is the
Unisys 2200 mainframe, which has 9-bit chars in a 36-bit word. (He
thinks the DSP ones have a 16-bit address space, so perl couldn't run
on them.)
By doing this now, we would find out whether anyone wants perl on an
architecture with chars wide enough for an octal above \377, where perl
probably doesn't work well on some constructs anyway. If no one
complains, we can be certain that no one does. I don't know whether
these problems would show up in our tests or not.
If someone complained, and their perl had been mostly working, we could
then remove the #if, and work to fix the bugs, and change our minds
about what to do about octals.
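As a rough illustration of items 2, 3 and 6, a scanner for the braced
octal form might look something like the following. The function name,
its calling convention, and the error handling are all assumptions made
for the sake of the sketch; nothing here is taken from the existing
perl code:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical scanner for the part after "\o".
 * s points at the expected '{'. On success the value is stored in *out
 * and the number of chars consumed (braces included) is returned.
 * Returns 0 on malformed input or if the value overflows unsigned long. */
size_t scan_braced_octal(const char *s, unsigned long *out)
{
    const char *p = s;
    unsigned long v = 0;

    if (*p++ != '{')
        return 0;
    if (*p == '}')
        return 0;                       /* empty braces */
    while (*p >= '0' && *p <= '7') {
        if (v > (ULONG_MAX >> 3))
            return 0;                   /* next shift would overflow */
        v = (v << 3) | (unsigned long)(*p - '0');
        p++;
    }
    if (*p != '}')
        return 0;                       /* stray char or missing brace */
    *out = v;
    return (size_t)(p + 1 - s);
}
```

A caller would treat a result of 255 or less as a byte and anything
larger as a Unicode code point, per item 6. For example, "{400}" yields
256 with 5 chars consumed.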