Front page | perl.perl5.porters |
Postings from October 2008
Re: perl Unicode bug
From: Juerd Waalboer
October 28, 2008 17:52
Message ID: 20081029005238.GY17634@c4.convolution.nl
karl williamson wrote 2008-10-28 12:51 (-0600):
> You haven't been responding to various messages on perl porters about
> this problem that I'm working to fix. I'm wondering if you have been
> reading it.
Haven't been reading it, because I'm very busy. I'm happy to answer
individual questions as long as they don't require much investigation.
> I know you don't like a pragma solution for enabling the behavior, but
> that seems to be the consensus.
Argh. I have no energy to participate in this repetitive discussion.
> Given that, do you have a name for it you like?
Well, a pragma has been discussed for backwards compatibility, assuming
that in 5.12 the new behavior would be made default. Since ascii and
unicode semantics are mutually exclusive, there can be only one pragma;
the inverse will have the opposite meaning.
It's a weird situation that we still act as if unicode is something
special. Unicode strings are made the norm, rather than the exception.
This makes ASCII semantics the exception, especially when we consider
the future where unicode semantics are *default*.
So in a probably unexpected move, I suggest using an "ascii" pragma, if
there really has to be a pragma, that is. All suggestions so far have
attempted to find a good name from a prehistoric "unicode is special"
perspective. I like to think: "unicode is standard, ascii is special",
a somewhat more modern approach.
Logically, there are two modes of operation. One is ascii semantics, the
other is normal semantics (that is: unicode). But alas, we have a third
mode that is currently the only one, which relies on faulty heuristics.
For backwards compatibility it may be interesting to retain these;
adding a single line to code relying on the broken semantics would be an
instant fix. Especially with a lexical pragma, this feels like having a
So I suggest:
    no ascii;             # unicode semantics for all strings; default in 5.12
    use ascii qw(guess);  # current default, name up for discussion
    use ascii;            # force ascii semantics for a specific documented set of
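
To make the "current default" concrete, here is a minimal sketch of the heuristic behaviour, assuming a perl with no "unicode_strings" feature, no locale and no regex modifiers in effect. The variable names are only illustrative. It takes one character (e-acute, code point 0xE9) in both internal representations: the two strings compare equal, yet \w and uc() treat them differently.

```perl
use strict;
use warnings;

# Two copies of the same one-character string: e-acute, code point 0xE9.
my $native   = "\xE9";
my $upgraded = "\xE9";
utf8::upgrade($upgraded);    # changes only the internal representation

# The two strings are equal as far as "eq" is concerned ...
my $same = $native eq $upgraded ? 1 : 0;

# ... but character-class matching depends on the representation.
my $native_word   = $native   =~ /\w/ ? 1 : 0;
my $upgraded_word = $upgraded =~ /\w/ ? 1 : 0;

# Case mapping depends on it as well.
my $native_uc_changed   = uc($native)   ne $native   ? 1 : 0;
my $upgraded_uc_changed = uc($upgraded) ne $upgraded ? 1 : 0;

print "equal strings:        $same\n";
print "native matches \\w:    $native_word\n";
print "upgraded matches \\w:  $upgraded_word\n";
print "uc changes native:    $native_uc_changed\n";
print "uc changes upgraded:  $upgraded_uc_changed\n";
```

Under those assumptions only the upgraded copy matches \w and is changed by uc(), even though both copies hold the same character.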
I expect that "use ascii;" can be very useful in, for example, simple
system administration automation scripts. "no ascii;" is a good way to
say "I don't want your old stuff", and "use ascii qw(guess);"
is a nice visual hint that you're deliberately opting for the less
predictable route. In some ways, "use bytes" does this already. But
"use bytes" and "bytes::" are a bad idea in my strong opinion. I'd like
to see "use bytes" deprecated.
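
The "ascii" pragma above is only a proposal and does not exist. As a rough sketch of the three modes, later perls (5.14 and up, as far as I know) offer regex modifiers and a lexical feature that behave in a similar way: /a forces ASCII rules, /u forces Unicode rules, and "use feature 'unicode_strings'" switches string operations such as uc() to Unicode rules within a lexical scope. Treating these as stand-ins for "use ascii" and "no ascii" is an assumption made for illustration, not something stated in this thread.

```perl
use strict;
use warnings;

# Needs perl 5.14 or later for the /a and /u modifiers.
my $s = "\xE9";    # e-acute, stored in the native (non-UTF-8) form

# ASCII rules, whatever the internal representation (compare "use ascii").
my $ascii_word = $s =~ /\w/a ? 1 : 0;

# Unicode rules, whatever the internal representation (compare "no ascii").
my $unicode_word = $s =~ /\w/u ? 1 : 0;

# Outside the feature's scope, uc() on a native string leaves 0xE9 alone.
my $uc_default = uc($s);

my $uc_unicode;
{
    # Lexically scoped, so only this block gets Unicode string semantics.
    use feature 'unicode_strings';
    $uc_unicode = uc($s);
}

printf "matches \\w under /a:         %d\n", $ascii_word;
printf "matches \\w under /u:         %d\n", $unicode_word;
printf "uc by default:               0x%02X\n", ord($uc_default);
printf "uc with unicode_strings:     0x%02X\n", ord($uc_unicode);
```

Because both the modifiers and the feature are explicit, the result no longer depends on how the string happens to be stored.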
I wish to remain ignorant about EBCDIC in Perl.
Met vriendelijke groet, Kind regards, Korajn salutojn,
Juerd Waalboer: Perl hacker <#####@juerd.nl> <http://juerd.nl/sig>
Convolution: ICT solutions and consultancy <email@example.com>