develooper Front page | perl.perl5.porters | Postings from November 2014

Re: [perl #122853] Guarantee 0-9, A-Z, a-z character classes

Aristotle Pagaltzis
November 1, 2014 09:01
* Father Chrysostomos via RT [2014-10-30 12:25]:
> On Thu Oct 30 01:25:13 2014, aristotle wrote:
> > To me the principle behind this deprecation is not “this would not
> > port to EBCDIC so you should not be doing this” but “we are making
> > \x and \N mean different things that cannot semantically be mixed”.
> But on ASCII systems character ranges are simple (start at the Unicode
> codepoint specified by the left-hand character and iterate through
> them to the right-hand character). I don’t think making them more
> complex brings any benefit. On EBCDIC, due to the model that Perl
> follows, they are naturally complex, but that complexity needn’t
> affect code and programmers that never come in contact with EBCDIC.

How are they naturally complex? They are not any different in principle
in EBCDIC than in ASCII and so don’t have to be any more complex. The
complexity with EBCDIC is a choice, made in the design of Perl, out of
the desire to preserve (some of!) the meaning of programs written under
assumptions based on ASCII.
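
To make the "no different in principle" point concrete, here is a minimal
sketch of a range defined purely by native code points, applied to an ASCII
superset and to one EBCDIC code page. It is written in Python only because
its standard library ships a `cp037` codec; the helper name `native_range`
is made up for this illustration, and CP037 is just one of several EBCDIC
code pages. It shows naive native iteration, not what Perl itself does
with a-z.

```python
import string


def native_range(first: str, last: str, codec: str) -> list[str]:
    """Return every character whose single-byte code in `codec`
    lies between the codes of `first` and `last`, inclusive.

    Hypothetical helper: the same rule for every platform, namely
    start at the left-hand code point and iterate to the right-hand one.
    """
    lo = first.encode(codec)[0]
    hi = last.encode(codec)[0]
    return [bytes([b]).decode(codec) for b in range(lo, hi + 1)]


# Latin-1 stands in for an ASCII platform; cp037 for an EBCDIC one.
ascii_az = native_range("a", "z", "latin-1")
ebcdic_az = native_range("a", "z", "cp037")

# Characters swept up by the EBCDIC range that are not a-z.
extras = [c for c in ebcdic_az if c not in string.ascii_lowercase]

print(len(ascii_az), len(ebcdic_az), len(extras))
```

The rule is identical in both cases; only the result differs, because the
lowercase letters are not contiguous in CP037. Treating a-z as "just the 26
letters" on EBCDIC is the extra layer added on top of that rule.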

And here, the proposed solution (which seems the only sensible one too)
is that if you use two \x{}s, then \x{} means one thing, but if you use
one \x{} and \N{} then \x{} means another thing. On ASCII platforms that
is a distinction without a difference, but on EBCDIC platforms it’s not.
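
A small sketch of the two readings of the same hex number may help here.
Again this is Python rather than Perl, purely so it can run anywhere, with
`cp037` standing in for "an EBCDIC platform"; `resolve_native` and
`resolve_unicode` are hypothetical names, modelling \x{} read as a native
code point and \N{U+} read as a Unicode code point respectively.

```python
def resolve_native(code: int, codec: str) -> str:
    """Read `code` as a code point in the platform's native encoding
    (the model for a \\x{} that is paired with another \\x{})."""
    return bytes([code]).decode(codec)


def resolve_unicode(code: int) -> str:
    """Read `code` as a Unicode code point, whatever the platform
    (the model for \\N{U+...})."""
    return chr(code)


# ASCII-like platform: both readings of 0x41 agree.
same_on_ascii = resolve_native(0x41, "latin-1") == resolve_unicode(0x41)

# EBCDIC (CP037) platform: 0x41 is not "A" natively; "A" lives at 0xC1.
same_on_ebcdic = resolve_native(0x41, "cp037") == resolve_unicode(0x41)

print(same_on_ascii, same_on_ebcdic)
```

On the ASCII side the two functions are interchangeable, which is the
distinction without a difference. On the EBCDIC side they are not, so
a range endpoint written as \x{41} names a different character depending
on which reading is applied to it.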


I wonder if there’s a case for just allowing such mixed ranges on ASCII
systems but warning about them on EBCDIC systems?

That way, that group of users who are possibly affected at least get
a chance to notice, and can patch the code if they own it or else ask
for a patch if e.g. they got it from CPAN.

OTOH, if \x{} in mixed ranges is a synonym for \N{U+}, then in 99.9% of
cases the response will be to replace the \x{} with a \N{U+} because
that’s what it did before, so nothing about the program actually changes
and so it ultimately is a pointless make-the-user-say-it-right warning.

So then we’re left with a lone \x{} meaning something distinct from an
\x{} partnered with another \x{}, which I can’t bring myself to like –
even though it will admittedly be a distinction without a difference for
all but a tiny minority of users.

Aristotle Pagaltzis
