Re: PATCH: partial [perl #58182]: regex case-sensitive matching now utf8ness independent

From: demerphq
Date: December 10, 2009 04:23
Subject: Re: PATCH: partial [perl #58182]: regex case-sensitive matching now utf8ness independent
Message ID: 9b18b3110912100423x2cc1845aoae187ab5e4d9417e@mail.gmail.com

2009/12/9 Juerd Waalboer <juerd@convolution.nl>:
> karl williamson skribis 2009-12-09 12:11 (-0700):
>> Since Yves is incommunicado, I took what he had done before Larry's veto
>> and extended and modified it, adding an intermediate way.  What that
>> means is that anything that looks like [[:xxx:]] will match only in the
>> ASCII range, or in the current locale, if set.  I never heard any
>> controversy about that part of the proposal, and it makes sense to me
>> that a Posix construct should act like the Posix definition says to.
>
> These "posix" constructs have for a long time been documented as
> *equivalent* to \d, \s and \w, with two remarks: [[:space:]] also
> includes \cK and [[:word:]] doesn't even exist in POSIX.

*mis*documented.

And, [[:word:]] is spelled [[:alnum:]].

>
> Changing them is as bad as changing the metacharacters. Changing them to
> break the equivalency might even be worse.

I very very very much doubt it, and consider this to be essentially FUD.

Especially as it fixes a stack of bugs related to their behaviour now.

You cannot have both the current behaviour and a non-buggy implementation.

Simply put, I consider [^STUFF] matching the same code points as
[STUFF] to be an irrefutable and overwhelming reason why the current
behaviour of POSIX charclasses cannot be preserved.
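
The rule at stake is that, for any single character, exactly one of
[STUFF] and [^STUFF] may match. A minimal sketch of how that invariant
could be checked mechanically follows. It uses Python's re module purely
as an analogy, not Perl's regex engine, and the helper name violations()
is hypothetical. Python's re.ASCII flag stands in, loosely, for the
"match only in the ASCII range" behaviour proposed in the quoted patch
description.

```python
import re

def violations(positive, negative, code_points, flags=0):
    """Return the code points for which the complement rule is broken.

    The rule: a character must match exactly one of `positive` and
    `negative`. Matching both, or neither, is a violation.
    """
    pos = re.compile(positive, flags)
    neg = re.compile(negative, flags)
    bad = []
    for cp in code_points:
        ch = chr(cp)
        in_pos = pos.fullmatch(ch) is not None
        in_neg = neg.fullmatch(ch) is not None
        if in_pos == in_neg:  # both matched, or neither matched
            bad.append(cp)
    return bad

LATIN1 = range(0x100)

# A class and its negation should partition the range under either
# set of semantics, even though the two semantics classify
# characters above 0x7F differently.
print(violations(r"[\w]", r"[^\w]", LATIN1))            # Unicode semantics
print(violations(r"[\w]", r"[^\w]", LATIN1, re.ASCII))  # ASCII-only semantics

# U+00E9 is a word character under Unicode semantics but not under
# ASCII-only semantics; in both cases the negated class agrees.
print(re.fullmatch(r"\w", "\u00e9") is not None)
print(re.fullmatch(r"\w", "\u00e9", re.ASCII) is not None)
```

A check of this shape says nothing about which semantics is right. It
only flags the case where a class and its negation overlap, which is the
failure described above.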

Essentially for me this bug ends ANY debate on this particular issue.

Had we known of this violation of the rules, we NEVER would have
allowed this to escape into the wild.

> Also, note that perlre calls this "POSIX character class **syntax**"
> (emphasis mine).
>
> An even stronger argument is that perlre defines equivalence with
> \p{...}, and explicitly mentions that these are Unicode constructs.

*mis*documented as equivalent.

At least one of the "equivalencies" was *never* true, and the other
equivalencies were achieved only by breaking Unicode rules to be more
Perl-like.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"