
Re: Unicode regex negated case-insensitivity in 5.14.0-RC1

From: Nicholas Clark
Date: April 30, 2011 15:00
Subject: Re: Unicode regex negated case-insensitivity in 5.14.0-RC1
Message ID: 20110430220028.GW23881@plum.flirble.org

On Sat, Apr 30, 2011 at 03:31:23PM -0600, Tom Christiansen wrote:
> Karl Williamson <public@khwilliamson.com> wrote
>    on Sat, 30 Apr 2011 15:20:08 MDT: 
> 
> > In thinking about this some more, given the bug that Nicholas found that 
> > affects all multi-character folds, not just \xdf,  in character classes, 
                                                       ^^^^^^^^^^^^^^^^^^^^

> > I think it would be best to just not offer any of them in 5.14.
> 
> You mean undo something that's been there since 5.8?
> 
>     % perl5.8.0 -le 'print "\x{1FB2}" =~ /\x{1FB2}/i || 0'
>     1
>     % perl5.8.0 -le 'print ucfirst("\x{1FB2}") =~ /\x{1FB2}/i || 0'
>     1
>     % perl5.8.0 -le 'print uc("\x{1FB2}") =~ /\x{1FB2}/i || 0'
>     1

> Or did you mean something else?

I think you missed the "in character classes" part of Karl's thought.
Your examples don't use []
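
To make the distinction concrete, here is a minimal sketch that runs the same kind of test as the quoted one-liners, but also wraps the character in a bracketed class and in a negated class. The helper name matches() and the %result hash are purely illustrative. No expected output is shown for the two class cases, because what they print depends on which perl is used, and that is exactly the point under discussion.

```perl
use strict;
use warnings;

# U+1FB2 has a multi-character uppercase mapping, so uc() of it is longer
# than one character.
my $char  = "\x{1FB2}";
my $upper = uc $char;

# Illustrative helper: returns 1 or 0 so the results print cleanly.
sub matches {
    my ($string, $re) = @_;
    return $string =~ $re ? 1 : 0;
}

my %result = (
    literal => matches($upper, qr/\x{1FB2}/i),     # as in the quoted examples
    class   => matches($upper, qr/[\x{1FB2}]/i),   # same character inside []
    negated => matches($upper, qr/[^\x{1FB2}]/i),  # negated class
);

# Only the "literal" row is claimed by the quoted examples; the other two
# rows are printed so they can be compared across perl versions.
printf "%-8s %d\n", $_, $result{$_} for qw(literal class negated);
```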


I'm still not sure *what* I think.

But *if* a class consisting of a single character is always equivalent to a
literal of that character (ie /[a]/ is /a/, /[ß]/ is /ß/, /[ß]/i is /ß/i,
etc), one of the things I'm not sure about is whether it's better to say "no
multi character folds in character classes" or "no multi character folds in
character classes, except classes consisting of exactly one character". I
think (I think) that it's useful to maintain that explicit correspondence,
as (IIRC) Yves worked to get the engine to optimise /[a]/ to /a/ and /[.]/ to
/\./, as it was a common idiom in some circles to use regexp character class
syntax as an alternative to backslash quoting.
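
As an illustration of that idiom, the sketch below checks that /a[.]b/ and /a\.b/ accept and reject the same strings. It only demonstrates that the two spellings behave alike; it does not show what the engine compiles them to. Running a script with "use re 'debug';" should dump the compiled program for anyone who wants to inspect that. The sub name same_answer() and the sample strings are made up for the example.

```perl
use strict;
use warnings;

# Illustrative helper: true if both spellings of "literal dot" agree on $s.
sub same_answer {
    my ($s) = @_;
    my $via_class  = $s =~ /a[.]b/ ? 1 : 0;   # character class used as quoting
    my $via_escape = $s =~ /a\.b/  ? 1 : 0;   # backslash quoting
    return $via_class == $via_escape;
}

for my $s ("a.b", "axb", "xa.by", "ab") {
    my $hit = $s =~ /a[.]b/ ? "matches" : "does not match";
    printf "%-6s %s (both spellings agree: %s)\n",
        $s, $hit, same_answer($s) ? "yes" : "no";
}
```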

The downside, obviously, is that (for starters) it's more complex to explain.


Digression:

Because as a general rule, rightly or wrongly on my part, I feel that it's
unfortunate if two or more different syntax choices for the same action
produce notably different performance because they trigger different
runtime implementations, where both

a: one is unambiguously always slower than the other
b: it would be possible for the compile time implementation to automatically
   select the faster implementation, whichever syntax was used


because that way

a: all existing code goes faster without change
b: it kills dead style arguments based on "but this one is more efficient"
   letting people pick style based on clarity (or their opinions of clarity)


(eg reverse sort ...; is now internally optimised to tell sort to sort in
reverse, so no slower than sort {$b cmp $a} ...; but usually somewhat clearer)
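
A small sketch of the two spellings, checking only that they produce the same order for a plain string sort. It makes no claim about timing; the word list is made up for the example.

```perl
use strict;
use warnings;

my @words = qw(pear apple fig banana cherry);

# Two ways to ask for a descending string sort.
my @via_reverse = reverse sort @words;
my @via_block   = sort { $b cmp $a } @words;

print "@via_reverse\n";   # prints "pear fig cherry banana apple"
print "@via_block\n";     # prints the same line
```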


Nicholas Clark