perl.perl5.porters | Postings from May 2010

Re: PATCH: [perl #58182] partial, "The Unicode Bug". Add unicode semantics for \s, \w

karl williamson
May 19, 2010 11:20
Eric Brine wrote:
> On Tue, May 18, 2010 at 4:04 PM, demerphq wrote:
>     On 18 May 2010 21:30, Jesse Vincent wrote:
>      > Neither of those arguments suggest that this is a case when we should
>      > break backward compatibility gratuitously.
>     We shall have to agree to disagree that this is gratuitous breakage. I
>     think that's going far too far.
>      > We may well be unable to do this cleanly and sanely. But that's a very
>      > different sort of argument.
>     Basically we only have to worry about 'l' because of 'le', and 'f'
>     because of 'if'. Any others?
> Not "if". It's already a syntax error because "i" is a valid option.
> Any of the following immediately following the delimiter are currently 
> valid, but will become a syntax error (e.g. /foo/le+1) or different 
> valid code (e.g. /foo/lt+1):
>     * unless & until from /u
>     * le & lt from /l
>     * [none] from /t
> We're precluded from using these:
>     * /a (and)
>     * /f (for, foreach)
>     * /n (ne)
>     * /w (when, while)

I don't understand these preclusions.  Why, for example, does the 
existence of 'and' preclude /a, but the existence of 'unless' not 
preclude /u?  FYI: Yves' original proposal was for an additional /a 
modifier to restrict the range of \s and \d to ASCII.  That code 
remains to be written.

How about this alternate solution:

Instead of creating a syntax error, in 5.14 we deprecate omitting the 
space between a pattern terminator and the following word.

If one of the new modifiers, in conjunction with other legal modifiers, 
spells one of those legal words, we keep the old behavior and parse it 
as the word.  The documentation will caution people writing new code 
not to do that, listing all the possibilities.
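
To illustrate the style the documentation would steer people toward, here is a small sketch.  The variable names and sample words are made up for the example and are not from the patch.  With a space after the closing delimiter, the following word cannot be mistaken for a run of modifier letters:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $count = 0;
for (qw(foo food bar)) {
    # The space after the closing "/" keeps "and" a separate word.
    # Written as "/foo/and", the lexer would have to decide whether
    # those letters are modifiers or an operator.
    $count++ if /foo/ and length($_) > 3;
}
print "$count\n";
```

Only the spaced form is shown as running code, since how the unspaced form parses is exactly what is under discussion.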

And, I only see two such possibilities: 'le' and 'lt'.  All the rest 
listed above require a modifier that doesn't exist.   Someone is 
unlikely to use 'lt' anyway since the 't' just overrides the 'l'.  I 
don't think it's too much of a burden for someone writing new code to 
not use 's/foo/bar/le' when the documentation warns against it.
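
The reasoning above can be checked mechanically.  This is only a sketch under stated assumptions: the existing modifier letters are taken to be those accepted by m// and s/// in 5.12 (cegimopsx), the word list is limited to the words named in this thread, and classify() is a helper invented for the example.

```perl
use strict;
use warnings;

# Assumption: modifier letters accepted by m// and s/// in 5.12.
my %existing = map { $_ => 1 } split //, 'cegimopsx';
# Proposed new modifiers discussed in this thread.
my @proposed = qw(l u t);
# Words named in this thread; not an exhaustive list of Perl keywords.
my @words = qw(and for foreach if le lt ne unless until when while);

my %flag = (%existing, map { $_ => 1 } @proposed);

# Hypothetical helper: how would a word directly after the closing
# delimiter be affected by the proposed modifiers?
sub classify {
    my ($word) = @_;
    my $first = substr $word, 0, 1;
    # Words that do not start with a new modifier are not changed.
    return 'unaffected' unless grep { $_ eq $first } @proposed;
    # Every letter is a modifier: the word could be read either way.
    return 'ambiguous' unless grep { !$flag{$_} } split //, $word;
    # Starts with a new modifier but needs a letter that is not one.
    return 'not modifiers';
}

printf "%-8s %s\n", $_, classify($_) for @words;
```

Under those assumptions only 'le' and 'lt' come out as ambiguous; 'unless' and 'until' start with 'u' but need an 'n', which is not a modifier.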

I'm also unsure that 't' is the best modifier.  'traditional' was 
suggested as an alternative to the preferred 'legacy', since everyone 
agreed that 'l' should stand for 'locale'.  'h' could stand for heritage 
or historical, 'r' for retro, or 'v' for vintage or even vanilla.
