perl.perl5.porters | Postings from October 2009

Re: What should \s \w \d match in 5.12?

From: karl williamson
Date: October 3, 2009 18:35
Subject: Re: What should \s \w \d match in 5.12?
Message ID: 4AC7FB7A.80700@khwilliamson.com

Tatsuhiko Miyagawa wrote:
> I was looking at perl5110delta and was surprised (and a bit upset) to see
> the \d \w \s changes mentioned.
> 
> I toyed with a small piece of code, and it seems it's not working as
> specified in the delta anyway:
> http://gist.github.com/200900
> 
> So is the delta simply incorrect, or is it describing what *will*
> change but hasn't been done yet?
> 

Yes, the delta is not correct yet, but it describes the current plan,
so that is what should happen.

> Anyway, I have tons of scripts that rely on \d matching Japanese
> numbers and \s matching full-width spaces, etc. Being able to have a
> pragma to enable/disable the new behavior would be very nice. (I
> understand I can start rewriting those \d as \p{IsDigit} to be
> forward compatible, though.)
> 

Note that the 'Is' is optional: \p{Digit} works just as well.  The chart
in the delta gives the equivalent mappings for \s and \w too.  And if
you can accept \s also matching a vertical tab, \p{Space} is shorter.
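A minimal Perl sketch of the forward-compatible rewrite discussed above. It assumes the characters that matter are the full-width digits U+FF10..U+FF19 and the ideographic space U+3000; that test data is my assumption, not taken from the original scripts. Explicit Unicode properties keep matching these characters regardless of what \d and \s end up meaning, and the last check shows the vertical-tab difference for \p{Space}:

```perl
use strict;
use warnings;

# Assumed test data: FULLWIDTH DIGIT ZERO..NINE (General_Category Nd)
# and IDEOGRAPHIC SPACE (Unicode White_Space).
my $fullwidth_digits  = join '', map { chr($_) } 0xFF10 .. 0xFF19;
my $ideographic_space = chr(0x3000);

# \p{Digit} (same as \p{IsDigit}) and \p{Space} are Unicode properties,
# so they match these characters whatever \d and \s are restricted to.
my $is_digit = $fullwidth_digits  =~ /\A\p{Digit}+\z/ ? 1 : 0;
my $is_space = $ideographic_space =~ /\A\p{Space}\z/  ? 1 : 0;

# \p{Space} also matches vertical tab (U+000B), which \s did not
# match in perls of this era.
my $vt_space = "\x0B" =~ /\A\p{Space}\z/ ? 1 : 0;

print "$is_digit $is_space $vt_space\n";    # prints "1 1 1"
```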

There are plans for a pragma to handle other Unicode incompatibilities,
and a git branch already contains the beginnings of one: "use legacy".
I had assumed that these changes could be controlled by a pragma, and I
hope it will be this one.


> On Thu, Oct 1, 2009 at 6:09 PM, karl williamson <public@khwilliamson.com> wrote:
>> demerphq wrote:
>>> 2009/9/30 karl williamson <public@khwilliamson.com>:
>>>> I had thought in our discussion last year that we had determined that
>>>> these should match only in the ASCII range.  And so, I thought that when
>>>> Yves flipped the switch on the \p{Posix} matches, these would change as
>>>> well, but that isn't the case:
>>>>  perl -E "say chr(0x2028) =~ /\s/"
>>>> 1
>>>>
>>>> in blead.
>>> I'm inclined to say it just slipped by me. I'll poke it with a stick
>>> when I get a chance.
>>>
>>>> If I'm wrong about the agreement, I would like to start another
>>>> discussion, and my initial position is that they should only match in
>>>> the ASCII range.
>>> Agreed.
>> Just to be precise about it, I neglected to mention that my statement was
>> meant to apply only in the absence of "use locale", and not to whatever
>> the base C library routines do on an EBCDIC system.  I wasn't advocating
>> changing the behavior under those circumstances.
>>
> 
> 
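The behavior reported in the quoted thread, chr(0x2028) matching \s in blead, can be reproduced with the Perl sketch below. The bracketed class is only one hypothetical way to spell out ASCII-only whitespace by hand. It lists space, tab, newline, carriage return and form feed, mirroring the ASCII members of \s at the time, which excluded vertical tab; it is not a feature proposed in this thread:

```perl
use strict;
use warnings;

# U+2028 LINE SEPARATOR is Unicode White_Space, so \s matches it.
my $line_sep = chr(0x2028);

my $backslash_s = $line_sep =~ /\s/ ? 1 : 0;

# Hypothetical ASCII-only alternative: list the ASCII whitespace explicitly.
my $ascii_space = $line_sep =~ /[ \t\n\r\f]/ ? 1 : 0;

print "$backslash_s $ascii_space\n";    # prints "1 0"
```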



