Re: What should \s \w \d match in 5.12?

From:
demerphq
Date:
October 6, 2009 00:31
Subject:
Re: What should \s \w \d match in 5.12?
Message ID:
9b18b3110910060031vbd081dfn6cc8124c7148bbc0@mail.gmail.com
2009/10/6 karl williamson <public@khwilliamson.com>:
> In reading these comments all at once, I'm not sure we are all on the same
> page as to the proposal, and what happens now.  So, let me state what I
> think both are; correct me if I'm wrong:
>
> The way it works now:
>
> With a 'use locale' or on an EBCDIC platform:
> they match whatever the C language ctype routines say they match: isdigit()
> for \d, isspace() for \s, and isalnum() for \w (I know \w also adds
> underscore, though I didn't see where it does that in a quick scan of the
> code).
>
> Absent a 'use locale' and not on an EBCDIC platform:
>
> If (the string being matched against doesn't have the utf8 flag on
>     && the regular expression doesn't contain something that would make it
>        look like it should behave in utf8 semantics.  Any \p{} in it, for
>        example, will force it into utf8)
> {
>        \d = [0-9]; \w = [_a-zA-Z\d]; \s = [ \t\f\r\n]
>
> } else {
>
>        they match what Unicode says, except that there are some bugs so
>        that \w matches too much, such as fractions.
>
> }
>
> What I meant to say was the proposal:
> No change to 'use locale' or EBCDIC.  Even if we could deprecate 'use
> locale', we would be stuck with supporting it in 5.12, I think.
>
> Otherwise, \d = [0-9]; \w = [_a-zA-Z\d]; \s = [ \t\f\r\n]
> regardless.

With the caveat that I am replying prior to finishing my first cup of
coffee of the day, I think this looks right.
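
To make the proposed classes concrete, here is a minimal sketch that spells
them out as explicit character classes and probes a few sample characters.
The variable names ($DIGIT, $WORD, $SPACE) and the sample list are
hypothetical, chosen only for illustration. The sketch deliberately avoids the
built-in \d, \w and \s, so it says nothing about what any particular perl
release does with them; it only shows what "ASCII-only regardless" would mean.

```perl
use strict;
use warnings;

# The proposed classes, written out literally (hypothetical names).
# \w = [_a-zA-Z\d] with \d = [0-9] expands to the class below.
my $DIGIT = qr/[0-9]/;
my $WORD  = qr/[_a-zA-Z0-9]/;
my $SPACE = qr/[ \t\f\r\n]/;

# Sample characters: plain ASCII, the Latin-1 range, and above U+00FF.
my @samples = (
    [ 'ASCII digit 7',              "7"       ],
    [ 'ASCII underscore',           "_"       ],
    [ 'ASCII tab',                  "\t"      ],
    [ 'U+00E9 e with acute',        "\x{E9}"  ],
    [ 'U+00A0 no-break space',      "\x{A0}"  ],
    [ 'U+00BD vulgar fraction 1/2', "\x{BD}"  ],
    [ 'U+0663 Arabic-Indic 3',      "\x{663}" ],
);

# Report, for each sample, which of the three explicit classes it falls in.
for my $sample (@samples) {
    my ($name, $ch) = @$sample;
    printf "%-28s digit=%d word=%d space=%d\n",
        $name,
        ($ch =~ /\A$DIGIT\z/ ? 1 : 0),
        ($ch =~ /\A$WORD\z/  ? 1 : 0),
        ($ch =~ /\A$SPACE\z/ ? 1 : 0);
}
```

Under these explicit classes only the three ASCII samples match anything. The
non-ASCII letter, space, fraction and digit match none of them, whether or not
the string carries the utf8 flag, which is the point of the "regardless" in
the proposal.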

Yves




-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
