2009/10/6 karl williamson <public@khwilliamson.com>:
> In reading these comments all at once, I'm not sure we are all on the same
> page as to the proposal, and what happens now. So, let me state what I
> think both are; correct me if I'm wrong:
>
> The way it works now:
>
> With a 'use locale' or on an EBCDIC platform:
> \d, \s and \w match whatever the C language ctype routines say they match:
> isdigit() for \d, isspace() for \s, and isalnum() for \w (I know \w also
> adds underscore, but I didn't see where it does that in a quick scan of
> the code).
>
> Absent a 'use locale' and not on an EBCDIC platform:
>
> If (the string being matched against doesn't have the utf8 flag on
> && the regular expression doesn't contain anything that would make it
> look like it should behave with utf8 semantics; any \p{} in
> it, for example, will force it into utf8)
> {
> \d = [0-9]; \w = [_a-zA-Z\d]; \s = [ \t\f\r\n]
>
> } else {
>
> they match what Unicode says, except that there are some bugs so
> that \w matches too much, like fractions.
>
> }
>
> What I meant to say was the proposal:
> No change to 'use locale' or EBCDIC. Even if we could deprecate 'use
> locale', we would be stuck with supporting it in 5.12, I think.
>
> Otherwise, \d = [0-9]; \w = [_a-zA-Z\d]; \s = [ \t\f\r\n]
> regardless.
With the caveat that I am replying before finishing my first cup of
coffee of the day, I think this looks right.
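
For concreteness, here is a minimal sketch of the proposed classes written
out as explicit bracketed character classes. The helper name classes_for
and the sample characters are made up for illustration and are not anything
in the core. The sketch deliberately avoids \d, \w and \s themselves, since
what those match depends on the perl in use and on the flags discussed
above.

```perl
use strict;
use warnings;

# The three classes as spelled out in the proposal above.
# (Assumption: the \d inside \w is expanded to 0-9 so that nothing
# here depends on how the running perl interprets \d.)
my $digit = qr/[0-9]/;
my $word  = qr/[_a-zA-Z0-9]/;
my $space = qr/[ \t\f\r\n]/;

# Hypothetical helper: report which of the proposed classes a single
# character falls into, as a string drawn from 'd', 'w', 's'.
sub classes_for {
    my ($ch) = @_;
    my @hits;
    push @hits, 'd' if $ch =~ /\A$digit\z/;
    push @hits, 'w' if $ch =~ /\A$word\z/;
    push @hits, 's' if $ch =~ /\A$space\z/;
    return join '', @hits;
}

# Sample characters, given as code points to keep the source ASCII:
# '7', '_', TAB, U+00BD (one half), U+00E9 (e acute),
# U+0663 (Arabic-Indic digit three).
for my $cp (0x37, 0x5F, 0x09, 0xBD, 0xE9, 0x0663) {
    printf "U+%04X: %s\n", $cp, classes_for(chr $cp) || '(none)';
}
```

If the proposal went in as stated, \d, \w and \s would behave like $digit,
$word and $space here regardless of the utf8 flag, so the last three sample
characters fall into none of the classes.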
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"