perl.perl5.porters | Postings from October 2009

Re: Rule 1 has been invoked [Re: What should \s \w \d match in 5.12?]

From: Eric Brine
Date: October 28, 2009 13:40
Subject: Re: Rule 1 has been invoked [Re: What should \s \w \d match in 5.12?]
Message ID: f86994700910281340g60c3e98bm8595e994698230a9@mail.gmail.com

On Wed, Oct 28, 2009 at 3:42 PM, John <john.imrie@vodafoneemail.co.uk> wrote:

> Now I don't see alphanumeric defined anywhere, but I also don't see how it
> can be forced to match 灞

It already does:

$ perl -v
This is perl, v5.8.8 built for i486-linux-gnu-thread-multi
...

$ perl -le'print chr(28766) =~ /^\w\z/ || 0'
1

> Furthermore, taint washing is carried out by regexes, and extending the
> semantics of \w, \d and \s could allow tainted data to be cleaned where it
> should not be.

If you're using \w to filter out Chinese characters, you're already failing.

> What do you think extending \w \d and \s will do?

There's been no discussion of expanding them. The problem is that what they
match varies depending on how Perl happens to store the string internally:

$ perl -le'
    $s1 = "\xC2";
    $s2 = "\x{2660}";
    for ($s1, $s2, $s1.$s2) {
        print /\w/ || 0;
    }
'
0
0
1

If there's no \w in $s1 or in $s2, why does their concatenation have one?

