perl.perl5.porters | Postings from September 2014

Re: RFC: implementing script runs

From: demerphq
Date: September 25, 2014 09:21
Subject: Re: RFC: implementing script runs
Message ID: CANgJU+UPj5dfGPT7WMi5V6-L6L1oO26pkdv+2HDD52FF=SW0sQ@mail.gmail.com
On 25 September 2014 07:17, Karl Williamson <public@khwilliamson.com> wrote:

> Unicode defines a "script run" to be contiguous characters from the same
> script, like all Latin or all Greek.
>
> These can be important for security.  See
> http://www.unicode.org/reports/tr36/
>
> It seems to me that Perl should offer an easy way to specify that a regex
> pattern element should match only a script run.  I'm proposing the only
> syntax I know of that is currently illegal and easy to type; other
> suggestions are welcome.
>
> My idea is that an extra '*' following a quantifier would mean "match
> only a script run".  For example, qr/\w+*/ would match all the consecutive
> word characters that are in the same script as the first one found.
>
>
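For illustration only, here is a minimal sketch that assumes a naive reading of the quoted rule: keep extending the match while each character's Unicode script equals the script of the first character. The helper name `naive_script_run` is hypothetical; it uses `charscript` from the core Unicode::UCD module and post-filters an ordinary `\w+` match. It is not how perl itself would implement the feature. It also ignores the special treatment Unicode gives to Common and Inherited characters, so digits and underscores (which `\w` also matches) end a Latin run here.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical helper: return the longest prefix of $word whose characters
# all have the same Unicode script as its first character. This is a naive
# approximation of a "script run": Common/Inherited characters (digits,
# underscore, combining marks) get no special handling.
sub naive_script_run {
    my ($word) = @_;
    return '' unless length $word;
    my $first = charscript(ord substr($word, 0, 1)) // '';
    my $run = '';
    for my $ch (split //, $word) {
        # Stop at the first character from a different script
        last unless (charscript(ord $ch) // '') eq $first;
        $run .= $ch;
    }
    return $run;
}

my $input  = "abc\x{03B1}\x{03B2}";     # Latin "abc" followed by Greek alpha, beta
my ($word) = $input =~ /(\w+)/;         # plain \w+ matches all five characters
print naive_script_run($word), "\n";    # prints "abc"
```

Because `\w+` alone happily accepts the mixed-script string, the post-filter is what enforces the single-script restriction in this sketch; a real implementation would presumably apply it inside the regex engine so that backtracking could take it into account.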
I like the general idea of being able to do this, but using a quantifier
modifier to enable it doesn't seem like a good fit. Do you have any other
proposals for implementing it? Perhaps some new character class syntax,
such as a pseudo POSIX-style character class?

Yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org