develooper Front page | perl.perl5.porters | Postings from October 2017

Re: Implementing script runs

Tony Cook
October 19, 2017 00:26
On Wed, Oct 18, 2017 at 11:32:50AM -0600, Karl Williamson wrote:
> Here is my updated proposal for this, fleshing out what was agreed to at the
> core hackathon, with other things that have been discussed over the years.
> See and its thread for
> background.
> I propose to have
> (?{script_run}:...)
> mean to match the subpattern indicated by "...", but impose the additional
> constraint that all characters matched must be in the same Unicode script as
> the first character in the matched sequence. Certain characters like ":"
> and "." and combining accents would be considered to be in every script.
> This prevents mixed-script attacks like the famous one of a link containing
> where the characters before the 'l' aren't Latin, but Cyrillic, and clicking
> on that link would lead to a malicious page.
> (?{script_run}:\d)+
> would match any sequence of decimal digits, but all would have to come from
> the same script.

This looks interesting and useful.
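
For the decimal digit case, something close to this constraint can already be
approximated without new regex syntax. The sketch below is only an
illustration, not the proposed implementation: the helper name
same_script_digits is made up, and it relies on num() from Unicode::UCD, which
is documented to return undef for a multi-character string unless every
character is a decimal digit from the same script.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Hypothetical helper (not part of the proposal): true if $str is a
# non-empty run of decimal digits that all come from the same script.
sub same_script_digits {
    my ($str) = @_;

    # Only decimal digits are considered at all.
    return 0 unless $str =~ /\A\d+\z/;

    # num() yields undef when the digits are mixed across scripts.
    return defined num($str) ? 1 : 0;
}

# ASCII digits, Thai digits, and a mix of the two.
for my $candidate ("123", "\x{0E51}\x{0E52}", "1\x{0E52}") {
    printf "U+%s: %s\n",
        join(" U+", map { sprintf "%04X", ord } split //, $candidate),
        same_script_digits($candidate) ? "same script" : "rejected";
}
```

This covers only \d, whereas the proposal applies to an arbitrary subpattern
and also has to deal with characters that belong to every script.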

> Abigail has the use where he wants to match any number between 0 and 255,
> and only those, but in any script, and all from the same script. That's
> doable in this proposal; something like:
> qr/ ( \b
>     (?{script_run}:[ \p{nv=0} - \p{nv=9} ])
>    |(?{script_run}:[ \p{nv=1} - \p{nv=9} ] [ \p{nv=0} - \p{nv=9} ])
>    |(?{script_run}:[ \p{nv=1} - \p{nv=2} ] [ \p{nv=0} - \p{nv=9} ]{1,2})
>     ) \b /xx;

I do wonder how these ranges would be implemented.

Perhaps something like:


would be simpler to implement and more concise.

Your ranges already have a meaning, though not an especially useful or
understandable one:

 $ ./perl -Ilib -le 'print "1" =~ /[\p{nv=0}-\p{nv=9}]/ ? "match" : "no match"'
 no match
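
To spell out what that class means today, as far as I can tell: a property
such as \p{nv=0} cannot be the endpoint of a range, so the "-" is taken as a
literal hyphen and the class is the union of \p{nv=0}, "-" and \p{nv=9}. The
short script below is a sketch for checking that reading; the helper name
classify is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a single character matches the
# bracketed class used in the one-liner above.
sub classify {
    my ($char) = @_;

    # perl may warn about a false [] range in this pattern; silence that
    # here because the point is to observe the existing behaviour.
    no warnings 'regexp';

    return $char =~ /[\p{nv=0}-\p{nv=9}]/ ? "match" : "no match";
}

# Under the reading above, "1" is the only one of these outside the class.
print "'$_': ", classify($_), "\n" for "0", "1", "9", "-";
```

If that reading is right, giving the range syntax the proposed meaning would
change what an existing, if odd, pattern matches.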

