perl.perl5.porters | Postings from October 2017

Implementing script runs

Karl Williamson
October 18, 2017 17:33
Here is my updated proposal for this, fleshing out what was agreed to at 
the core hackathon, with other things that have been discussed over the 
years.  See and its 
thread for background.

I propose to have


mean to match the subpattern indicated by "...", but impose the 
additional constraint that all characters matched must be in the same 
Unicode script as the first character of the matched sequence. 
Certain characters, like ":" and "." and combining accents, would be 
considered to be in every script.

This prevents mixed-script attacks like the famous one of a link containing

where the characters before the 'l' aren't Latin, but Cyrillic, and 
clicking on that link would lead to a malicious page.
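
As a rough illustration of the constraint described above, here is a minimal Python sketch of a same-script check. It is not the proposed regex construct and not how Perl would implement it. Python's standard library does not expose the Unicode Script property, so the sketch assumes that the first word of a character's name approximates its script, and that only letters carry a script while everything else counts as "in every script". The function names and the hostname are hypothetical.

```python
import unicodedata

def rough_script(ch):
    """Crude script label for a letter, or None for 'in every script'."""
    # Assumption: only letters (general category L*) carry a script here;
    # punctuation, digits, and combining marks are treated as common.
    if not unicodedata.category(ch).startswith("L"):
        return None
    name = unicodedata.name(ch, "")
    # Assumption: the first word of the character name approximates the
    # script, e.g. "LATIN SMALL LETTER A" -> "LATIN".
    return name.split(" ", 1)[0] if name else "UNKNOWN"

def is_script_run(text):
    """True if all script-bearing characters share one rough script label."""
    scripts = {rough_script(ch) for ch in text} - {None}
    return len(scripts) <= 1

# Hypothetical hostnames: the second contains U+0430 CYRILLIC SMALL LETTER A.
print(is_script_run("example.com"))
print(is_script_run("ex\u0430mple.com"))
```

A real implementation would consult the Script and Script_Extensions properties from the Unicode Character Database rather than character names.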


would match any sequence of decimal digits, but all would have to come 
from the same script.
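
For decimal digits the check can be sketched without any script data, because Unicode encodes each set of decimal digits as ten consecutive code points in ascending order. Subtracting a digit's value from its code point therefore yields the code point of that set's zero, and two digits belong to the same set exactly when those zeros agree. The Python sketch below relies on that property; the function names are hypothetical. Note that it tests "same digit set", which is slightly stricter than "same script" for scripts that have more than one set of digits.

```python
import unicodedata

def digit_set_zero(ch):
    """Code point of the zero of ch's decimal-digit set, or None if not a digit."""
    if unicodedata.category(ch) != "Nd":
        return None
    # Decimal digit sets are contiguous, so code point minus value = the zero.
    return ord(ch) - unicodedata.decimal(ch)

def is_single_set_digits(text):
    """True if text is non-empty and all characters are digits from one set."""
    if not text:
        return False
    zeros = {digit_set_zero(ch) for ch in text}
    return None not in zeros and len(zeros) == 1

print(is_single_set_digits("123"))                  # ASCII digits
print(is_single_set_digits("\u0661\u0662\u0663"))   # Arabic-Indic digits 1 2 3
print(is_single_set_digits("1\u06623"))             # ASCII mixed with Arabic-Indic
```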

Abigail has a use case where he wants to match any number between 0 and 
255, and only those, in any script, with all digits from the same script. 
That's doable in this proposal; something like:

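qr/ ( \b

     (?{script_run}:[ \p{nv=0} - \p{nv=9} ])
    |(?{script_run}:[ \p{nv=1} - \p{nv=9} ] [ \p{nv=0} - \p{nv=9} ])
    |(?{script_run}:\p{nv=1} [ \p{nv=0} - \p{nv=9} ]{2})
    |(?{script_run}:\p{nv=2} [ \p{nv=0} - \p{nv=4} ] [ \p{nv=0} - \p{nv=9} ])
    |(?{script_run}:\p{nv=2} \p{nv=5} [ \p{nv=0} - \p{nv=5} ])

     ) \b /xx;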
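
The same requirement can also be checked procedurally, which may help in testing a pattern like the one above. The following Python sketch accepts a string only if it is a number from 0 to 255 written with digits from a single decimal-digit set and without leading zeros (mirroring the alternation, where multi-digit forms start with a nonzero digit). The name `parse_octet` is hypothetical, and "same digit set" stands in for "same script" as an assumption.

```python
import unicodedata

def parse_octet(text):
    """Return the value 0..255 if text is such a number in one digit set, else None."""
    if not 1 <= len(text) <= 3:
        return None
    zeros = set()
    value = 0
    for ch in text:
        if unicodedata.category(ch) != "Nd":
            return None                 # not a decimal digit at all
        d = unicodedata.decimal(ch)
        zeros.add(ord(ch) - d)          # identifies the digit set by its zero
        value = value * 10 + d
    if len(zeros) != 1:
        return None                     # digits drawn from more than one set
    if len(text) > 1 and unicodedata.decimal(text[0]) == 0:
        return None                     # reject leading zeros such as "007"
    return value if value <= 255 else None

print(parse_octet("255"))
print(parse_octet("\u0662\u0665\u0665"))   # Arabic-Indic digits 2 5 5
print(parse_octet("256"))
```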
