
Implementing script runs

From: Karl Williamson
Date: October 18, 2017 17:33
Subject: Implementing script runs
Message ID: 8bb9f919-b154-0c23-49a3-6f2af86a7489@khwilliamson.com

Here is my updated proposal for this, fleshing out what was agreed to at 
the core hackathon, with other things that have been discussed over the 
years.  See http://nntp.perl.org/group/perl.perl5.porters/220508 and its 
thread for background.

I propose to have

(?{script_run}:...)

mean: match the subpattern indicated by "...", with the additional
constraint that every character matched must be in the same Unicode
script as the first character of the matched sequence. Certain
characters, such as ":" and "." and combining accents, would be
considered to be in every script.

This prevents mixed-script attacks like the famous one of a link containing

paypal.com

where the characters before the 'l' are not Latin but look-alike
Cyrillic letters, so that clicking on the link leads to a malicious page.
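
To make the constraint concrete, here is a minimal sketch of the check in
plain Perl, written by hand rather than with the proposed syntax. The helper
name is_script_run is hypothetical. It relies on charscript() from the core
Unicode::UCD module and assumes that "considered to be in every script" can
be approximated by the Common and Inherited script values; a fuller version
would probably consult Script_Extensions as well. It also takes the script
from the first script-specific character rather than strictly the first
character.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical helper, not part of the proposal: returns 1 if every
# character of $str is in the same script as the first script-specific
# character, else 0.  Characters whose script is Common (e.g. "." and
# ":") or Inherited (combining accents) are accepted in any run.
sub is_script_run {
    my ($str) = @_;
    my $run_script;
    for my $char (split //, $str) {
        my $script = charscript(ord $char) // 'Unknown';
        next if $script eq 'Common' || $script eq 'Inherited';
        $run_script //= $script;            # first real script seen
        return 0 if $script ne $run_script;
    }
    return 1;
}

my $latin   = "paypal.com";
# Looks the same, but the five letters before the 'l' are Cyrillic.
my $spoofed = "\x{440}\x{430}\x{443}\x{440}\x{430}l.com";

for my $host ($latin, $spoofed) {
    print is_script_run($host) ? "single script\n" : "mixed scripts\n";
}
```

The all-Latin string passes the check, while the spoofed one is rejected as
soon as the Latin 'l' follows the Cyrillic letters.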

(?{script_run}:\d+)

would match any sequence of decimal digits, but all would have to come 
from the same script.
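
For comparison, Perl 5.28 and later ship a script-run construct spelled
(*script_run:...), or (*sr:...) for short, which can be used to try out the
digit case above. The sketch below assumes such a Perl; the helper name
same_script_digits and the sample strings are made up for the example, and
the spelling differs from the one proposed in this message.

```perl
use v5.28;
use warnings;
no warnings 'experimental::script_run';   # experimental before 5.32

# Hypothetical helper: returns 1 if $str consists only of decimal
# digits that all come from the same script, else 0.
sub same_script_digits {
    my ($str) = @_;
    return $str =~ /\A(*script_run:\d+)\z/ ? 1 : 0;
}

my @samples = (
    "2017",                             # ASCII digits only
    "\x{E52}\x{E50}\x{E51}\x{E57}",     # Thai digits only
    "20\x{E51}\x{E57}",                 # ASCII digits followed by Thai digits
);

say same_script_digits($_) for @samples;
```

Note that the quantifier sits inside the group, so the whole digit sequence
is checked as one run.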

Abigail has a use case: he wants to match any number between 0 and
255, and only those, written in any script but with all digits from
the same script. That's doable under this proposal; something like:

qr/ ( \b

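     (?{script_run}:[ \p{nv=0} - \p{nv=9} ])
    |(?{script_run}:[ \p{nv=1} - \p{nv=9} ] [ \p{nv=0} - \p{nv=9} ])
    |(?{script_run}:\p{nv=1} [ \p{nv=0} - \p{nv=9} ]{2})
    |(?{script_run}:\p{nv=2} [ \p{nv=0} - \p{nv=4} ] [ \p{nv=0} - \p{nv=9} ])
    |(?{script_run}:\p{nv=2} \p{nv=5} [ \p{nv=0} - \p{nv=5} ])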

     ) \b /xx;
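
The same 0 to 255 check can also be approximated today without any new regex
syntax. The sketch below is a different technique from the pattern above: it
uses num() from the core Unicode::UCD module, which returns undef for a
multi-digit string unless all its digits come from the same set of ten. The
helper name is_octet is hypothetical, and it tests a whole string rather than
searching within a larger one.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Hypothetical helper: returns 1 if $str is a number from 0 to 255,
# without leading zeros, written with digits from a single script.
sub is_octet {
    my ($str) = @_;
    return 0 unless $str =~ /\A\d{1,3}\z/;     # one to three digits
    my $value = num($str);
    return 0 unless defined $value;            # digits from mixed sets
    # Reject leading zeros such as "007" (a lone "0" is fine).
    return 0 if length($str) > 1 && num(substr($str, 0, 1)) == 0;
    return $value <= 255 ? 1 : 0;
}

for my $candidate ("255", "256", "\x{E52}\x{E55}\x{E55}", "2\x{E55}5") {
    printf "U+%s: %s\n",
        join(' U+', map { sprintf '%04X', ord } split //, $candidate),
        is_octet($candidate) ? "accepted" : "rejected";
}
```

Doing the range test arithmetically keeps the pattern short, at the cost of a
second step after the match.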
