Implementing script runs
From:
Karl Williamson
Date:
October 18, 2017 17:33
Subject:
Implementing script runs
Message ID:
8bb9f919-b154-0c23-49a3-6f2af86a7489@khwilliamson.com
Here is my updated proposal for this, fleshing out what was agreed to at
the core hackathon, with other things that have been discussed over the
years. See http://nntp.perl.org/group/perl.perl5.porters/220508 and its
thread for background.
I propose to have
(?{script_run}:...)
mean to match the subpattern indicated by "...", but with the
additional constraint that all characters matched must be in the same
Unicode script as the first character of the matched sequence.
Certain characters like ":" and "." and combining accents would be
considered to be in every script.
This prevents mixed-script attacks like the famous one of a link containing
paypal.com
where the characters before the 'l' aren't Latin, but Cyrillic, and
clicking on that link would lead to a malicious page.
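As a rough illustration of the constraint, here is a minimal sketch in plain Perl, outside the regex engine. The helper name is_single_script is hypothetical, and it makes two simplifying assumptions: "in every script" is approximated by the Script values Common and Inherited, and the run's script is taken from the first script-specific character rather than strictly the first character. It relies only on charscript() from Unicode::UCD.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical helper (not part of the proposal): returns 1 if every
# character of $str is in the same script, else 0. Characters whose Script
# property is Common or Inherited are treated as belonging to every script.
sub is_single_script {
    my ($str) = @_;
    my $run_script;    # script of the first script-specific character seen
    for my $char (split //, $str) {
        my $script = charscript(ord $char) // 'Unknown';
        next if $script eq 'Common' || $script eq 'Inherited';
        $run_script //= $script;
        return 0 if $script ne $run_script;
    }
    return 1;
}

# Latin letters plus "." (Common): one script.
print is_single_script("paypal.com"), "\n";
# Cyrillic U+0440 U+0430 U+0443 U+0440 U+0430, then Latin "l.com": mixed.
print is_single_script("\x{440}\x{430}\x{443}\x{440}\x{430}l.com"), "\n";
```

The second string is built from escapes so the look-alike letters are unambiguous; the helper rejects it because the Latin "l" follows Cyrillic letters.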
(?{script_run}:\d+)
would match any sequence of decimal digits, but all would have to come
from the same script.
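A minimal sketch of what the digit constraint amounts to, again outside the regex engine. The helper name is_same_script_digits is hypothetical. It assumes "same script" can be approximated by "same block of ten consecutive digit code points", found by subtracting each digit's value from its code point; that is somewhat stricter than comparing Script properties, since ASCII digits are Script=Common. It uses num() from Unicode::UCD on single characters only.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Hypothetical helper: returns 1 if $str is one or more decimal digits that
# all share the same "zero" code point (the same block of ten digits).
sub is_same_script_digits {
    my ($str) = @_;
    return 0 unless $str =~ /\A\d+\z/;
    my $zero;    # code point of the digit zero for the first digit's block
    for my $char (split //, $str) {
        my $this_zero = ord($char) - num($char);
        $zero //= $this_zero;
        return 0 if $this_zero != $zero;
    }
    return 1;
}

print is_same_script_digits("2017"), "\n";              # ASCII digits only
print is_same_script_digits("\x{967}\x{968}"), "\n";    # Devanagari 1, 2
print is_same_script_digits("1\x{968}"), "\n";          # ASCII 1 + Devanagari 2
```

The first two calls print 1 and the third prints 0, because the two digits in the last string come from different blocks.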
Abigail has a use case where he wants to match any number between 0 and
255, and only those, in any script, but with all digits from the same script.
That's doable in this proposal; something like:
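qr/ ( \b
(?{script_run}:[ \p{nv=0} - \p{nv=9} ])
|(?{script_run}:[ \p{nv=1} - \p{nv=9} ] [ \p{nv=0} - \p{nv=9} ])
|(?{script_run}:\p{nv=1} [ \p{nv=0} - \p{nv=9} ]{2})
|(?{script_run}:\p{nv=2} [ \p{nv=0} - \p{nv=4} ] [ \p{nv=0} - \p{nv=9} ])
|(?{script_run}:\p{nv=2} \p{nv=5} [ \p{nv=0} - \p{nv=5} ])
) \b /xx;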
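For comparison, the same check can be sketched today without any new regex syntax, by letting Unicode::UCD's num() do the work: for a multi-character string it returns undef unless every character is a decimal digit from the same block of ten as the first. The helper name is_octet_any_script is hypothetical, and rejecting leading zeros is an assumption chosen to mirror the pattern above.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Hypothetical helper: returns 1 if $str is a number from 0 to 255 written
# with decimal digits that all come from one script, with no leading zero.
sub is_octet_any_script {
    my ($str) = @_;
    return 0 unless $str =~ /\A\d{1,3}\z/;
    my $value = num($str);    # undef when the digits are from mixed blocks
    return 0 unless defined $value;
    return 0 if length($str) > 1 && num(substr($str, 0, 1)) == 0;
    return $value <= 255 ? 1 : 0;
}

print is_octet_any_script("255"), "\n";                     # ASCII, in range
print is_octet_any_script("256"), "\n";                     # ASCII, too large
print is_octet_any_script("\x{662}\x{665}\x{665}"), "\n";   # Arabic-Indic 255
print is_octet_any_script("2\x{665}5"), "\n";               # mixed digits
```

This prints 1, 0, 1, 0. It validates a whole string rather than finding matches inside a larger text, so it is not a replacement for the proposed construct, only a way to see the intended behaviour.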