develooper Front page | perl.perl5.porters | Postings from November 2017

Re: Implementing script runs

Karl Williamson
November 5, 2017 17:36
On 10/18/2017 06:26 PM, Tony Cook wrote:
> On Wed, Oct 18, 2017 at 11:32:50AM -0600, Karl Williamson wrote:
>> Here is my updated proposal for this, fleshing out what was agreed to at the
>> core hackathon, with other things that have been discussed over the years.
>> See  and its thread for
>> background.
>> I propose to have
>> (?{script_run}:...)
>> mean to match the subpattern indicated by "...", but impose the additional
>> constraint that all characters matched must be in the same Unicode script as
>> the first character in the matched sequence. Certain characters like ":"
>> and "." and combining accents would be considered to be in every script.
>> This prevents mixed-script attacks like the famous one of a link containing
>> where the characters before the 'l' aren't Latin, but Cyrillic, and clicking
>> on that link would lead to a malicious page.
>> (?{script_run}:\d+)
>> would match any sequence of decimal digits, but all would have to come from
>> the same script.
> This looks interesting and useful.
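As a rough illustration of the proposed constraint, here is a minimal Perl sketch that checks an already-matched string after the fact. It assumes the core Unicode::UCD module's charscript() as the source of script names; the function name is_script_run is hypothetical, not part of any proposal. It treats characters whose script is Common or Inherited (such as ":" and "." and combining accents) as belonging to every script, and takes the run's script from the first character that is not one of those. Note that ASCII digits are Common in Unicode, so this naive check would not by itself enforce the same-script-digits example above; that would need an extra rule.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical helper: true if every character of $str is in the same
# Unicode script. Common and Inherited characters (punctuation, digits,
# combining marks) are treated as compatible with any script.
sub is_script_run {
    my ($str) = @_;
    my $run_script;    # script of the first non-neutral character
    for my $ch (split //, $str) {
        my $script = charscript(ord $ch) // 'Unknown';
        next if $script eq 'Common' || $script eq 'Inherited';
        $run_script //= $script;
        return 0 if $script ne $run_script;
    }
    return 1;
}

# All Latin (with Common "."): a script run.
print is_script_run("paypal.com") ? "run\n" : "mixed\n";
# Second letter is CYRILLIC SMALL LETTER A (U+0430): mixed scripts.
print is_script_run("p\x{430}ypal.com") ? "run\n" : "mixed\n";
```

A real implementation would apply this check inside the regex engine as each character is matched, rather than to the finished string.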

I have been looking into this further.  The syntax

(?{script_run}: ...)

makes {script_run} effectively an extended regex modifier.  There have 
been various proposals for extended modifiers.  This matches one of 
them.  It would be best if the syntax we come up with for this feature 
could be used in the future for extended modifiers in general.

In looking at this with the intent to implement it, I realized that 
there is somewhat of a conflict.  The problem is that

(?{...})

is already used to signify "..." is code to execute.  In this case, the 
colon could be used to disambiguate.  But modifiers can also be used thusly:


as a way of turning on a modifier.  The script_run modifier cannot be 
used this way because of its nature, but it would be best if the syntax 
for specifying it would be the same as some potential future extended 
modifier that could be used this way, such as


which would be like (?i) but would exclude multi-character folds.

But if we say


that's potentially confusable with the (?{...}) construct to run code. 
Now the number of potential extended modifiers is quite finite, and we 
could say that those aren't going to be code to run, but it would be 
easier, and probably clearer, to have something that means "we have 
extended modifiers here".

In thinking about it, I came up with

(?+{extended_modifier_1}{extended_modifier_2}: ... )

with the plus intuitively meaning "more".
