perl.perl5.porters | Postings from July 2016

RFC: seeking syntax for allowing script run pattern matching

From: Karl Williamson
Date: July 6, 2016 19:45
Subject: RFC: seeking syntax for allowing script run pattern matching
Message ID: 577D5FCB.9010105@khwilliamson.com

A script run is a sequence of characters that all come from the same
script, such as Latin or Greek.  Script runs matter in security-sensitive
applications because they keep someone from, say, substituting a
look-alike Cyrillic letter for a Latin one: 'scope' looks pretty much
identical in Macedonian Cyrillic and in English Latin.  'paypal' is the
most famous case, since every letter except the 'l' has a Cyrillic
look-alike.
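To make the definition concrete, here is a small Python sketch (not Perl) of a script-run check. It assumes a hypothetical, simplified script lookup that takes the first word of each character's Unicode name (`LATIN`, `CYRILLIC`, `GREEK`, ...). Anything else, such as digits and punctuation, is treated as script-neutral ("Common"). Real code would use the Unicode Script property instead, which in Perl is `\p{Script=...}`.

```python
import unicodedata

# Assumption: a rough, hypothetical stand-in for the Unicode Script property,
# taken from the first word of each character's Unicode name.
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "ARMENIAN", "HEBREW", "ARABIC"}

def script_of(ch: str) -> str:
    """Return a rough script label for one character ("Common" if unknown)."""
    first_word = unicodedata.name(ch, "").split(" ")[0]
    return first_word if first_word in KNOWN_SCRIPTS else "Common"

def is_script_run(text: str) -> bool:
    """True if every non-Common character in text comes from one script."""
    scripts = {script_of(ch) for ch in text} - {"Common"}
    return len(scripts) <= 1

print(is_script_run("paypal"))                          # True: all Latin
print(is_script_run("p\u0430ypal"))                     # False: Cyrillic 'а' (U+0430) among Latin
print(is_script_run("\u0455\u0441\u043e\u0440\u0435"))  # True: all-Cyrillic look-alike of 'scope'
```

The all-Cyrillic 'scope' passes the check because it is a genuine script run. What the check catches is a mixed-script spoof such as the second 'paypal'.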

It would be good to allow a program to specify that it wants a script
run, so as to automatically avoid such security holes.

This only makes sense if the match can span multiple input characters.
Therefore, I thought that modifying the quantifier to indicate a run
would be good:

\w+*
\d{5,8}*

are currently syntax errors, so putting '*' after the quantifier would
be a candidate.  But Lukas pointed out that in many cases a quantifier
could never sensibly mean a script run.  What would

  (ABC)+*

mean?  If the '*' is ignored in such a case, should it warn?

We could use something like {sr} (standing for "script run") instead of *.

\w+{sr}

But this has the same problem.  Or we could write it like this:

\w{sr}+

But then \w{sr} without a quantifier doesn't mean anything.
Alternatively, Andrew Rodland suggested

(*sr:\w+)

I'm looking for some more ideas.
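Whichever spelling is chosen (`\w+*`, `\w+{sr}` or `(*sr:\w+)`), one plausible reading of the semantics is this: the quantified subpattern must match text that is a script run, and a greedy quantifier gives characters back until it does. The Python sketch below emulates that reading outside any regex engine. It is not Perl, and it does not propose how the engine would implement the feature. `find_script_run` is a hypothetical name, and script detection uses the same rough Unicode-name heuristic as a stand-in for the real Script property.

```python
import re
import unicodedata

# Hypothetical rough script lookup (first word of the Unicode character
# name); a real implementation would use the Unicode Script property.
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "ARMENIAN", "HEBREW", "ARABIC"}

def is_script_run(text):
    """True if all recognised-script characters in text share one script."""
    scripts = set()
    for ch in text:
        word = unicodedata.name(ch, "").split(" ")[0]
        if word in KNOWN_SCRIPTS:
            scripts.add(word)
    return len(scripts) <= 1

def find_script_run(pattern, text):
    """Leftmost match of `pattern` whose matched text is a script run.

    Emulates a greedy quantifier that backtracks (gives up characters)
    until what it matched is a script run. Returns (start, end) or None.
    """
    regex = re.compile(pattern)
    for start in range(len(text)):
        # Try the longest candidate first, as a greedy quantifier would.
        for end in range(len(text), start, -1):
            candidate = text[start:end]
            if regex.fullmatch(candidate) and is_script_run(candidate):
                return start, end
    return None

print(find_script_run(r"\w+", "paypal"))           # (0, 6)
print(find_script_run(r"\w+", "p\u0430ypal"))      # (0, 1): only "p" survives
print(find_script_run(r"\d{5,8}", "abc 1234567"))  # (4, 11)
```

The brute-force scan is quadratic and only meant to show the intended result. A real engine would fold the check into its normal backtracking rather than re-testing every substring.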


