perl.perl5.porters | Postings from December 2017

Re: Implementing script runs

From: Karl Williamson
Date: December 18, 2017 21:26
Subject: Re: Implementing script runs
Message ID: 790dffb9-4a51-0f75-e51e-7aea3d02a994@khwilliamson.com

On 11/05/2017 12:19 PM, Zefram wrote:
> Father Chrysostomos wrote:
>>     (?+extended_modifier_1,extended_modifier_2:)
>>     (?mix+script_run:...)
> 
> I like this syntax.  I wonder how it would work with the "-" for turning
> modifiers off.
> 
> However, as we discussed last year, this is semantically wrong
> for script runs.  The modifiers that we have so far affect the
> interpretation of each part of the affected subpattern individually,
> such that /(?foo:bar)(?foo:baz)/ is always equivalent to /(?foo:barbaz)/.
> This holds even in the /i cases that mess with character boundaries,
> such as "\xdf" =~ /(?i:s)(?i:s)/.  The script run feature is completely
> unlike these: it's about the string matching the subpattern *as a whole*,
> and the concatenation of two script-run subpatterns does not behave like
> a single script-run subpattern.
> 
> So I think a different syntax is required for script runs.  We already
> have the "(*WORD)" syntax to identify extended regexp features by keyword,
> so I think "(*script_run:...)" is a good way to go.
> 
> -zefram
> 


I have implemented it as (*WORD: ...), but I think there is a better 
syntax.  The docs say this syntax is for backtracking verbs like PRUNE, 
and the existing implementation is based on that assumption.  I had to 
make an exception for this unrelated purpose.

I agree that a modifier is not the correct way to go, but we have other 
alternatives.
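
To make the distinction concrete, here is a rough sketch.  The helper 
single_script() is hypothetical and is not the implementation: it only 
knows three scripts and ignores Common and Inherited characters, which a 
real script run has to handle.  It is just enough to show that the check 
applies to the matched text as a whole, so it does not distribute over 
concatenation the way /i does.

```perl
use strict;
use warnings;

# An ordinary modifier applies to each part independently, so splitting
# the group in two does not change what matches.
my $split  = "BarBAZ" =~ /(?i:bar)(?i:baz)/ ? 1 : 0;
my $joined = "BarBAZ" =~ /(?i:barbaz)/      ? 1 : 0;

# Hypothetical, heavily simplified stand-in for a script-run check:
# true only if every character of $str is in the same one of three scripts.
sub single_script {
    my ($str) = @_;
    return $str =~ /\A(?:\p{Latin}+|\p{Greek}+|\p{Cyrillic}+)\z/ ? 1 : 0;
}

my $latin = "a";
my $greek = "\x{3b1}";    # GREEK SMALL LETTER ALPHA

# Each piece passes on its own ...
my $each_ok = (single_script($latin) && single_script($greek)) ? 1 : 0;

# ... but the concatenation mixes scripts and fails as a whole.
my $whole_ok = single_script($latin . $greek);
```

Here $split and $joined agree, while $each_ok is true and $whole_ok is 
false, which is the property that makes a modifier the wrong model.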

This feature is really a zero-length assertion around the enclosed 
pattern.  In action it is most like the possessive or atomic construct

(?>pattern)
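
As a reminder of how the existing atomic construct behaves, here is a 
small example (the sample string and variable names are only for 
illustration).  Once the group has matched, the engine will not 
backtrack into it to try a shorter match.

```perl
use strict;
use warnings;

my $text = "aaab";

# Plain quantifier: a+ first grabs "aaa", then gives one "a" back so that
# the trailing "ab" can match.
my $backtracking = $text =~ /a+ab/ ? 1 : 0;

# Atomic group: (?>a+) keeps whatever it matched, so no "a" is left over
# for the trailing "ab" and the overall match fails.
my $atomic = $text =~ /(?>a+)ab/ ? 1 : 0;
```
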

And so, it could be specified using a syntax like this.  Of the few 
available characters that could be used instead of '>', I like these the 
best:

(?,pattern)
(?.pattern)
(?~pattern)

But another option is to do

(?SCRIPT_RUN>pattern)
(?SR>pattern)

since we already have things like (?P>name)

I like this because I think it could be used to get more meaningful 
names for the other zero-length assertions.  I don't use them often 
enough to remember them, and always have to look them up in the docs. 
Each time, I say, yeah, that makes sense, but I still can't remember 
them.  At some point we could say

(?ZERO_LENGTH_LOOK_BEHIND>pattern)
(?ZLBA>pattern)
(?ATOMIC>pattern)
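
For comparison, these are the current spellings of the zero-length 
look-around assertions that such names would stand for.  The sample 
string and variable names below are made up for the example.

```perl
use strict;
use warnings;

my $text = "price: 42 USD";

# (?=...)   positive look-ahead: digits that are followed by " USD"
my ($ahead) = $text =~ /(\d+)(?= USD)/;

# (?!...)   negative look-ahead: digits that are not followed by " USD"
my ($not_ahead) = $text =~ /(\d+)(?! USD)/;

# (?<=...)  positive look-behind: digits that are preceded by ": "
my ($behind) = $text =~ /(?<=: )(\d+)/;

# (?<!...)  negative look-behind: digits that are not preceded by ": "
my ($not_behind) = $text =~ /(?<!: )(\d+)/;
```

None of the four consumes any of the string; each only constrains where 
the surrounding pattern may match.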

My guess is that script runs will usually be combined with the atomic 
construct, so we could have

(?SCRIPT_RUN_ATOMIC>pattern)
(?SRA>pattern)

to be a shortcut for the combination.
