develooper Front page | perl.perl5.porters | Postings from December 2017

Re: Implementing script runs

From: Karl Williamson
Date: December 20, 2017 19:44
Subject: Re: Implementing script runs
Message ID: e4b7944e-f62b-7c88-4925-4ad463ad1f51@khwilliamson.com

On 12/20/2017 03:55 AM, demerphq wrote:
> On 18 Dec 2017 22:26, "Karl Williamson" <public@khwilliamson.com> wrote:
> 
>     On 11/05/2017 12:19 PM, Zefram wrote:
> 
>         Father Chrysostomos wrote:
> 
>                  (?+extended_modifier_1,extended_modifier_2:)
>                  (?mix+script_run:...)
> 
> 
>         I like this syntax.  I wonder how it would work with the "-" for
>         turning modifiers off.
> 
>         However, as we discussed last year, this is semantically wrong
>         for script runs.  The modifiers that we have so far affect the
>         interpretation of each part of the affected subpattern individually,
>         such that /(?foo:bar)(?foo:baz)/ is always equivalent to
>         /(?foo:barbaz)/.  This holds even in the /i cases that mess with
>         character boundaries, such as "\xdf" =~ /(?i:s)(?i:s)/.  The script
>         run feature is completely unlike these: it's about the string
>         matching the subpattern *as a whole*, and the concatenation of two
>         script-run subpatterns does not behave like a single script-run
>         subpattern.
> 
>         So I think a different syntax is required for script runs.  We
>         already have the "(*WORD)" syntax to identify extended regexp
>         features by keyword, so I think "(*script_run:...)" is a good way
>         to go.
> 
>         -zefram
> 
> 
> 
>     I have implemented it as (*WORD: ...)
>     but I think there is a better syntax.  The docs say this syntax is
>     for backtracking verbs like PRUNE, and the existing implementation
>     is based on that assumption.  I had to make an exception for this
>     unrelated purpose.
> 
> 
> I can argue this different ways.
> 
> First is we still have (+...) available for new things.

But, at the top of this post, I quoted from earlier that we were 
thinking of using the plus for extended regex modifiers, which *might* 
preclude using it for this.
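
To make the distinction Zefram draws concrete, here is a minimal sketch using
only constructs that already exist. It uses (?i:...) as the example modifier
and the atomic group (?>...) as a stand-in for a construct that applies to its
subpattern as a whole; script runs themselves are not used here. The helper
names are hypothetical, made up for this illustration.

```perl
use strict;
use warnings;

# A modifier group acts on each piece independently, so splitting one
# group into two adjacent groups does not change what matches.
sub modifier_split  { my ($s) = @_; return $s =~ /^(?i:bar)(?i:baz)$/ ? 1 : 0 }
sub modifier_joined { my ($s) = @_; return $s =~ /^(?i:barbaz)$/      ? 1 : 0 }

# An atomic group constrains its subpattern as a whole, so splitting it
# changes where backtracking is allowed.
sub atomic_whole { my ($s) = @_; return $s =~ /(?>a+ab)/     ? 1 : 0 }
sub atomic_parts { my ($s) = @_; return $s =~ /(?>a+)(?>ab)/ ? 1 : 0 }

for my $s ('barbaz', 'BARBAZ', 'BarBaz', 'barba') {
    printf "%-6s split=%d joined=%d\n",
        $s, modifier_split($s), modifier_joined($s);
}

printf "aab    whole=%d parts=%d\n", atomic_whole('aab'), atomic_parts('aab');
```

The split and joined modifier forms agree on every input, while "aab" matches
(?>a+ab) but not (?>a+)(?>ab): in the second form a+ keeps both "a"s and
cannot give one back. That is the kind of whole-subpattern behavior that
argues against spelling script runs as a modifier.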

[I don't comment on anything below; I am retaining it for thread continuity.]
> 
> Second while I can see some logic to keeping (*...) for "meta" 
> directives like verbs, the fact is it's a huge space to reserve for what 
> probably will be a fairly sparse set of functionality. Also note that 
> other regex engines may have appropriated some of these terms as well 
> and we would be wise avoiding collisions.
> 
> 
> So even tho I wrote the docs you mention I am open to either introducing 
> (+...) or to adding non-verb extensions to the (*...) namespace. I have 
> a tiny preference for the former as it would mean simpler rules to 
> learn. Eg: star means does not match but changes how the pattern 
> behaves, whereas plus means matches something. If we do this we might 
> also introduce a convention of lower case names for plus as opposed to 
> upper case for star to keep them visually distinctive and reduce 
> "shouting" in the pattern.
> 
> 
> 
>     I agree that a modifier is not the correct way to go, but we have
>     other alternatives.
> 
>     This feature is really a zero-length assertion around the enclosed
>     pattern.  In action it is most like the possessive or atomic construct
> 
>     (?>pattern)
> 
>     And so, it could be specified using a syntax like this.  Of the few
>     available characters that could be used instead of '>', I like these
>     the best:
> 
>     (?,pattern)
>     (?.pattern)
>     (?~pattern)
> 
>     But another option is to do
> 
>     (?SCRIPT_RUN>pattern)
> 
> 
> I would prefer:
> 
> (+script_run:pattern)
> 
> 
>     (?SR>pattern)
> 
>     since we already have things like (?P>name)
> 
>     I like this because I think it could be used to get more meaningful
>     names for the other zero-length assertions.  I don't use them often
>     enough to remember them, and always have to look them up in the
>     docs. Each time, I say, yeah, that makes sense, but I still can't
>     remember them.  At some point we could say
> 
>     (?ZERO_LENGTH_LOOK_BEHIND>pattern)
>     (?ZLBA>pattern)
>     (?ATOMIC>pattern)
> 
> 
> (+zero_length_look_behind:pattern)
> 
> I like this idea, and especially if we go this route I prefer (+...); 
> to me it is a much more elegant style, and it does not necessitate all 
> the shouting.
> 
> 
> 
>     My guess is that script runs will usually be combined with the
>     atomic construct, so we could have
> 
>     (?SCRIPT_RUN_ATOMIC>pattern)
>     (?SRA>pattern)
> 
>     to be a shortcut for the combination.
> 
> 
> I'd put the atomic first, no need for Picard speak.  :-)
> 
> Yves
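
For reference, a minimal sketch of what the script-run-plus-atomic combination
discussed above does in practice. It assumes Perl 5.28 or later, where the
feature shipped spelled as (*script_run:...) / (*sr:...), with the atomic
combination as (*atomic_script_run:...) / (*asr:...); that spelling postdates
this message and is none of the candidates listed above. The helper name
single_script_word is hypothetical. The pattern is built at run time so that
the file still compiles on older perls, where the helper returns undef.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $s consists entirely of word
# characters forming a single script run, 0 if not, and undef on a perl
# too old to know the construct.
sub single_script_word {
    my ($s) = @_;
    return undef if $] < 5.028;

    no warnings;    # the construct warned as experimental in 5.28 and 5.30

    # (*asr:...) is the atomic script run; kept in a string so that
    # pre-5.28 perls never try to compile it.
    my $pattern = '^(*asr:\w+)$';
    my $re      = qr/$pattern/;
    return $s =~ $re ? 1 : 0;
}

my @cases = (
    [ 'all Latin'              => 'paypal' ],
    [ 'Latin + Cyrillic U+0430' => "p\x{430}ypal" ],
);

for my $case (@cases) {
    my ($label, $string) = @$case;
    my $result = single_script_word($string);
    printf "%-24s %s\n", $label, defined $result ? $result : 'unsupported';
}
```

The second string swaps the Latin "a" for the visually similar Cyrillic one,
which is the spoofing case script runs are meant to catch.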
