perl.perl5.porters | Postings from December 2017

Re: Implementing script runs

From: Karl Williamson
Date: December 25, 2017 00:40
Subject: Re: Implementing script runs
Message ID: b9e4d13c-54b3-5331-05fb-03d4c63c1484@khwilliamson.com

On 12/20/2017 02:37 PM, demerphq wrote:
> 
> 
> On 20 Dec 2017 20:44, "Karl Williamson" <public@khwilliamson.com> wrote:
> 
>     On 12/20/2017 03:55 AM, demerphq wrote:
> 
>         On 18 Dec 2017 22:26, "Karl Williamson" <public@khwilliamson.com>
>         wrote:
> 
>              On 11/05/2017 12:19 PM, Zefram wrote:
> 
>                  Father Chrysostomos wrote:
> 
>                           (?+extended_modifier_1,extended_modifier_2:)
>                           (?mix+script_run:...)
> 
> 
>                  I like this syntax.  I wonder how it would work with the
>                  "-" for turning modifiers off.
> 
>                  However, as we discussed last year, this is semantically
>                  wrong for script runs.  The modifiers that we have so far
>                  affect the interpretation of each part of the affected
>                  subpattern individually, such that /(?foo:bar)(?foo:baz)/
>                  is always equivalent to /(?foo:barbaz)/.  This holds even
>                  in the /i cases that mess with character boundaries, such
>                  as "\xdf" =~ /(?i:s)(?i:s)/.  The script run feature is
>                  completely unlike these: it's about the string matching
>                  the subpattern *as a whole*, and the concatenation of two
>                  script-run subpatterns does not behave like a single
>                  script-run subpattern.
> 
>                  So I think a different syntax is required for script
>                  runs.  We already have the "(*WORD)" syntax to identify
>                  extended regexp features by keyword, so I think
>                  "(*script_run:...)" is a good way to go.
> 
>                  -zefram
> 
> 
> 
>              I have implemented it as (*WORD: ...), but I think there is a
>              better syntax.  The docs say this syntax is for backtracking
>              verbs like PRUNE, and the existing implementation is based on
>              that assumption.  I had to make an exception for this
>              unrelated purpose.
> 
> 
>         I can argue this different ways.
> 
>         First is we still have (+...) available for new things.
> 
> 
>     But, at the top of this post, I quoted from earlier that we were
>     thinking of using the plus for extended regex modifiers, which
>     *might* preclude using it for this.
> 
>     [I don't comment on anything below, am retaining it for thread
>     continuity]
> 
> 
> To me that is backwards. Extended regex modifiers fit the verb form just 
> fine.
> 
> We should use (+ for something better.
> 
> Yves
> 
> 
> 
> 
>         Second while I can see some logic to keeping (*...) for "meta"
>         directives like verbs, the fact is it's a huge space to reserve
>         for what probably will be a fairly sparse set of functionality.
>         Also note that other regex engines may have appropriated some of
>         these terms as well and we would be wise avoiding collisions.
> 
> 
>         So even tho I wrote the docs you mention I am open to either
>         introducing (+...) or to adding non-verb extensions to the
>         (*...) namespace. I have a tiny preference for the former as it
>         would mean simpler rules to learn. Eg: star means does not match
>         but changes how the pattern behaves, whereas plus means matches
>         something. If we do this we might also introduce a convention of
>         lower case names for plus as opposed to upper case for star to
>         keep them visually distinctive and reduce "shouting" on the
>         pattern.
> 
> 
> 
>              I agree that a modifier is not the correct way to go, but we
>              have other alternatives.
>
>              This feature is really a zero-length assertion around the
>              enclosed pattern.  In action it is most like the possessive
>              or atomic construct
> 
>              (?>pattern)
> 
>              And so, it could be specified using a syntax like this.  Of
>              the few available characters that could be used instead of
>              '>', I like these the best:
> 
>              (?,pattern)
>              (?.pattern)
>              (?~pattern)
> 
>              But another option is to do
> 
>              (?SCRIPT_RUN>pattern)
> 
> 
>         I would prefer:
> 
>         (+script_run:pattern)
> 
> 
>              (?SR>pattern)
> 
>              since we already have things like (?P>name)
> 
>              I like this because I think it could be used to get more
>              meaningful names for the other zero-length assertions.  I
>              don't use them often enough to remember them, and always have
>              to look them up in the docs.  Each time, I say, yeah, that
>              makes sense, but I still can't remember them.  At some point
>              we could say
> 
>              (?ZERO_LENGTH_LOOK_BEHIND>pattern)
>              (?ZLBA>pattern)
>              (?ATOMIC>pattern)
> 
> 
>         (+zero_length_look_behind:pattern)
> 
>         I like this idea, and especially if we go this route I prefer
>         (+...); to me it is a much more elegant style and it does not
>         necessitate all the shouting.
> 
> 
> 
>              My guess is that script runs will usually be combined with the
>              atomic construct, so we could have
> 
>              (?SCRIPT_RUN_ATOMIC>pattern)
>              (?SRA>pattern)
> 
>              to be a shortcut for the combination.
> 
> 
>         I'd put the atomic first, no need for Picard speak.  :-)
> 
>         Yves
> 
> 
> 

I have merged the branch for implementing script runs as 
(+script_run:pattern)
in commit 2e205a1bf5f45e557712a89654675d92a91b6678
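
For anyone skimming the thread, here is a rough sketch of Zefram's point
quoted above: an ordinary modifier gives the same result whether a group is
split in two or not, while a whole-string property such as "all one script"
does not compose that way.  It does not use the new construct at all.
is_single_script() is a hypothetical helper made up for this illustration;
it only tries three scripts and ignores the special handling of Common and
Inherited characters, so treat it as an approximation of the idea, not of
the merged implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): true if every character of $str
# belongs to the same script.  Only three scripts are tried, and Common and
# Inherited characters are not given any special treatment.
sub is_single_script {
    my ($str) = @_;
    return $str =~ /\A(?:\p{Script=Latin}+
                       |\p{Script=Cyrillic}+
                       |\p{Script=Greek}+)\z/x ? 1 : 0;
}

# An ordinary modifier such as /i applies to each piece separately, so
# splitting the group in two does not change the result.
my $split  = "BARBAZ" =~ /\A(?i:bar)(?i:baz)\z/ ? 1 : 0;
my $joined = "BARBAZ" =~ /\A(?i:barbaz)\z/      ? 1 : 0;

# A whole-string property does not compose this way: each half is a single
# script, but their concatenation is not.
my $latin    = "abc";                      # Latin letters
my $cyrillic = "\x{430}\x{431}\x{432}";    # Cyrillic a, be, ve
my $each     = is_single_script($latin) && is_single_script($cyrillic) ? 1 : 0;
my $whole    = is_single_script($latin . $cyrillic);

print "i modifier, split vs joined: $split $joined\n";
print "single script, each half vs whole: $each $whole\n";
```

The first line printed should show 1 and 1, the second 1 and 0: each half
passes the check on its own, but the two halves joined together do not.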
