perl.perl5.porters | Postings from December 2017

Re: Implementing script runs

From: demerphq
Date: December 20, 2017 21:38
Subject: Re: Implementing script runs
Message ID: CANgJU+V6YcU3VDt1_iKmSuf7A58ZfSdyNQbH=xvF0NJqh3sRAA@mail.gmail.com

On 20 Dec 2017 20:44, "Karl Williamson" <public@khwilliamson.com> wrote:

On 12/20/2017 03:55 AM, demerphq wrote:

On 18 Dec 2017 22:26, "Karl Williamson" <public@khwilliamson.com> wrote:
>
>     On 11/05/2017 12:19 PM, Zefram wrote:
>
>         Father Chrysostomos wrote:
>
>                  (?+extended_modifier_1,extended_modifier_2:)
>                  (?mix+script_run:...)
>
>
>         I like this syntax.  I wonder how it would work with the "-" for
>         turning modifiers off.
>
>         However, as we discussed last year, this is semantically wrong
>         for script runs.  The modifiers that we have so far affect the
>         interpretation of each part of the affected subpattern individually,
>         such that /(?foo:bar)(?foo:baz)/ is always equivalent to
>         /(?foo:barbaz)/.  This holds even in the /i cases that mess with
>         character boundaries, such as "\xdf" =~ /(?i:s)(?i:s)/.  The script
>         run feature is completely unlike these: it's about the string
>         matching the subpattern *as a whole*, and the concatenation of two
>         script-run subpatterns does not behave like a single script-run
>         subpattern.
>
>         So I think a different syntax is required for script runs.  We
>         already have the "(*WORD)" syntax to identify extended regexp
>         features by keyword, so I think "(*script_run:...)" is a good way
>         to go.
>
>         -zefram
>
>
>
>     I have implemented it as (*WORD: ...)
>     but I think there is a better syntax.  The docs say this syntax is
>     for backtracking verbs like PRUNE, and the existing implementation
>     is based on that assumption.  I had to make an exception for this
>     unrelated purpose.
>
>
> I can argue this different ways.
>
> First is we still have (+...) available for new things.
>

But, at the top of this post, I quoted from earlier that we were thinking
of using the plus for extended regex modifiers, which *might* preclude
using it for this.

[I don't comment on anything below, am retaining it for thread continuity]


To me that is backwards. Extended regex modifiers fit the verb form just
fine.

We should use (+ for something better.

Yves




> Second, while I can see some logic to keeping (*...) for "meta" directives
> like verbs, the fact is it's a huge space to reserve for what probably will
> be a fairly sparse set of functionality. Also note that other regex engines
> may have appropriated some of these terms as well, and we would be wise
> to avoid collisions.
>
>
> So even though I wrote the docs you mention, I am open to either introducing
> (+...) or to adding non-verb extensions to the (*...) namespace. I have a
> tiny preference for the former as it would mean simpler rules to learn. E.g.:
> star means does not match but changes how the pattern behaves, whereas plus
> means matches something. If we do this we might also introduce a convention
> of lower case names for plus as opposed to upper case for star to keep them
> visually distinctive and reduce "shouting" in the pattern.
>
>
>
>     I agree that a modifier is not the correct way to go, but we have
>     other alternatives.
>
>     This feature is really a zero-length assertion around the enclosed
>     pattern.  In action it is most like the possessive or atomic construct
>
>     (?>pattern)
>
>     And so, it could be specified using a syntax like this.  Of the few
>     available characters that could be used instead of '>', I like these
>     the best:
>
>     (?,pattern)
>     (?.pattern)
>     (?~pattern)
>
>     But another option is to do
>
>     (?SCRIPT_RUN>pattern)
>
>
> I would prefer:
>
> (+script_run:pattern)
>
>
>     (?SR>pattern)
>
>     since we already have things like (?P>name)
>
>     I like this because I think it could be used to get more meaningful
>     names for the other zero-length assertions.  I don't use them often
>     enough to remember them, and always have to look them up in the
>     docs. Each time, I say, yeah, that makes sense, but I still can't
>     remember them.  At some point we could say
>
>     (?ZERO_LENGTH_LOOK_BEHIND>pattern)
>     (?ZLBA>pattern)
>     (?ATOMIC>pattern)
>
>
> (+zero_length_look_behind:pattern)
>
> I like this idea, and especially if we go this route I prefer (+...). To me
> it is a much more elegant style, and it does not necessitate all the
> shouting.
>
>
>
>     My guess is that script runs will usually be combined with the
>     atomic construct, so we could have
>
>     (?SCRIPT_RUN_ATOMIC>pattern)
>     (?SRA>pattern)
>
>     to be a shortcut for the combination.
>
>
> I'd put the atomic first, no need for Picard speak.  :-)
>
> Yves
>
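
Editorial illustration, not part of the original message: a minimal sketch of the distinction Zefram draws in the quoted text, namely that ordinary modifiers can be split across adjacent groups without changing the match, while a script-run check is a property of the whole matched string. It uses Python's `re` module rather than Perl, because the proposed script-run syntax was still under discussion in this thread. The helpers `rough_script` and `is_script_run` are hypothetical, and taking the first word of the Unicode character name is only a crude stand-in for the real Script property (it ignores Common and Inherited characters and script extensions).

```python
import re
import unicodedata

# Ordinary modifiers act on each piece of the pattern separately, so
# splitting the group changes nothing: both patterns accept the string.
split_i = re.fullmatch(r"(?i:bar)(?i:baz)", "BARbaz")
joined_i = re.fullmatch(r"(?i:barbaz)", "BARbaz")


def rough_script(ch):
    """Hypothetical helper: first word of the character name, e.g. LATIN."""
    return unicodedata.name(ch).split()[0]


def is_script_run(text):
    """Assumed simplification: all characters share one rough script."""
    return len({rough_script(ch) for ch in text}) <= 1


latin = "ab"
cyrillic = "\u0432\u0433"  # CYRILLIC SMALL LETTER VE, CYRILLIC SMALL LETTER GHE

# Each half passes on its own, but the concatenation mixes two scripts.
each_half_ok = is_script_run(latin) and is_script_run(cyrillic)
whole_ok = is_script_run(latin + cyrillic)
```

Under these assumptions, each half is a script run but the concatenation is not, which is why two adjacent script-run groups cannot stand in for one group around the whole pattern.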
