perl.perl5.porters
Re: Implementing script runs
December 20, 2017 10:56
Message ID: CANgJU+WFAaHw-y77Mam3BVCdLd6LuCeqc2tqOxOUKd1WvUt+Fg@mail.gmail.com
On 18 Dec 2017 22:26, "Karl Williamson" <firstname.lastname@example.org> wrote:
On 11/05/2017 12:19 PM, Zefram wrote:
> Father Chrysostomos wrote:
> I like this syntax. I wonder how it would work with the "-" for turning
> modifiers off.
> However, as we discussed last year, this is semantically wrong
> for script runs. The modifiers that we have so far affect the
> interpretation of each part of the affected subpattern individually,
> such that /(?foo:bar)(?foo:baz)/ is always equivalent to /(?foo:barbaz)/.
> This holds even in the /i cases that mess with character boundaries,
> such as "\xdf" =~ /(?i:s)(?i:s)/. The script run feature is completely
> unlike these: it's about the string matching the subpattern *as a whole*,
> and the concatenation of two script-run subpatterns does not behave like
> a single script-run subpattern.
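A minimal Perl sketch of the concatenation property described above, using /i as a stand-in for the hypothetical "foo" modifier. The sample strings and variable names are illustrative and not from the thread. Only ASCII input is checked; the "\xdf" case is left out because its result depends on which Unicode rules apply to the string.

```perl
use strict;
use warnings;

# Two adjacent scoped-modifier groups versus one group over the concatenation.
my $split  = qr/^(?i:bar)(?i:baz)$/;
my $joined = qr/^(?i:barbaz)$/;

# Illustrative inputs (assumed for this sketch).
my @samples = ('barbaz', 'BARBAZ', 'BarBaz', 'bar baz', 'barba');

# Record, per sample, whether both patterns reach the same verdict.
my %agree;
for my $s (@samples) {
    my $m_split  = $s =~ $split  ? 1 : 0;
    my $m_joined = $s =~ $joined ? 1 : 0;
    $agree{$s} = ($m_split == $m_joined) ? 1 : 0;
    printf "%-8s split=%d joined=%d\n", $s, $m_split, $m_joined;
}
```

For a modifier that acts on each part of the pattern individually, the two columns should agree on every input.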
> So I think a different syntax is required for script runs. We already
> have the "(*WORD)" syntax to identify extended regexp features by keyword,
> so I think "(*script_run:...)" is a good way to go.
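To make the "as a whole" point concrete, here is a hedged sketch that assumes a Perl (5.28 or later) in which the "(*script_run:...)" spelling proposed above is available. The sample strings are illustrative: three Latin letters followed by three Cyrillic letters.

```perl
use 5.028;      # assumption: a perl that provides (*script_run:...)
use warnings;

my $latin    = "abc";
my $cyrillic = "\x{433}\x{434}\x{435}";   # three Cyrillic letters
my $mixed    = $latin . $cyrillic;

my ($whole, $pieces);
{
    # Early releases with this feature warn that it is experimental.
    no warnings;
    $whole  = qr/^(*script_run:\w+\w+)$/;
    $pieces = qr/^(*script_run:\w+)(*script_run:\w+)$/;
}

# One script run over the whole string: every character must share a script.
my $mixed_whole  = $mixed =~ $whole  ? 1 : 0;
# Two adjacent script runs: each half is checked on its own.
my $mixed_pieces = $mixed =~ $pieces ? 1 : 0;
my $latin_whole  = $latin =~ $whole  ? 1 : 0;

print "mixed, one run:  $mixed_whole\n";
print "mixed, two runs: $mixed_pieces\n";
print "latin, one run:  $latin_whole\n";
```

The mixed string satisfies the two concatenated script runs but not the single one, which is exactly the non-equivalence that rules out modifier syntax.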
I have implemented it as (*WORD: ...)
but I think there is a better syntax. The docs say this syntax is for
backtracking verbs like PRUNE, and the existing implementation is based on
that assumption. I had to make an exception for this unrelated purpose.
I can argue this different ways.
First, we still have (+...) available for new things.
Second, while I can see some logic to keeping (*...) for "meta" directives
like verbs, the fact is it's a huge space to reserve for what probably will
be a fairly sparse set of functionality. Also note that other regex engines
may have appropriated some of these terms as well and we would be wise
So even tho I wrote the docs you mention I am open to either introducing
(+...) or to adding non-verb extensions to the (*...) namespace. I have a
tiny preference for the former as it would mean simpler rules to learn. Eg:
star means does not match but changes how the pattern behaves, whereas plus
means matches something. If we do this we might also introduce a convention
of lower case names for plus as opposed to upper case for star to keep them
visually distinctive and reduce "shouting" in the pattern.
I agree that a modifier is not the correct way to go, but we have other
This feature is really a zero-length assertion around the enclosed
pattern. In action it is most like the possessive or atomic construct
And so, it could be specified using a syntax like this. Of the few
available characters that could be used instead of '>', I like these the
But another option is to do
I would prefer:
since we already have things like (?P>name)
I like this because I think it could be used to get more meaningful names
for the other zero-length assertions. I don't use them often enough to
remember them, and always have to look them up in the docs. Each time, I
say, yeah, that makes sense, but I still can't remember them. At some
point we could say
I like this idea, and especially if we go this route I prefer (+...). To me
it is a much more elegant style, and it does not necessitate all the
My guess is that script runs will usually be combined with the atomic
construct, so we could have
to be a shortcut for the combination.
I'd put the atomic first, no need for Picard speak. :-)
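
For reference, a small sketch of what Perl's existing atomic group and possessive quantifier do on their own, without any script-run check. The sample string is illustrative; the point is only that neither construct gives characters back once it has matched them.

```perl
use strict;
use warnings;

my $text = 'aaab';

# Ordinary quantifier: a+ first takes all three "a"s, then gives one back
# so that the trailing "ab" can match.
my $plain = $text =~ /^a+ab$/ ? 1 : 0;

# Atomic group: a+ keeps all three "a"s, so the trailing "ab" cannot match.
my $atomic = $text =~ /^(?>a+)ab$/ ? 1 : 0;

# Possessive quantifier: same effect as the atomic group here.
my $possessive = $text =~ /^a++ab$/ ? 1 : 0;

print "plain=$plain atomic=$atomic possessive=$possessive\n";
```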