perl.perl5.porters | Postings from February 2019

Re: RFC: Adding \p{foo=/re/}

February 12, 2019 06:03
On Sun, 10 Feb 2019, 01:01 Karl Williamson wrote:

> On 2/9/19 2:29 AM, Deven T. Corzine wrote:
> > On Sat, Feb 9, 2019 at 4:16 AM Deven T. Corzine wrote:
> >
> >     Karl, can you enlighten us?  Are you recursing into a subpattern at
> >     runtime? What do you think of the hypothetical approach I described?
> >
> >
> > I just read Karl’s description again: “The way it's implemented is a
> > separate regex is compiled and executed
> > during the compilation of the outer one.”
> >
> > I didn’t notice the “and executed” part the first time.  That sounds
> > exactly like the hypothetical implementation that I described,
> > actually...
> >
> > Deven
> >
> I'm sorry for not being clear.  Deven is correct that his hypothetical
> implementation is what I have done.
> This is a bolt-on feature to Perl's regexes.  It implements a
> portion of the wildcard feature of what UTS 18 asks for, using their
> syntax.  It is an apparent goal, as long listed in perlunicode, to do as
> much of UTS 18 as we can.
> And the implementation isn't efficient.
> It is implemented by, during the compilation of a character class,
> interrupting that compilation, assembling an inner pattern, then
> compiling that and executing it to find all the code points it matches.
> That list is then added to whatever else is in the character class, the
> inner pattern's space is freed, and compilation of the outer pattern
> resumed.  There is no recursive execution.  But there is recursion in
> the sense, as I described, that a second pattern is compiled while in
> the middle of compiling an outer pattern.  I don't know if that is an
> issue or not.  The patterns do not share anything, no groups, etc.
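
For readers who want to see the shape of that compile-time expansion, here is
a minimal sketch in Python rather than in the Perl core. It is not Karl's
implementation. It assumes the property being matched is the Unicode character
name, and the helper names expand_name_wildcard and to_char_class are
hypothetical. The inner pattern is compiled and run over every code point
once, and only the resulting list ends up in the outer character class.

```python
import re
import sys
import unicodedata


def expand_name_wildcard(inner_pattern):
    """Return every code point whose character name matches inner_pattern.

    Assumption: the wildcard is applied to the Unicode Name property.
    """
    inner = re.compile(inner_pattern)  # separate, inner regex
    matches = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if name and inner.search(name):
            matches.append(cp)
    return matches  # the inner regex is no longer needed after this


def to_char_class(code_points):
    """Render code points as the body of a regex character class."""
    return "".join(re.escape(chr(cp)) for cp in code_points)


# Outer pattern: ASCII digits plus whatever the inner pattern found.
found = expand_name_wildcard(r"^LATIN SMALL LETTER [ABC]$")
outer = re.compile("[0-9" + to_char_class(found) + "]+")
```

The full scan over all code points happens once, when the outer pattern is
built. Matching with outer afterwards never runs the inner pattern again,
which mirrors the "no recursive execution" point above.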

Please consider my objections withdrawn.  Sorry for the misunderstanding, and
thank you for explaining.


> I've learned that a feature like this should be marked as experimental,
> so that it can be refined or even removed, and marking it as such lowers
> expectations as to its well-thought-outness and bug-free-ness.  It
> allows us to try things out and get feedback without having to say we
> think it is fully done.  The prototype is so marked.
> I've also learned that inefficiencies in compilation don't really
> matter.  I removed an entire pass of the regex compilation process, with
> extra mallocs being the price.  There did not seem to be a noticeable
> change in the speed of execution of our test suite!  This inefficient
> implementation (and I don't know another way to do it) won't be
> noticeable in the end, because it's only done at compilation.
> I believe PCRE doesn't do this; I don't know about other engines.  But
> if no one does, I would think that us having a feature no one else does
> is a selling point.  If others do, we could perhaps learn from their
> syntax.  A quick google search didn't turn up anything obvious.
> If there are issues with various constructs, we can forbid those.  My
> implementation, for example, doesn't allow braces in the subpattern, and
> hence no construct that requires braces.  I think that's a reasonable
> initial restriction to make it easier to implement something, that
> otherwise wouldn't get implemented.
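
One way to see why forbidding braces makes a first implementation easier: the
end of the construct can be found without any brace matching. The Python
sketch below is only an illustration under assumed syntax. The WILDCARD
grammar and the find_wildcards helper are hypothetical, not taken from the
Perl source.

```python
import re

# Hypothetical grammar: \p{prop=/subpattern/}, with no braces allowed
# anywhere inside the subpattern.
WILDCARD = re.compile(r"\\p\{(\w+)=/([^{}]*)/\}")


def find_wildcards(pattern):
    """Return (property, subpattern) pairs for each wildcard construct."""
    return [(m.group(1), m.group(2)) for m in WILDCARD.finditer(pattern)]
```

With this grammar, find_wildcards(r"[\p{name=/SMILING/}0-9]") yields one pair,
while a subpattern such as A{2} is simply not recognized. A real
implementation would presumably report an error there instead of ignoring it.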
> If the UTS 18 syntax is misleading, what isn't?
