
Re: RFC: Adding \p{foo=/re/}

From: demerphq
Date: February 9, 2019 06:50
Subject: Re: RFC: Adding \p{foo=/re/}
Message ID: CANgJU+V92TNBU9ho84s5TAMQHR7zdoSixB1uQw4PBE8Ecuk-pw@mail.gmail.com

On Sat, 9 Feb 2019, 13:26 Deven T. Corzine, <deven@ties.org> wrote:

> On Fri, Feb 8, 2019 at 11:56 PM demerphq <demerphq@gmail.com> wrote:
>
>> Yes, I do have concerns. I replied in detail in another email, but to
>> summarize succinctly: there are many features in the regex engine; how
>> does this new proposal interact with them? How do we ensure that using
>> this feature does not result in quadratic performance when an
>> equivalent pattern using a different feature set would be linear?
>>
>
> I saw your other email, but I think this is something different which
> shouldn't be like named recursion.
>
> Quote from the UTS 18 link: "this feature allows the use of a regular
> expression to pick out a set of characters based on whether the property
> values match the regular expression."
>
> If I understand correctly, any regex used in this mechanism would match
> against property values of the Unicode character set, NOT against arbitrary
> text.  Since the Unicode data is static, I see no reason why the property
> regex shouldn't be compiled independently AND executed immediately, while
> compiling the containing regex.  The results should then function as a
> fixed predefined character class of Unicode characters, much like a POSIX
> character class but specified in a more dynamic and flexible manner.  The
> containing regex should be able to include this property-based character
> class inside a normal character class.  Since the property regex can be
> executed at compile time, there is no risk of making regular expressions
> turn quadratic, nor should there be interactions from captures or anything
> else.
>
> For example, from UTS 18 again, the property value \p{toNfd=/b/} could be
> compiled into [\x{0062}\x{1e03}\x{1e05}\x{1e07}], with the same exact
> runtime semantics and performance characteristics, and the property
> value \p{name=/^LATIN LETTER.*P$/} could be similarly compiled into
> [\x{01aa}\x{0294}\x{0296}\x{1d18}], etc.
>
> If these property regular expressions were compiled and executed at
> compile time like this, and turned into straightforward Unicode character
> classes to use at runtime, wouldn't that avoid the concerns you mentioned
> in the other email?
>

Answering very quickly (I am on holiday): if what you are saying is correct,
that this is a way to define a character class and that it results in a
first-order compiled character class, then I have no objections other than
that the syntax is very misleading in form. *But* that doesn't seem to match
what Karl said about the implementation, which looks much closer to an
eval/recursion group.
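
To make the "first-order compiled character class" reading concrete, here is
a minimal sketch of the compile-time expansion described in the quoted text.
It is written in Python rather than Perl, it is not how Karl's implementation
works, and the helper names expand_property and to_char_class are
hypothetical. It assumes Python's unicodedata tables stand in for the Unicode
property data, so the exact sets depend on the Unicode version shipped with
the interpreter.

```python
import re
import sys
import unicodedata


def expand_property(value_of, pattern):
    """Return the code points whose property value matches `pattern`.

    `value_of` (hypothetical helper argument) maps a one-character
    string to its property value, or None if it has no value.
    """
    rx = re.compile(pattern)
    hits = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:  # skip surrogate code points
            continue
        value = value_of(chr(cp))
        if value is not None and rx.search(value):
            hits.append(cp)
    return hits


def to_char_class(code_points):
    """Spell the code points out as an ordinary bracketed class."""
    return "[" + "".join("\\U%08x" % cp for cp in code_points) + "]"


# Stand-in for \p{toNfd=/b/}: characters whose NFD form contains "b".
nfd_b = expand_property(lambda ch: unicodedata.normalize("NFD", ch), "b")

# Stand-in for \p{name=/^LATIN LETTER.*P$/}.
name_p = expand_property(lambda ch: unicodedata.name(ch, None),
                         r"^LATIN LETTER.*P$")

# The inner regex has already run once, above. The outer pattern only
# ever sees a fixed class, so matching involves no nested regex at all.
outer = re.compile(to_char_class(nfd_b) + "ar")
print(to_char_class(nfd_b))
print(bool(outer.search("\u1e03ar")))
```

The point of the sketch is only that the inner pattern runs once over static
data before the outer pattern is built, so captures, backtracking and
recursion in the outer pattern never interact with it.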

Yves

