perl.perl5.porters | Postings from July 2016

Re: RFC: seeking syntax for allowing script run pattern matching

From: Karl Williamson
Date: July 6, 2016 22:59
Subject: Re: RFC: seeking syntax for allowing script run pattern matching
Message ID: 577D8D1A.8080004@khwilliamson.com

On 07/06/2016 02:02 PM, hv@crypt.org wrote:
> Karl Williamson <public@khwilliamson.com> wrote:
> :A script run is a sequence of characters, all from the same script, such
> :as Latin or Greek. [...]
> :I'm looking for some more ideas.
>
> It feels like something that should apply over a scope in a pattern, with
> affordance for applying it to a whole pattern - we have exactly that
> concept with the flags and /(?f:...)/ construct.

This sounds reasonable to me.
>
> That implies it should be possible to say, using //S as a placeholder name,
> something like m{\w+ \w+}S to ask for two words separated by a space, with
> all the letters coming from a single script.
>
> That also implies it can be locally disabled:
>    /(?S:\w+ (?-S:\w) \w+)/
> ===
>    my $letter = qr{\w};
>    /\w+ $letter \w+/S;
>
> We occasionally see bugs caused by misunderstanding of how flags act on
> interpolated patterns, but consistency with other existing behaviours
> seems desirable for all that.
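
The equivalence above can be checked today with an existing flag. The following is only a sketch: /i stands in for the placeholder //S flag, which does not exist, and the sample strings are made up for illustration. It shows that a qr// object interpolated into a pattern keeps its own flags, so it behaves like an explicit (?-i:...) group.

```perl
use strict;
use warnings;

# /i is a stand-in for the hypothetical //S flag discussed in this thread.
my $letter = qr{b};                       # compiled without /i

my $scoped       = qr/(?i:a (?-i:b) c)/;  # flag switched off for the middle atom
my $interpolated = qr/a $letter c/i;      # $letter keeps its own (absent) flags

for my $text ('A b C', 'A B C') {
    printf "%-7s scoped=%d interpolated=%d\n", $text,
        ($text =~ $scoped ? 1 : 0), ($text =~ $interpolated ? 1 : 0);
}
```

Both patterns match 'A b C' and both reject 'A B C', because the outer /i never reaches the middle letter.
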
>
> That leaves interesting questions of how the following should behave:
>    /(?S:\w (?S:\w) \w)/
> and
>    /(?S:\w (?-S:. (?S:\w+) .) \w)/
>
> I think the first (where +S is introduced when it is already active)
> should be a noop - the same script should still be required.

I think I agree.

>
> I think the second (where +S is introduced in a -S scope, itself within
> a +S scope) should permit a new script.

I agree.

>
> I wonder whether the next request will be a variant that overrides /./
> to be equivalent to /(?S:\W|\w)/.

I expected that /./ would also be subject to this, along with basically
any construct that could match multiple scripts, with some exceptions to
be determined later.  Obviously \w is the most likely.
>
> Should the proposal also affect uses of \w inside character classes?

I think so.

Another question: should characters matching \p{scx=common} automatically
be allowed in every script run?  If not, should there be an option
(another flag) to allow them?  I've attached a complete list of the
scx=common characters in Unicode 9.0, sorted by general category.
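
To make the scx=common question concrete, here is a rough sketch using only properties that already exist. The helper single_script, its allow_common option, and the three-entry script list are hypothetical names chosen for illustration, not part of any proposal. It tests a whole string rather than a scope within a pattern.

```perl
use strict;
use warnings;

# Hypothetical helper; the script list is deliberately tiny.
my @SCRIPTS = qw(Latin Greek Cyrillic);

# Returns the first script that covers every character of $text,
# or nothing if no single script in @SCRIPTS does.
sub single_script {
    my ($text, %opt) = @_;
    for my $script (@SCRIPTS) {
        my $class = "\\p{scx=$script}";
        # Optionally let scx=Common characters ride along with any script.
        $class .= '\\p{scx=Common}' if $opt{allow_common};
        return $script if $text =~ /\A[$class]+\z/;
    }
    return;
}

my $spoof = "p\x{430}ypal";    # U+0430 is CYRILLIC SMALL LETTER A
print defined single_script('hello world')
    ? "run\n" : "no run\n";    # the space is scx=Common
print defined single_script('hello world', allow_common => 1)
    ? "run\n" : "no run\n";
print defined single_script($spoof, allow_common => 1)
    ? "run\n" : "no run\n";    # mixes Latin and Cyrillic
```

Without allow_common even a plain space breaks the run, which is why the default matters. One quirk of this sketch: a string made up only of Common characters is reported as the first script in the list.
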

