RFC: \w{Latin|Greek}

From: Karl Williamson
Date: October 18, 2017 17:00
Subject: RFC: \w{Latin|Greek}
Message ID: 41f7de36-9e39-a60e-15ac-baf0cb43efd4@khwilliamson.com

An unescaped left brace becomes available for use in 5.30 (after a long
deprecation cycle) when it appears after a "\[[:alpha:]]" sequence, that
is, a backslash followed by an alphabetic character.  This makes it
possible to actually implement one of the earliest proposals for this
capability, which is described in this email.

\w{Latin|Greek} would match only those \w characters that are in the
Latin or Greek scripts.  It is already possible to do this, but more
clunkily:

  (?[ \w & ( \p{Latin} | \p{Greek} ) ])
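
As a minimal sketch of how that existing construct behaves (the helper
name is_latin_or_greek_word_char is made up for illustration, and the
sample characters are my own choice):

```perl
use strict;
use warnings;
# (?[ ]) was experimental before Perl 5.36; silence the warning on older perls.
no warnings 'experimental::regex_sets';

# Hypothetical helper: true if the single character given is a \w character
# that is also in the Latin or Greek script.
sub is_latin_or_greek_word_char {
    my ($char) = @_;
    return $char =~ /\A(?[ \w & ( \p{Latin} | \p{Greek} ) ])\z/ ? 1 : 0;
}

# "a" is Latin, U+03B1 is GREEK SMALL LETTER ALPHA,
# U+044F is CYRILLIC SMALL LETTER YA, "_" is \w but in neither script.
for my $char ("a", "\x{3B1}", "\x{44F}", "_") {
    printf "U+%04X: %s\n", ord($char),
        is_latin_or_greek_word_char($char) ? "match" : "no match";
}
```

The proposed \w{Latin|Greek} would be shorthand for the same
intersection.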

In principle, what's in the braces need not be just a script name.  It
could be any Unicode binary property.  For example,

   \d{nv=2}

would choose the decimal digits that are equivalent to '2'.  These
include the familiar ASCII one, but also the Bengali one, the Thai one,
and so on.  Saying just \p{nv=2} doesn't do the same thing, as it would
also include things like a superscript 2 that aren't real digits.
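
The difference can be seen today by writing the intersection out by
hand.  This is only a sketch of what \d{nv=2} would presumably be
equivalent to; the helper names is_digit_two and has_value_two are
hypothetical:

```perl
use strict;
use warnings;
# (?[ ]) was experimental before Perl 5.36; silence the warning on older perls.
no warnings 'experimental::regex_sets';

# Hypothetical helper: decimal digits (\d) whose numeric value is 2.
sub is_digit_two {
    my ($char) = @_;
    return $char =~ /\A(?[ \d & \p{nv=2} ])\z/ ? 1 : 0;
}

# Hypothetical helper: any character whose numeric value is 2, digit or not.
sub has_value_two {
    my ($char) = @_;
    return $char =~ /\A\p{nv=2}\z/ ? 1 : 0;
}

# ASCII 2, BENGALI DIGIT TWO, THAI DIGIT TWO, SUPERSCRIPT TWO
for my $char ("2", "\x{9E8}", "\x{E52}", "\x{B2}") {
    printf "U+%04X: digit two? %d, value two? %d\n",
        ord($char), is_digit_two($char), has_value_two($char);
}
```

Only the superscript (U+00B2) should differ between the two columns: it
has numeric value 2 but is not matched by \d.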


