develooper Front page | perl.perl5.porters | Postings from August 2013

Re: RFC: What to do about warning: "Code point 0xFOO is not Unicode,all \\p{} matches fail; all \\P{} matches succeed"

From: Eric Brine
Date: August 29, 2013 17:47
Subject: Re: RFC: What to do about warning: "Code point 0xFOO is not Unicode,all \\p{} matches fail; all \\P{} matches succeed"
Message ID: CALJW-qG3cg=MgU7qvDmrhQ6BmU28VWxnBBmKdqBRmwEL_V+Ucw@mail.gmail.com

On Mon, Aug 26, 2013 at 11:00 PM, Karl Williamson
<public@khwilliamson.com> wrote:

> The other option was to make \p{gc=unassigned} succeed for non-Unicode
> code points.  But this isn't what Unicode says.  A strict interpretation
> fails this because Unicode has never said that a non-Unicode code point
> should be considered unassigned.  But I now believe it is more DWIM to
> consider them so.
>

If you're worried that someone might want to distinguish
Unicode-but-unassigned from non-Unicode, then you could extend gc to
include gc=NonUnicode. However, I suspect such a distinction is
rarely needed, so it's probably better to include non-Unicode code points
in gc=unassigned, and let those who want to distinguish unassigned code
points from non-Unicode code points use (?[ [\p{gc=unassigned}] -
[\x{0}-\x{10FFFF}] ]) and (?[ [\p{gc=unassigned}] - [^\x{0}-\x{10FFFF}] ]).
(Maybe provide \p{Unicode}?)
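
To make the two set expressions concrete, here is a rough sketch of how they could be used. It assumes a Perl where \p{gc=unassigned} does match code points above 0x10FFFF (the behaviour proposed above; as far as I know this is what 5.20 and later do) and where (?[ ]) is available (5.18 or later). The classify() helper and its return strings are made up for illustration.

```perl
use strict;
use warnings;
# (?[ ]) was experimental before 5.36; above-Unicode matches can warn.
no warnings qw(experimental::regex_sets non_unicode);

# Unassigned minus the Unicode range: only above-Unicode code points remain.
my $non_unicode = qr/(?[ [\p{gc=unassigned}] - [\x{0}-\x{10FFFF}] ])/;

# Unassigned minus everything outside the Unicode range:
# only unassigned code points inside 0..0x10FFFF remain.
my $unicode_unassigned = qr/(?[ [\p{gc=unassigned}] - [^\x{0}-\x{10FFFF}] ])/;

# Hypothetical helper: classify a single code point.
sub classify {
    my ($cp) = @_;
    my $ch = chr $cp;
    return 'non-Unicode'        if $ch =~ $non_unicode;
    return 'Unicode unassigned' if $ch =~ $unicode_unassigned;
    return 'assigned';
}

# 0x110000 is above Unicode, U+FFFF is a noncharacter (gc=Cn), 0x41 is "A".
printf "U+%04X: %s\n", $_, classify($_) for 0x110000, 0xFFFF, 0x41;
```

Under the stated assumption, the first pattern picks out only the above-Unicode code point and the second only the unassigned code point inside the Unicode range, so nobody loses the ability to tell the two apart.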

In general, it's clear that a non-Unicode code point should behave like a
Unicode code point that lacks the property. No more /\p{XXX}/ && /\P{XXX}/
being true.
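
As a sanity check of that invariant, something like the following sketch could be run against a given perl. It only tests that, for a single code point, exactly one of \p{XXX} and \P{XXX} matches. The two properties in the table and the complementary() helper are arbitrary choices for illustration, not anything from the core test suite.

```perl
use strict;
use warnings;
no warnings 'non_unicode';   # matching above-Unicode code points can warn

# Each entry: [ label, pattern for \p{XXX}, pattern for \P{XXX} ].
# The properties are arbitrary picks for illustration.
my @pairs = (
    [ 'Lu',            qr/\A\p{Lu}\z/,            qr/\A\P{Lu}\z/ ],
    [ 'gc=unassigned', qr/\A\p{gc=unassigned}\z/, qr/\A\P{gc=unassigned}\z/ ],
);

# Hypothetical helper: true if, for every listed property, exactly one of
# \p{XXX} and \P{XXX} matches the single character with code point $cp.
sub complementary {
    my ($cp) = @_;
    my $ch = chr $cp;
    for my $pair (@pairs) {
        my ($label, $pos, $neg) = @$pair;
        my $p = $ch =~ $pos ? 1 : 0;
        my $n = $ch =~ $neg ? 1 : 0;
        return 0 unless $p + $n == 1;
    }
    return 1;
}

for my $cp (0x41, 0x110000) {
    printf "U+%04X: %s\n", $cp, complementary($cp) ? 'consistent' : 'INCONSISTENT';
}
```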
