develooper Front page | perl.perl5.porters | Postings from August 2013

Re: RFC: What to do about warning: "Code point 0xFOO is not Unicode, all \p{} matches fail; all \P{} matches succeed"

From: Brian Fraser
Date: August 29, 2013 16:18
Subject: Re: RFC: What to do about warning: "Code point 0xFOO is not Unicode, all \p{} matches fail; all \P{} matches succeed"
Message ID: CA+nL+nZED0ckBdGEWBaMpdLrfKXJiy2VJBZ+k+NkZ8o=Z8KO9g@mail.gmail.com

On Thu, Aug 29, 2013 at 10:40 AM, Ricardo Signes
<perl.p5p@rjbs.manxome.org> wrote:

> * Karl Williamson <public@khwilliamson.com> [2013-08-26T23:00:52]
> > A strict interpretation fails this because Unicode has never said that a
> > non-Unicode code point should be considered unassigned.  But I now
> > believe it is more DWIM to consider them so.
>
> Agreed.
>
> > Perl could change to make the fall-back value be what happens for
> > non-Unicode code points.  This, I believe, is more DWIM.
>
> Agreed.
>
> > The reason I didn't do this, besides wanting to be very strict
> > Unicode, is that there is a complication.  Consider the Perl
> > extension \p{Unassigned}, which is the same as \p{gc=Unassigned}.
> > Currently these match 864_348 code points.  If we changed the
> > decision I made, these would now match billions of code points.
>
> Is that a problem?
>
> I guess the problem is that someone's program, previously, was doing this:
>
>   my $str = "Sentinel point follows: \x{FF_FFFF}";
>   if ($str =~ /\p{Unassigned}/) {...}
>
> ...and the branch will now be entered when it was not before?
>
> If that is the only problem, I'd like to hear from the folks who've talked
> about using trans-Unicode codepoints and whether they think this is going
> to cause actual problems.  My gut feeling is that we should feel free to
> change this unless the new semantics seem *wrong*, which they don't to me.
> After all, the docs mark this behavior as still in flux:
>
>
I've used trans-Unicode codepoints before*, and was bitten by them not
matching \p{Unassigned}; I understood the reasoning for having them not
match, but I wish that they did.

* I needed "a character that will never show up in database"
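
For illustration, here is a minimal sketch of the sentinel use case described above. It assumes perl 5.14 or later (for the 'non_unicode' warnings category); the names $SENTINEL, $MAX_UNICODE and has_above_unicode() are hypothetical, not taken from any real code. It checks ordinals directly, so that part does not depend on how \p{Unassigned} treats above-Unicode code points. The final \p{Unassigned} match is included only for comparison, and its result depends on the perl version in use.

```perl
use strict;
use warnings;
no warnings 'non_unicode';   # silence warnings about code points above U+10FFFF

# Hypothetical sentinel: any code point above the Unicode maximum
# can never appear in valid Unicode data.
my $MAX_UNICODE = 0x10FFFF;
my $SENTINEL    = chr(0xFF_FFFF);

# Version-independent check: compare ordinals instead of relying on \p{}.
# Returns the number of above-Unicode characters in the string.
sub has_above_unicode {
    my ($str) = @_;
    return scalar grep { ord($_) > $MAX_UNICODE } split //, $str;
}

my $str = "Sentinel point follows: $SENTINEL";

printf "above-Unicode code points found: %d\n", has_above_unicode($str);

# Whether this matches depends on the perl version, which is exactly
# the behavior under discussion in this thread.
printf "matches \\p{Unassigned}: %s\n",
    ($str =~ /\p{Unassigned}/ ? "yes" : "no");
```

Comparing ordinals keeps the sentinel check stable whichever way \p{Unassigned} ends up behaving.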


