
Re: RFC: What to do about warning: "Code point 0xFOO is not Unicode,all \\p{} matches fail; all \\P{} matches succeed"

From: Ricardo Signes
Date: August 29, 2013 13:40
Subject: Re: RFC: What to do about warning: "Code point 0xFOO is not Unicode,all \\p{} matches fail; all \\P{} matches succeed"
Message ID: 20130829134012.GB14478@cancer.codesimply.com

* Karl Williamson <public@khwilliamson.com> [2013-08-26T23:00:52]
> A strict interpretation fails this because Unicode has never said that a
> non-Unicode code point should be considered unassigned.  But I now believe it
> is more DWIM to consider them so.

Agreed.

> Perl could change to make the fall-back value be what happens for non-Unicode
> code points.  This, I believe, is more DWIM.

Agreed.

> The reason I didn't do this, besides wanting to be very strict
> Unicode, is that there is a complication.  Consider the Perl
> extension \p{Unassigned}, which is the same as \p{gc=Unassigned}.
> Currently these match 864_348 code points.  If we changed the
> decision I made, these would now match billions of code points.

Is that a problem?

I guess the problem is that someone's program, previously, was doing this:

  my $str = "Sentinel point follows: \x{FF_FFFF}";
  if ($str =~ /\p{Unassigned}/) {...}

...and the branch will now be entered when it was not before?
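
For anyone who wants to check what a given perl does with such a sentinel, here is a minimal sketch. The sentinel value 0xFF_FFFF and the variable names are only illustrative, and the result depends on the perl version it is run under, so no expected output is shown:

```perl
use strict;
use warnings;
no warnings 'utf8';   # silence warnings about code points above 0x10FFFF

# Hypothetical sentinel: any code point above the Unicode maximum (0x10FFFF).
my $sentinel = chr(0xFF_FFFF);
my $str      = "Sentinel point follows: $sentinel";

# Every other character in $str is an assigned ASCII character, so only
# the sentinel can make this match succeed.
my $matches = ($str =~ /\p{Unassigned}/) ? 1 : 0;

print "perl ", $], ": \\p{Unassigned} ",
      ($matches ? "matches" : "does not match"),
      " the above-Unicode sentinel\n";
```

If the proposed change goes in, the match would be expected to succeed where it failed before, which is exactly the branch-now-entered case.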

If that is the only problem, I'd like to hear from the folks who've talked
about using trans-Unicode codepoints and whether they think this is going to
cause actual problems.  My gut feeling is that we should feel free to change
this unless the new semantics seem *wrong*, which they don't to me.  After all,
the docs mark this behavior as still in flux:

   The result is undefined if you try to match a non-Unicode code point
   (that is, one above 0x10FFFF) against a Unicode property.  Currently, a
   warning is raised, and the match will fail.  In some cases, this is
   counterintuitive, as both these fail: [...]

-- 
rjbs
