* Karl Williamson <public@khwilliamson.com> [2013-08-26T23:00:52]
> A strict interpretation fails this because Unicode has never said that a
> non-Unicode code point should be considered unassigned. But I now believe it
> is more DWIM to consider them so.

Agreed.

> Perl could change to make the fall-back value be what happens for non-Unicode
> code points. This, I believe, is more DWIM.

Agreed.

> The reason I didn't do this, besides wanting to be very strict
> Unicode, is that there is a complication. Consider the Perl
> extension \p{Unassigned}, which is the same as \p{gc=Unassigned}.
> Currently these match 864_348 code points. If we changed the
> decision I made, these would now match billions of code points.

Is that a problem? I guess the problem is that someone's program, previously,
was doing this:

  my $str = "Sentinel point follows: \x{FF_FFFF}";
  if ($str =~ /\p{Unassigned}/) {...}

...and the branch will now be entered when it was not before?

If that is the only problem, I'd like to hear from the folks who've talked
about using trans-Unicode codepoints and whether they think this is going to
cause actual problems.

My gut feeling is that we should feel free to change this unless the new
semantics seem *wrong*, which they don't to me. After all, the docs mark this
behavior as still in flux:

  The result is undefined if you try to match a non-Unicode code point (that
  is, one above 0x10FFFF) against a Unicode property. Currently, a warning is
  raised, and the match will fail. In some cases, this is counterintuitive, as
  both these fail: [...]

-- 
rjbs