Re: The Unicode Bug still bites?

From: Karl Williamson
Date: December 24, 2011 20:00
Subject: Re: The Unicode Bug still bites?
Message ID: 4EF69F96.20707@khwilliamson.com

On 12/23/2011 10:03 PM, Brian Fraser wrote:
>
>
> On Sat, Dec 24, 2011 at 12:37 AM, Karl Williamson
> <public@khwilliamson.com> wrote:
>
>     On 04/13/2011 08:33 PM, brian d foy wrote:
>
>         In article <3020.1302738770@chthon__>, Tom Christiansen
>         <tchrist@perl.com> wrote:
>
>
>             What's wrong with quotemeta() someday changing to quote say,
>             \p{Pattern_Syntax}
>             character if and when Perl makes use of them?
>
>
>         Why not do that now (as in 5.15) with the current characters that it
>         would quote. Then, it's already there when we want to expand it.
>
>
>     This seems like a reasonable solution.  Does anyone have objections?
>
>
> I do, sorta. \p{Pattern_Syntax} isn't a superset of what quotemeta()
> currently uses; things like NULL and the control characters aren't
> included in the former, but are quoted by the latter.
> Seems easy enough to resolve though: quotemeta should escape all
> non-word characters in the ASCII range, or those matching \p{Pattern_Syntax}.
>

Whatever solution we adopt should not change ASCII-range characters.
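
For reference, the rule Brian proposes above leaves ASCII handling as it is today and only adds \p{Pattern_Syntax} on top. A minimal sketch of that rule follows; proposed_quotemeta is a hypothetical name used for illustration, not an existing function, and this is not how quotemeta is implemented in core.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of core Perl. It escapes every ASCII
# character outside [A-Za-z0-9_] (what quotemeta already does for the
# ASCII range), plus any character with the Unicode Pattern_Syntax property.
sub proposed_quotemeta {
    my ($str) = @_;
    $str =~ s/((?![A-Za-z0-9_])[\x00-\x7F]|\p{Pattern_Syntax})/\\$1/g;
    return $str;
}

binmode STDOUT, ':encoding(UTF-8)';
print proposed_quotemeta("a.b"), "\n";           # ASCII punctuation
print proposed_quotemeta("caf\x{E9}"), "\n";     # U+00E9, a letter
print proposed_quotemeta("x \x{2192} y"), "\n";  # U+2192, an arrow
```

Under this sketch the result depends only on the characters in the string, not on how the string happens to be stored internally.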

But since portions of the original thread have been cut off, I want to
reiterate that this is a form of the Unicode bug.  quotemeta behaves
differently depending on whether the string is encoded in UTF-8 or not.
This has caused problems in the field.  I think it should be fixed in
5.16 so that it behaves consistently whatever the encoding.
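
A small script along these lines can be used to check the behavior on a given perl. It builds the same one-character string twice, once stored as a native byte and once upgraded to UTF-8 internally, and compares what quotemeta returns for each. The variable names are illustrative only. No output is shown because the second result depends on the perl version and, I believe, on whether "use feature 'unicode_strings'" is in effect.

```perl
use strict;
use warnings;

# Same logical string (one character, U+00E9), two internal encodings.
my $native   = "\xE9";
my $upgraded = "\xE9";
utf8::upgrade($upgraded);   # same characters, now stored as UTF-8 internally

# The two strings compare equal as far as Perl code is concerned.
my $same_string = $native eq $upgraded ? 1 : 0;

# If quotemeta is encoding-independent, its two results are equal as well.
my $same_quoted = quotemeta($native) eq quotemeta($upgraded) ? 1 : 0;

printf "strings equal:           %d\n", $same_string;
printf "quotemeta results equal: %d\n", $same_quoted;
```

If the second line reports 0, the perl in question still treats the two encodings differently for this character.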
