Re: [perl #77654] quotemeta() fails to quote literal non-wordcharacter under utf8

From:
Dave Mitchell
Date:
December 16, 2010 09:56
Subject:
Re: [perl #77654] quotemeta() fails to quote literal non-wordcharacter under utf8
Message ID:
20101216175631.GH10901@iabyn.com
On Thu, Sep 02, 2010 at 12:58:16PM -0700, Mitchell N Charity wrote:
> quotemeta() fails to quote a CENT SIGN when,
> using utf8, the string is created with
> a literal CENT SIGN character, instead of with \xA2 .

This appears to be down to a difference in behaviour of quotemeta
depending on whether the string is internally UTF-8 encoded or not.

For non-utf8 strings, all chars *except* isALNUM() are \\-escaped; in
particular, chars with ords in the range 128-255 are always quoted.

For utf8 strings, chars with ord > 127 are never quoted. I think this
is a bug that needs fixing, but can anyone confirm or deny?
In particular, fixing it would be a significant change in behaviour, since
currently the myriad of codepoints above 255 are not escaped, including
"letters" from non-latin character ranges. I would assume that all these
should be quoted.
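
A minimal sketch of how the difference can be observed, assuming nothing
beyond core perl. It builds the same one-character string twice, once in the
native (non-utf8) representation and once upgraded to the internal UTF-8
representation, and runs quotemeta on both. The variable names are
illustrative only. What the upgraded case prints depends on the perl version
in use, so no expected output is given for it.

```perl
use strict;
use warnings;

# CENT SIGN (U+00A2) as a one-character string, stored natively (non-utf8).
my $native = "\xA2";

# The same character, forced into the internal UTF-8 representation.
my $upgraded = $native;
utf8::upgrade($upgraded);

# As character strings the two are equal ...
my $same_chars = ($native eq $upgraded) ? 1 : 0;

# ... but quotemeta may treat them differently.
my $q_native   = quotemeta($native);
my $q_upgraded = quotemeta($upgraded);

printf "same characters: %s\n", $same_chars ? "yes" : "no";
printf "native   : utf8 flag=%d, quoted length=%d\n",
    (utf8::is_utf8($native)   ? 1 : 0), length $q_native;
printf "upgraded : utf8 flag=%d, quoted length=%d\n",
    (utf8::is_utf8($upgraded) ? 1 : 0), length $q_upgraded;
```

A quoted length of 2 means a backslash was added; a length of 1 means the
character was passed through unquoted.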

The current docs make it clear that all chars except [A-Za-z_0-9] should
be escaped.
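
For comparison, the documented rule can be sketched in pure Perl. The helper
name quotemeta_documented is hypothetical, not anything in core; it simply
escapes every character outside [A-Za-z_0-9], regardless of how the string is
stored internally. This is only an illustration of the rule as the docs state
it, not a proposed patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): backslash-escape every character
# except [A-Za-z_0-9], independent of the string's internal encoding.
sub quotemeta_documented {
    my ($str) = @_;
    (my $quoted = $str) =~ s/([^A-Za-z_0-9])/\\$1/g;
    return $quoted;
}

my $cent_native   = "\xA2";          # CENT SIGN, non-utf8 storage
my $cent_upgraded = "\xA2";
utf8::upgrade($cent_upgraded);       # CENT SIGN, utf8 storage
my $alpha         = "\x{3b1}";       # GREEK SMALL LETTER ALPHA, ord > 255

my $r_native   = quotemeta_documented($cent_native);
my $r_upgraded = quotemeta_documented($cent_upgraded);
my $r_alpha    = quotemeta_documented($alpha);
my $r_word     = quotemeta_documented("foo_1");

# Report whether each input gained a leading backslash.
for my $pair ([native => $r_native], [upgraded => $r_upgraded],
              [alpha  => $r_alpha],  [word     => $r_word]) {
    my ($name, $result) = @$pair;
    printf "%-8s escaped=%s\n", $name, ($result =~ /^\\/ ? "yes" : "no");
}
```

Under this rule both forms of the cent sign, and the non-latin letter, come
out escaped, while a plain ASCII word is left alone.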

-- 
Monto Blanco... scorchio!