
Re: [perl #77654] quotemeta() fails to quote literal non-wordcharacter under utf8

Dave Mitchell
December 16, 2010 09:56
On Thu, Sep 02, 2010 at 12:58:16PM -0700, Mitchell N Charity wrote:
> quotemeta() fails to quote a CENT SIGN when,
> using utf8, the string is created with
> a literal CENT SIGN character, instead of with \xA2 .

This appears to be down to a difference in behaviour of quotemeta
depending on whether the string is internally UTF-8 encoded or not.

For non-utf8 strings, all chars *except* isALNUM() are \\-escaped; in
particular, chars with ords in the range 128-255 are always quoted.

For utf8 strings, chars with ord > 127 are never quoted. I think this
is a bug that needs fixing, but can anyone confirm or deny?
In particular, fixing it would be a significant change in behaviour, since
currently the myriad of codepoints above 255 are not escaped, including
"letters" from non-Latin character ranges. I would assume that all these
should be quoted.
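
To make the difference concrete, here is a minimal sketch that builds the
same one-character string (CENT SIGN, ord 0xA2) twice, once stored as a
single byte and once upgraded to the internal UTF-8 representation, and
reports whether quotemeta backslashes each. The helper name is_quoted is
made up for illustration. What gets printed for the upgraded string depends
on the perl it is run with, so no expected output is given.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): true if quotemeta() put a
# backslash in front of the single character held in $char.
sub is_quoted {
    my ($char) = @_;
    return quotemeta($char) eq "\\" . $char;
}

# The same character, CENT SIGN, in two internal representations.
my $native   = "\xA2";       # one byte, UTF-8 flag off
my $upgraded = "\xA2";
utf8::upgrade($upgraded);    # same character, UTF-8 flag on

for my $case ([ native => $native ], [ upgraded => $upgraded ]) {
    my ($label, $str) = @$case;
    printf "%-8s utf8 flag: %s  quoted: %s\n",
        $label,
        (utf8::is_utf8($str) ? 'on ' : 'off'),
        (is_quoted($str)     ? 'yes' : 'no');
}
```

The two strings compare equal with eq, so any difference in the "quoted"
column comes only from the internal encoding.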

The current docs make it clear that all chars except [A-Za-z_0-9] should
be escaped.
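
Read literally, that documented rule amounts to a single substitution. The
following is only a sketch of the rule as the docs state it, not of how
quotemeta is implemented or how it should be fixed; quote_all_nonword is a
made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): backslash every character that
# is not in [A-Za-z_0-9], whatever the string's internal encoding.
sub quote_all_nonword {
    my ($str) = @_;
    (my $quoted = $str) =~ s/([^A-Za-z_0-9])/\\$1/g;
    return $quoted;
}

my $cent = "\xA2";
utf8::upgrade($cent);    # force the internal UTF-8 representation

print quote_all_nonword("a.b_c"), "\n";       # prints a\.b_c
print length(quote_all_nonword($cent)), "\n"; # prints 2 (backslash + char)
```

Under this reading, every codepoint above 127 gets a backslash, including
letters from non-Latin ranges.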

Monto Blanco... scorchio!
