On Thu, Sep 02, 2010 at 12:58:16PM -0700, Mitchell N Charity wrote:
> quotemeta() fails to quote a CENT SIGN when,
> using utf8, the string is created with
> a literal CENT SIGN character, instead of with \xA2 .

This appears to be down to a difference in the behaviour of quotemeta
depending on whether the string is internally UTF-8 encoded or not.

For non-utf8 strings, all chars *except* isALNUM() are \\-escaped; in
particular, chars with ords in the range 128-255 are always quoted.

For utf8 strings, chars with ord > 127 are never quoted.

I think this is a bug that needs fixing, but can anyone confirm or deny?

In particular, this would be a significant change in behaviour, since
currently the myriad codepoints above 255 are not escaped, including
"letters" from non-Latin character ranges. I would assume that all these
should be quoted.

The current docs make it clear that all chars except [A-Za-z_0-9] should
be escaped.

-- 
Monto Blanco... scorchio!
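
The difference described above can be checked with a short script. This is only a sketch: it assumes that utf8::upgrade() reproduces the internal state of a string written as a literal CENT SIGN under "use utf8", and the variable names are illustrative. It builds the same one-character string in both internal representations and reports whether quotemeta() changed each one.

```perl
use strict;
use warnings;

# U+00A2 CENT SIGN, held in two internal representations.
my $bytes = "\xA2";        # not internally UTF-8 encoded
my $utf8  = "\xA2";
utf8::upgrade($utf8);      # same character, now internally UTF-8 encoded

# The two strings compare equal; only the internal encoding differs.
printf "equal as strings: %s\n", ($bytes eq $utf8 ? "yes" : "no");

for my $case ( [ 'non-utf8', $bytes ], [ 'utf8', $utf8 ] ) {
    my ($label, $str) = @$case;
    # quotemeta() returns its argument unchanged when nothing was quoted.
    my $quoted = quotemeta($str) ne $str ? "yes" : "no";
    printf "%-8s utf8 flag: %s, quoted: %s\n",
        $label, (utf8::is_utf8($str) ? "on" : "off"), $quoted;
}
```

If the behaviour is as described, the two "quoted" columns will disagree even though the strings compare equal. The result depends on the perl version in use, so no expected output is shown here.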