On 04/13/2011 01:02 PM, Tom Christiansen wrote:
> I started off trying to fix this paragraph:
>
>     In C<quotemeta> or its inline equivalent C<\Q>, all characters whose
>     code points are above 127 are not quoted in UTF-8 encoded strings, but
>     all are quoted in UTF-8 strings.
>
> That (still) makes no sense to me. Here's the wording I came up with that
> reflects what I *thought* it was trying to say:
>
>     In C<quotemeta> or its inline equivalent C<\Q>, no characters
>     code points above 127 are quoted in UTF-8 encoded strings, but in
>     byte encoded strings, code points between 128-255 are always quoted.
>
> Except that that is not true. :( I've played with blead, including
> compiled afresh this morning, and on both Darwin and Linux, and I still
> can't figure out what is supposed to happen, because it doesn't match
> either of those paragraphs above. I think from looking at Devel::Peek
> that things aren't being properly utf8'd. This should not be happening
> according to what I think that that should be saying:
>
>     % blead -CS -M-feature=unicode_strings -le '$a = "\x{e9}"; print quotemeta($a)'
>     \é
>     % blead -CS -Mfeature=unicode_strings -le '$a = "\x{e9}"; print quotemeta($a)'
>     \é
>
> This happens on both Darwin and Mac, and I don't understand why with -E or unicode_strings
> that I have a non-Unicode String!
>
>     % blead -CS -MDevel::Peek -E '$a = "\x{e9}"; say "\Q$a"'
>     \é
>     % blead -CS -MDevel::Peek -E '$a = "\x{e9}"; Dump "\Q$a"'
>     SV = PV(0x8010d8) at 0x80ed20
>       REFCNT = 1
>       FLAGS = (PADTMP,POK,pPOK)
>       PV = 0x203dc0 "\\\351"\0
>       CUR = 2
>       LEN = 16
>
>     % blead -CS -MDevel::Peek -Mfeature=unicode_strings -le '$a = "\x{e9}"; Dump($a)'
>     SV = PV(0x801038) at 0x80ed60
>       REFCNT = 1
>       FLAGS = (POK,pPOK)
>       PV = 0x201380 "\351"\0
>       CUR = 1
>       LEN = 16
>     % blead -CS -MDevel::Peek -Mfeature=unicode_strings -le '$a = "\x{e9}"; Dump("\Q$a")'
>     SV = PV(0x8010e8) at 0x80ed30
>       REFCNT = 1
>       FLAGS = (PADTMP,POK,pPOK)
>       PV = 0x203e00 "\\\351"\0
>       CUR = 2
>       LEN = 16
>
> But look!
>
>     % blead -CS -MDevel::Peek -E '$a = "\x{e9}"; utf8::upgrade($a) ; say "\Q$a"'
>     é
>     % blead -CS -MDevel::Peek -E '$a = "\x{e9}"; utf8::upgrade($a) ; Dump "\Q$a"'
>     SV = PV(0x8010d8) at 0x80f040
>       REFCNT = 1
>       FLAGS = (PADTMP,POK,pPOK,UTF8)
>       PV = 0x203df0 "\303\251"\0 [UTF8 "\x{e9}"]
>       CUR = 2
>       LEN = 16
>
> I thought the whole point was so I didn't have to *do* that anymore. :(
>
> --tom
>

The reason this didn't get fixed in 5.14 is because DaveM and I were
waiting for you to get back to us on the more general problem of what
Larry would like for quotemeta'ing Unicode alphanumerics. And that
didn't happen until it was too late. And that's why it is listed as
part of the Unicode bug.

What I think should happen for 5.16 is to just not quote anything above
127, under any conditions.
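
For anyone who wants to poke at the byte-versus-UTF-8 difference without
blead, -CS, or Devel::Peek, here is a minimal sketch. It assumes a stock
perl with no locale and without the unicode_strings feature enabled; the
variable names are only illustrative, and it uses nothing beyond the core
quotemeta, utf8::upgrade, and utf8::is_utf8. It reports the length of the
quoted string instead of printing it, so terminal encoding does not get
in the way.

```perl
use strict;
use warnings;

# U+00E9 held as a single native byte (UTF8 flag off).
my $bytes = "\x{e9}";

# The same character, but with the internal representation upgraded
# to UTF-8 (UTF8 flag on), as utf8::upgrade does in the quoted session.
my $upgraded = "\x{e9}";
utf8::upgrade($upgraded);

# As character strings the two are equal; only the internal storage differs.
print "equal as strings: ", ($bytes eq $upgraded ? "yes" : "no"), "\n";

# A quoted length of 2 means a backslash was added; 1 means it was not.
for my $pair ([bytes => $bytes], [upgraded => $upgraded]) {
    my ($label, $str) = @$pair;
    my $quoted = quotemeta($str);
    printf "%-8s utf8_flag=%d quoted_length=%d\n",
        $label, (utf8::is_utf8($str) ? 1 : 0), length($quoted);
}
```

If the two quoted lengths differ, that is the inconsistency complained
about above: equal strings giving different quotemeta results depending
only on how they happen to be stored.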