karl williamson wrote:
> demerphq wrote:
>> 2008/12/27 karl williamson <public@khwilliamson.com>:
>>> Rafael Garcia-Suarez wrote:
>>>> 2008/12/26 karl williamson <public@khwilliamson.com>:
>>>>> Attached is a patch for this. The problem is that in this
>>>>> subroutine p may or may not be in utf8, and the flag do_utf8
>>>>> indicates which. The code calls various functions passing both p
>>>>> and do_utf8, and these work. But to_utf8_fold() expects its
>>>>> argument to always be in utf8, and this caused the problem. Also,
>>>>> the av's are stored as utf8, so the memEQ would not work correctly
>>>>> on a non-utf8 p, even though no error message would be generated.
>>>>>
>>>>> The patch creates a copy of p in utf8, if necessary, and uses that
>>>>> even when calling the functions that accept the do_utf8 flag,
>>>>> because they create temporaries, convert to utf8, and then throw
>>>>> the conversion away. It is more efficient to do the conversion once
>>>>> in the caller and pass that to each routine.
>>>>>
>>>>> I'm not sure what to do about a test case.
>>>>>
>>>>> "\xc0" =~ qr/[\x{1f4}\xc0]/;
>>>>>
>>>>> doesn't show the problem, but
>>>>>
>>>>> use Test::More tests => 1;
>>>>> like("\xc0", qr/[\x{1f4}\xc0]/i, 'get malformed utf8');
>>>>>
>>>>> does. And it looks like none of the existing re tests use Test.
>>>> Then there is probably a problem in Test::More itself?
>>>>
>>>> (Is there a bug number for this?)
>>>>
>>>> I've tested the patch, but I would feel more comfortable with a test
>>>> case (or with a comment from Yves).
>>>>
>>> No bug number. Should I create one?
>>>
>>> I suspect that it isn't a bug in Test::More, but that it calls things
>>> somehow differently, which is kind of scary in itself: it perturbs
>>> the environment. Maybe a certain class of tests shouldn't be done
>>> using Test. I don't know.
>>>
>>> If we don't hear from Yves in the meantime, I'll look tomorrow to see
>>> how to reproduce it without using Test.
>>
>> Is this a problem with casefolding Unicode characters in a charclass?
>>
>> I have to admit that on reading this I don't have much to add. And my
>> Windows box is offline these days due to a hardware failure, so if I'm
>> going to debug it I'll have to learn gdb finally. Which could take a
>> while :-)
>>
>> Yves
>>
> This is turning into several threads. I'll separate out the casefolding
> charclass into a separate one.
>
> Duh! The reason I didn't get a malformed message without Test::More is
> because I forgot to turn on warnings.
>
> Attached is another patch, to add a test case.

Here is a slightly revised test case patch. Since ff is always an illegal
utf8 byte, I changed the test to use it.