Re: PATCH Fix malformed utf8 in regexec.c only shows with Test::More

From: karl williamson
Date: December 28, 2008 06:22
Subject: Re: PATCH Fix malformed utf8 in regexec.c only shows with Test::More
Message ID: 49578B7E.3040005@khwilliamson.com

karl williamson wrote:
> demerphq wrote:
>> 2008/12/27 karl williamson <public@khwilliamson.com>:
>>> Rafael Garcia-Suarez wrote:
>>>> 2008/12/26 karl williamson <public@khwilliamson.com>:
>>>>> Attached is a patch for this.  The problem is that in this subroutine
>>>>> p may or may not be in utf8, and the flag do_utf8 indicates which.
>>>>> The code calls various functions passing both p and do_utf8, and
>>>>> these work.  But to_utf8_fold() expects its argument to always be in
>>>>> utf8, and this caused the problem.  Also, the av's are stored as utf8,
>>>>> so the memEQ would not work correctly on a non-utf8 p, even though no
>>>>> error message would be generated.
>>>>>
>>>>> The patch creates a copy of p in utf8, if necessary, and uses that
>>>>> even when calling the functions that accept the do_utf8 flag, as they
>>>>> create temporaries, convert to utf8, and then throw the conversion
>>>>> away.  It is more efficient to do the conversion once in the caller
>>>>> and pass that to each routine.
>>>>>
>>>>> I'm not sure what to do about a test case.
>>>>>
>>>>> "\xc0" =~ qr/[\x{1f4}\xc0]/;
>>>>>
>>>>> doesn't show the problem, but
>>>>>
>>>>> use Test::More tests => 1;
>>>>> like("\xc0", qr/[\x{1f4}\xc0]/i, 'get malformed utf8');
>>>>>
>>>>> does.  And it looks like none of the existing re tests use Test.
>>>> Then there is probably a problem in Test::More itself ?
>>>>
>>>> (Is there a bug number for this?)
>>>>
>>>> I've tested the patch, but I would feel more comfortable with a test
>>>> case. (or with a comment from Yves)
>>>>
>>>>
>>> No bug number.  Should I create one?
>>>
>>> I suspect that it isn't a bug in Test::More, but that it calls things
>>> somehow differently, which is kind of scary in itself, in that it
>>> perturbs the environment.  Maybe a certain class of tests shouldn't be
>>> done using Test.  I don't know.
>>>
>>> If we don't hear from Yves in the meantime, I'll look tomorrow to see
>>> how to reproduce it without using Test.
>>
>> Is this a problem with casefolding unicode characters in a charclass?
>>
>> I have to admit that on reading this I don't have much to add. And my
>> windows box is offline these days due to a hardware failure, so if I'm
>> going to debug it I'll have to learn gdb finally. Which could take a
>> while :-)
>>
>> Yves
>>
>>
>>
>>
> This is turning into several threads.  I'll separate out the casefolding 
> charclass into a separate one.
> 
> Duh! The reason I didn't get a malformed message without Test::More is
> that I forgot to turn on warnings.
> 
> Attached is another patch, to add a test case.
> 
Here is a slightly revised test case patch.  Since ff is always an
illegal utf8 byte, I changed the test to use it.
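
For readers following along, here is a minimal standalone sketch of the idea discussed in this thread. It is not the patch itself and does not use the perl internals. It shows why a byte-wise comparison of a non-utf8 character against a utf8-stored one fails, and how converting once in the caller fixes it. The helper names (latin1_to_utf8, char_eq_utf8, never_in_standard_utf8) are made up for illustration; memcmp stands in for memEQ, and the "never valid" byte set is the one from standard UTF-8 (RFC 3629), not perl's extended encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a perl API): encode one Latin-1 byte as UTF-8.
 * Writes 1 or 2 bytes to out and returns how many were written. */
size_t latin1_to_utf8(unsigned char c, unsigned char *out)
{
    assert(out != NULL);
    if (c < 0x80) {              /* ASCII is unchanged in UTF-8 */
        out[0] = c;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte: 110000xx */
    out[1] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation: 10xxxxxx */
    return 2;
}

/* Hypothetical helper: compare the character at p with a UTF-8 encoded
 * candidate.  If p_is_utf8 is zero, p points at a single Latin-1 byte,
 * which is converted before comparing.  Otherwise the caller must
 * guarantee that at least cand_len bytes are readable at p. */
int char_eq_utf8(const unsigned char *p, int p_is_utf8,
                 const unsigned char *cand, size_t cand_len)
{
    unsigned char tmp[2];

    if (!p_is_utf8) {
        size_t len = latin1_to_utf8(*p, tmp);
        return len == cand_len && memcmp(tmp, cand, len) == 0;
    }
    return memcmp(p, cand, cand_len) == 0;
}

/* Bytes that cannot occur anywhere in standard UTF-8 (RFC 3629):
 * 0xC0, 0xC1, and 0xF5 through 0xFF. */
int never_in_standard_utf8(unsigned char c)
{
    return c == 0xC0 || c == 0xC1 || c >= 0xF5;
}
```

With these, the single byte 0xC0 does not compare equal to the stored form of U+00C0 (the two bytes 0xC3 0x80) until it has been converted. A raw comparison like that simply fails to match without producing any error message.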
