perl.perl5.porters | Postings from December 2008

Re: PATCH Fix malformed utf8 in regexec.c only shows with Test::More

From: demerphq
Date: December 27, 2008 02:36
Subject: Re: PATCH Fix malformed utf8 in regexec.c only shows with Test::More
Message ID: 9b18b3110812270236x8f811cdg9693ccd9e8b706da@mail.gmail.com

2008/12/27 karl williamson <public@khwilliamson.com>:
> Rafael Garcia-Suarez wrote:
>>
>> 2008/12/26 karl williamson <public@khwilliamson.com>:
>>>
>>> Attached is a patch for this.  The problem is that in this subroutine p
>>> may or may not be in utf8, and the flag do_utf8 indicates which.  The code
>>> calls various functions passing both p and do_utf8, and these work.  But
>>> to_utf8_fold() expects its argument to always be in utf8, and this caused
>>> the problem.  Also, the av's are stored as utf8, so the memEQ would not
>>> work correctly on a non-utf8 p even though no error message would be
>>> generated.
>>>
>>> The patch creates a copy of p in utf8, if necessary, and uses that even
>>> when calling the functions that accept the do_utf8 flag, as they create
>>> temporaries, convert to utf8, and then throw the conversion away.  It is
>>> more efficient to do the conversion once in the caller and pass that to
>>> each routine.
>>>
>>> I'm not sure what to do about a test case.
>>>
>>> "\xc0" =~ qr/[\x{1f4}\xc0]/;
>>>
>>> doesn't show the problem, but
>>>
>>> use Test::More tests => 1;
>>> like("\xc0", qr/[\x{1f4}\xc0]/i, 'get malformed utf8');
>>>
>>> does.  And it looks like none of the existing re tests use Test.
>>
>> Then there is probably a problem in Test::More itself ?
>>
>> (Is there a bug number for this?)
>>
>> I've tested the patch, but I would feel more comfortable with a test
>> case. (or with a comment from Yves)
>>
>>
> No bug number.  Should I create one?
>
> I suspect that it isn't a bug in Test::More, but that it calls things
> somehow differently, which is kind of scary in itself that it perturbs the
> environment.  Maybe a certain class of tests shouldn't be done using Test.  I
> don't know.
>
> If we don't hear from Yves in the meantime, I'll look tomorrow to see how to
> reproduce it without using Test.
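
For illustration only, the encoding mismatch described in the quoted patch
summary can be sketched in standalone C.  This is not the regexec.c code and
not the patch: latin1_to_utf8() and matches_stored() are hypothetical helpers
invented for this sketch, and plain memcmp() stands in for the memEQ macro.
It assumes the non-utf8 form of p is Latin-1 and shows the idea of upgrading
p once in the caller before comparing it with an entry stored as utf8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Perl API): upgrade a Latin-1 buffer to UTF-8.
 * Returns a malloc'd buffer and stores its byte length in *out_len. */
static unsigned char *latin1_to_utf8(const unsigned char *s, size_t len,
                                     size_t *out_len)
{
    unsigned char *buf;
    size_t i, n = 0;

    assert(out_len != NULL);
    buf = malloc(len * 2 + 1);      /* each byte grows to at most 2 bytes */
    if (!buf)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            buf[n++] = s[i];        /* ASCII is unchanged */
        } else {
            buf[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
            buf[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    buf[n] = '\0';
    *out_len = n;
    return buf;
}

/* Hypothetical helper: compare p against an entry stored as UTF-8.
 * If p is not UTF-8 (is_utf8 == 0), it is upgraded once here, so the
 * byte comparison sees both sides in the same encoding. */
static int matches_stored(const unsigned char *p, size_t plen, int is_utf8,
                          const unsigned char *stored, size_t stored_len)
{
    unsigned char *tmp = NULL;
    const unsigned char *up = p;
    size_t ulen = plen;
    int ok;

    if (!is_utf8) {
        tmp = latin1_to_utf8(p, plen, &ulen);
        if (!tmp)
            return 0;
        up = tmp;
    }
    ok = (ulen == stored_len && memcmp(up, stored, stored_len) == 0);
    free(tmp);                      /* free(NULL) is a no-op */
    return ok;
}
```

With p as the single byte 0xC0 and a stored entry of 0xC3 0x80 (U+00C0 in
utf8), matches_stored(p, 1, 0, stored, 2) returns 1, while comparing the raw
bytes without the upgrade (is_utf8 passed as 1) returns 0 because the lengths
and bytes differ.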

Is this a problem with casefolding unicode characters in a charclass?

I have to admit that on reading this I don't have much to add. And my
Windows box is offline these days due to a hardware failure, so if I'm
going to debug it I'll have to learn gdb finally. Which could take a
while :-)

Yves




-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


