PATCH Fix malformed utf8 in regexec.c

From:
karl williamson
Date:
December 26, 2008 11:00
Subject:
PATCH Fix malformed utf8 in regexec.c
Message ID:
495529AA.4010700@khwilliamson.com
Attached is a patch for this.  The problem is that in this subroutine p
may or may not be in utf8, and the flag do_utf8 indicates which.  The
code calls various functions, passing both p and do_utf8, and these
work.  But to_utf8_fold() expects its argument to always be in utf8, and
this caused the problem.  Also, the av's are stored as utf8, so the memEQ
would not work correctly on a non-utf8 p, even though no error message
would be generated.
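
To illustrate the silent memEQ mismatch, here is a minimal standalone
sketch.  It is not the regexec.c code: same_bytes() and the two byte
arrays are hypothetical names, and it assumes an ASCII platform where a
non-utf8 p holds Latin-1 bytes.  The character \xc0 is one byte natively
but two bytes in utf8, so a byte comparison against the utf8 form simply
fails without any diagnostic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* U+00C0 as a single native (Latin-1) byte, as a non-utf8 p would hold it. */
static const unsigned char A_GRAVE_NATIVE[] = { 0xC0 };

/* The same character encoded as utf8, as the stored entries would hold it. */
static const unsigned char A_GRAVE_UTF8[] = { 0xC3, 0x80 };

/* Hypothetical stand-in for a length check plus memEQ:
 * returns 1 when both buffers hold identical bytes, 0 otherwise. */
static int same_bytes(const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Comparing A_GRAVE_NATIVE with A_GRAVE_UTF8 reports no match even though
both represent the same character; only utf8 against utf8 compares equal.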

The patch creates a copy of p in utf8, if necessary, and uses that even 
when calling the functions that accept the do_utf8 flag, as they create 
temporaries, convert to utf8, and then throw the conversion away.  It is 
more efficient to do the conversion once in the caller and pass that to 
each routine.
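
The convert-once idea can be sketched roughly as follows.  This is not
the patch itself: latin1_to_utf8() and matches_any() are hypothetical
helpers, the fixed-size buffer is a simplification, and non-utf8 input is
assumed to be Latin-1.  The caller upgrades p a single time and then
compares the utf8 copy against every stored utf8 candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert len Latin-1 bytes to utf8.
 * dst must have room for 2 * len bytes.  Returns bytes written. */
static size_t latin1_to_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[n++] = src[i];                       /* ASCII is unchanged */
        } else {
            dst[n++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[n++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return n;
}

/* Hypothetical caller: returns 1 if p matches any NUL-terminated utf8
 * candidate.  The conversion happens once, before the loop. */
static int matches_any(const unsigned char *p, size_t len, int is_utf8,
                       const char *const *candidates, size_t ncand)
{
    unsigned char buf[8];             /* room for up to 4 Latin-1 bytes */
    const unsigned char *u = p;
    size_t ulen = len;

    if (!is_utf8) {
        if (len > sizeof buf / 2)
            return 0;                 /* too long for this sketch */
        ulen = latin1_to_utf8(p, len, buf);
        u = buf;
    }
    for (size_t i = 0; i < ncand; i++) {
        if (strlen(candidates[i]) == ulen
            && memcmp(u, candidates[i], ulen) == 0)
            return 1;
    }
    return 0;
}
```

With candidates "\xC7\xB4" (U+01F4) and "\xC3\x80" (U+00C0), the native
byte 0xC0 passed with is_utf8 == 0 matches, as does the utf8 pair
0xC3 0x80 passed with is_utf8 == 1.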

I'm not sure what to do about a test case.

"\xc0" =~ qr/[\x{1f4}\xc0]/;

doesn't show the problem, but

use Test::More tests => 1;
like("\xc0", qr/[\x{1f4}\xc0]/i, 'get malformed utf8');

does.  And it looks like none of the existing re tests use Test.

