There is a problem with how the regexp engine handles certain types of escapes combined with strings of different encodings. For instance:

    perl -wle'$x=qq(\x{DF}); $x=~/$x|\x{100}/ and print qq(ok)'

produces the following:

    Malformed UTF-8 character (unexpected non-continuation byte 0x7c, immediately after start byte 0xdf) in regexp compilation at -e line 1.

As far as I can tell, in blead this happens because when the \x{100} is parsed during the sizing phase, the pattern-is-utf8 flag is switched to true but the pattern string itself is not upgraded to utf8. On the second pass the engine tries to read the string as utf8 and fails.

The attached patch fixes this: when it notices that this might happen, it upgrades the string to utf8 and then redoes[2] the sizing phase, since the recoding might have altered the required allocation. This could have caused a buffer overrun error.[1]

With the patch applied:

    D:\dev\perl\ver\zoro\t\win32>..\perl -wle"$x=qq(\x{DF}); $x=~/$x|\x{100}/ and print 'ok'"
    ok

\x{DF} is ß, by the way. Pesky thing.

As a bonus, this patch includes two bug fixes which I came across while working out the utf8 encoding problem. One is for the trie code's charclass logic, which was doing the wrong thing under utf8; the other is in some debugging output code that was using the wrong utf8 flag.

Not bad for the number of bugs per single test case, really. :-)

Yves

[1] I almost wonder if this could have been responsible for the sizing bug in the xml code from a while back. I'll have to try reverting that patch with this patch applied and see.

[2] This is far from the most efficient way to deal with this. It would be nice to fail-fast the parse somehow, so that the least possible work is done in the first parse pass once we know we have to upgrade the string. That point could be far into the parse recursion stack, so it's a bit difficult to do.

--
perl -Mre=debug -e "/just|another|perl|hacker/"
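
For readers who want to see the byte-level picture behind the message above, here is a rough toy model. It is not the code from the patch and not how the regexp compiler is actually structured; it is written in Python purely for illustration, and the names `latin1_to_utf8` and `sized_length` are hypothetical. It shows why the byte 0xDF followed by `|` (0x7C) is malformed when read as utf8, why upgrading changes the byte count, and what "upgrade, then redo the sizing pass" looks like in miniature.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (a byte-level stand-in for 'upgrading')."""
    return raw.decode("latin-1").encode("utf-8")


# The interpolated pattern begins with the single Latin-1 byte 0xDF, then '|'.
pattern_bytes = b"\xdf|"

# Reading those bytes as UTF-8 fails: 0xDF is a two-byte start byte,
# and 0x7C ('|') is not a continuation byte.
try:
    pattern_bytes.decode("utf-8")
    misread_ok = True
except UnicodeDecodeError:
    misread_ok = False

# After upgrading, 0xDF becomes the two bytes 0xC3 0x9F, so the string grows.
upgraded = latin1_to_utf8(pattern_bytes)


def sized_length(tokens):
    """Toy sizing pass that restarts once it learns the pattern must be UTF-8.

    tokens: list whose items are either bytes (literal Latin-1 text) or an int
    (a code point from an escape such as \\x{100}). This token model is an
    assumption made for the sketch. Returns (is_utf8, byte_count).
    """
    is_utf8 = False
    while True:
        total = 0
        restart = False
        for tok in tokens:
            if isinstance(tok, int):
                if tok > 0xFF and not is_utf8:
                    # A wide escape forces UTF-8: throw away the partial
                    # count and size everything again from the start.
                    is_utf8 = True
                    restart = True
                    break
                total += len(chr(tok).encode("utf-8")) if is_utf8 else 1
            else:
                total += len(latin1_to_utf8(tok)) if is_utf8 else len(tok)
        if not restart:
            return is_utf8, total


print(misread_ok, upgraded, sized_length([b"\xdf|", 0x100]))
```

The restart discards the partial count instead of patching it up, which mirrors the simple (and admittedly inefficient) approach described in footnote [2].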