I believe this resolves other bug reports, but I haven't had time to look them up. The patch is attached, and is also available at:

git://github.com/khwilliamson/perl.git
branch: matching

This patch makes case-sensitive regex matching give the same results regardless of whether the string and/or pattern are in utf8, unless "use legacy 'unicode8bit'" is in effect, in which case it works as before.

Since Yves is incommunicado, I took what he had done before Larry's veto and extended and modified it, adding an intermediate way. What that means is that anything that looks like [[:xxx:]] will match only in the ASCII range, or in the current locale, if set. I never heard any controversy about that part of the proposal, and it makes sense to me that a Posix construct should act the way the Posix definition says it should. \d, \s, and \w (hence \b) and their complements act as before, except that when 8-bit unicode mode is on, they also match appropriately in the 128-255 range.

This solves the utf8ness problem: the Posix classes never match outside their locale or ASCII, so utf8ness doesn't matter, and the others match the same whether utf8 or not.

I was surprised at how little code was actually involved. Making Posix always mean Posix simplified things quite a bit. \d doesn't match anything in the 128-255 range, so it did not have to be touched. Essentially, all that had to be done was to create new regnodes for \s, \w, and \b (and their complements) that say to match using unicode semantics. Everywhere the parallel old nodes appear in the code, I added these new nodes. When compiling, regcomp checks whether 8-bit unicode semantics mode is in effect, and if so, uses the new node; if not, it uses the old node. In execution, regexec uses the old definition when matching the old node, and the new semantics when matching the new node. I split [[:word:]] from \w and [[:digit:]] from \d so that they would match using Posix semantics regardless of utf8ness. But that is basically it.
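The compile-time choice of node and the run-time dispatch on it can be sketched roughly as follows. This is only an illustration in C, not code from the patch: the node names, the function names, and the `unicode8bit` parameter are all made up here, and it looks only at single bytes in the 0-255 range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node names, for illustration only (not the patch's names). */
typedef enum {
    NODE_SPACE,      /* \s with the old semantics: ASCII only below 256 */
    NODE_SPACE_UNI,  /* \s with unicode semantics in 128-255 as well    */
    NODE_WORD,       /* \w with the old semantics: ASCII only below 256 */
    NODE_WORD_UNI    /* \w with unicode semantics in 128-255 as well    */
} node_type;

/* The whitespace set assumed for the old \s: space, \t, \n, \f, \r. */
static bool is_ascii_space(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

static bool is_ascii_word(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '_';
}

/* Unicode whitespace in 128-255: NEL (0x85) and no-break space (0xA0). */
static bool is_latin1_space(unsigned char c) {
    return is_ascii_space(c) || c == 0x85 || c == 0xA0;
}

/* Unicode word characters in 128-255: the letters, which excludes the
 * multiplication sign (0xD7) and the division sign (0xF7). */
static bool is_latin1_word(unsigned char c) {
    if (is_ascii_word(c)) return true;
    if (c == 0xAA || c == 0xB5 || c == 0xBA) return true;
    return c >= 0xC0 && c != 0xD7 && c != 0xF7;
}

/* "Compile" step: pick the old or the new node depending on the mode. */
node_type compile_class(char escape, bool unicode8bit) {
    if (escape == 's')
        return unicode8bit ? NODE_SPACE_UNI : NODE_SPACE;
    assert(escape == 'w');  /* this sketch handles only \s and \w */
    return unicode8bit ? NODE_WORD_UNI : NODE_WORD;
}

/* "Execute" step: the node alone decides which definition is used. */
bool node_matches(node_type node, unsigned char c) {
    switch (node) {
    case NODE_SPACE:     return is_ascii_space(c);
    case NODE_SPACE_UNI: return is_latin1_space(c);
    case NODE_WORD:      return is_ascii_word(c);
    case NODE_WORD_UNI:  return is_latin1_word(c);
    }
    assert(0 && "unknown node type");
    return false;
}
```

The point of the design is that the matcher never needs to look at the mode again: the decision is baked into the node when the pattern is compiled.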
Several .t files depended on the legacy behaviors to test edge cases for utf8ness. I added a 'use legacy' to those.

Also, several text processing modules can't deal with \s matching a no-break space. I spent too much time trying to learn them well enough to decide whether this is a bug, and finding the one or two lines in each that were at fault. It is a bug if the text can be utf8, since that would automatically cause \s to suddenly match the no-break space. But I wasn't sure which of them claim to handle utf8 transparently. So I added a 'use legacy' to the modules, which gives the same behavior as in the past.

Several TODOs were accomplished and removed from some regex .t files.

I took advantage of changing regcomp.c to add a croak when the re has gone insane; I've had it in my development version for some time. It seems to happen when there are too many /\N{...}/ calls in a program.
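To make the no-break space issue in the text processing modules concrete, here is a small sketch in C (not taken from any of those modules; the function names are hypothetical) that counts whitespace-separated fields in a Latin-1 byte string under both definitions of whitespace.

```c
#include <stdbool.h>
#include <stddef.h>

/* The whitespace set assumed for the old \s: space, \t, \n, \f, \r. */
static bool is_space_legacy(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Unicode semantics add NEL (0x85) and no-break space (0xA0) in 128-255. */
static bool is_space_unicode(unsigned char c) {
    return is_space_legacy(c) || c == 0x85 || c == 0xA0;
}

/* Count whitespace-separated fields in a NUL-terminated Latin-1 string. */
size_t count_fields(const char *text, bool unicode_semantics) {
    size_t fields = 0;
    bool in_field = false;
    for (const unsigned char *p = (const unsigned char *)text; *p; ++p) {
        bool sp = unicode_semantics ? is_space_unicode(*p)
                                    : is_space_legacy(*p);
        if (sp) {
            in_field = false;
        } else if (!in_field) {
            in_field = true;   /* first byte of a new field */
            ++fields;
        }
    }
    return fields;
}
```

With the input "10\xA0" "km away", the legacy definition gives two fields ("10<NBSP>km" and "away"), while the unicode definition gives three, because the no-break space now splits "10" from "km". Code that relies on the no-break space keeping words together sees different results once \s matches it.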