make regen is required. Also, this patches a .t file in Test::Simple, so I'm cc'ing the CPAN maintainer.

The attached series of commits fixes the inconsistent handling of Latin1 characters when matching \s, \w, and hence \b (boundary matching), and their complements. This solves the second of the five areas of the "Unicode Bug". (The first, lc(), ucfirst(), and friends, was fixed for 5.12. The remaining ones are matching POSIX character classes, matching under /i, and user-defined case mappings.)

These commits also add the regex modifiers /u (unicode), /l (locale), and /t (traditional). /a is not part of this patch. I have made up the term "matching mode" to describe these; I'm open to a better term if you can think of one.

Much of this patch was submitted and withdrawn last year. This implementation is somewhat cleaner than that one, in that no new regnodes were added. Instead, it turns out that the flags field in the affected regnodes was unused. By using that field, we fly under the radar of some other code, which as a result didn't have to change.

Note that there is a behavior change that may be incompatible with existing code. Previously, if a regex was compiled from within 'use locale' and then interpolated into another regex outside it, the interpolated part lost its localeness, and vice versa. This patch makes the regex remember how it was compiled, so that mode stays with it even when interpolated.

Also, the stringification of a regex now shows its matching-mode modifier, e.g. 't', so code that inspects the stringification will have to change. For example, with this patch you get (?t-xism:...) instead of the previous (?-xism:...). Several of the .t changes are because of this, and because the minimum length of the stringified regex grew by that extra modifier character.

I'm working on the pod changes, and will submit them later.
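The inconsistent \s, \w, and \b behavior for Latin1 characters can be reproduced with a short script. This is a minimal sketch, assuming a released perl 5.14 or later, where /u exists and the default rules are spelled /d (it does not use the /t spelling proposed in this patch). It also assumes no 'use locale' and no 'use v5.12' or later, since either would change which rules apply. The helper name `matches` is hypothetical, made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Deliberately no "use v5.12" (or later), no "use feature 'unicode_strings'",
# and no "use locale": any of these changes the rules and would hide the bug.

# Hypothetical helper: returns 1 if $str matches $re, else 0.
sub matches { my ($str, $re) = @_; return $str =~ $re ? 1 : 0 }

my $e_acute = "\xE9";     # U+00E9 LATIN SMALL LETTER E WITH ACUTE, one byte
my $nbsp    = "\xA0";     # U+00A0 NO-BREAK SPACE, one byte
my $cafe    = "caf\xE9";

# Copies holding the same characters, but stored internally as UTF-8.
my ($e_utf8, $nbsp_utf8, $cafe_utf8) = ($e_acute, $nbsp, $cafe);
utf8::upgrade($_) for $e_utf8, $nbsp_utf8, $cafe_utf8;

# Default rules: the result depends on how the string happens to be
# stored internally, not on which characters it contains.
my %default = (
    w_native => matches($e_acute,   qr/\w/),      # not a word char
    w_utf8   => matches($e_utf8,    qr/\w/),      # a word char
    s_native => matches($nbsp,      qr/\s/),      # not whitespace
    s_utf8   => matches($nbsp_utf8, qr/\s/),      # whitespace
    b_native => matches($cafe,      qr/caf\b/),   # boundary before \xE9
    b_utf8   => matches($cafe_utf8, qr/caf\b/),   # no boundary
);

# Unicode rules (/u, perl 5.14+): same answer for both representations.
my %unicode = (
    w_native => matches($e_acute, qr/\w/u),
    w_utf8   => matches($e_utf8,  qr/\w/u),
    s_native => matches($nbsp,    qr/\s/u),
    b_native => matches($cafe,    qr/caf\b/u),
);

# A pattern compiled with /u keeps its rules when interpolated into a
# pattern compiled under the default rules.
my $word_u       = qr/\w/u;
my $interpolated = matches($e_acute, qr/x|$word_u/);

for my $k (sort keys %default) { printf "default %-8s %d\n", $k, $default{$k} }
for my $k (sort keys %unicode) { printf "unicode %-8s %d\n", $k, $unicode{$k} }
print "interpolated qr//u: $interpolated\n";
```

Under the default rules each pair disagrees, which is the bug this patch addresses; under /u the native and UTF-8 copies agree, and the interpolated /u pattern keeps its Unicode behavior.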