I am waiting for blead to reopen before I submit a patch that extends feature unicode_strings to matching \s and \w. While documenting it, it occurred to me that the existing implementation is wrong, as is the existing interaction between use bytes and use locale.

To refresh your memory, "use feature 'unicode_strings'" is supposed to mean that even non-UTF-8 data is to be given Unicode semantics. In practice, this affects only the characters 128-255. The feature is new in 5.12, and is implemented there only in the functions that change case, such as ucfirst(), and in s/.../\L...\E/.

The implementation is that it is always subservient to 'use bytes' and 'use locale'. Thus, in

    use bytes;
    code 1
    {
        use feature "unicode_strings";
        code 2
    }
    code 3

byte semantics are used throughout. It seems to me that instead Unicode semantics should be used in code 2, and byte semantics in codes 1 and 3. The same goes for locale: unicode_strings is currently subservient to it, and I think they should instead behave as I just suggested for bytes. And I think that bytes and locale should likewise override each other, with the innermost declaration winning, which they currently don't do.

I believe that hint_bits must currently be implemented as a stack, so that when an inner block is exited, the value for the outer block is automatically restored. Any comments?
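To make the proposed "innermost pragma wins" rule concrete, here is a toy model in Perl. It is a hypothetical sketch, not perl's internals: the subs enter_block, leave_block, use_pragma and semantics are invented for illustration. It assumes only that hints are saved on block entry and restored on block exit, as described above for hint_bits, and it replays the bytes / unicode_strings example.

```perl
use strict;
use warnings;

# Hypothetical model, NOT perl's real hint machinery: each lexical scope
# records which of bytes / locale / unicode_strings was most recently
# declared in it. A stack of scopes mimics hints being saved on block
# entry and restored on block exit.
my @scopes = ({ last => undef });    # outermost scope: no pragma in effect

# Entering a block inherits (copies) the enclosing scope's hints.
sub enter_block { push @scopes, { %{ $scopes[-1] } } }

# Leaving a block discards its hints, restoring the outer scope's value.
sub leave_block {
    die "unbalanced block\n" if @scopes == 1;
    pop @scopes;
}

# Declaring a pragma ('bytes', 'locale' or 'unicode_strings') affects only
# the current scope.
sub use_pragma { $scopes[-1]{last} = shift }

# Proposed rule: whichever pragma was declared innermost wins.
sub semantics {
    my $last = $scopes[-1]{last};
    return 'native' unless defined $last;
    return $last eq 'unicode_strings' ? 'unicode' : $last;
}

# Replay of the example: use bytes; code 1 { use feature ...; code 2 } code 3
my @seen;
use_pragma('bytes');
push @seen, semantics();             # code 1
enter_block();
use_pragma('unicode_strings');
push @seen, semantics();             # code 2
leave_block();
push @seen, semantics();             # code 3
print "@seen\n";                     # prints "bytes unicode bytes"
```

Under the 5.12 behaviour described above, all three positions would report byte semantics instead. The same model covers locale, and bytes versus locale, simply by declaring those pragmas in the inner block.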