On Thu, Sep 2, 2021, at 9:20 AM, demerphq wrote:
> No. The flag does not mean "upgraded" it means "unicode semantics, utf8 encoding". Upgrading is one way to get such a string, and it might even be the most common, but the most important and likely to be correct way is explicit decoding.

You wrote a whole lot, but this quote is, I think, at the center of what I have found confusing.

The utf8 flag on a scalar doesn't mean Unicode semantics. That way lies The Unicode Bug. Under the unicode_strings feature, recommended and in the version bundle since v5.12 (2010), all strings have Unicode semantics and are treated as a sequence of codepoints when performing textish operations.

    perl -E 'say "word" if "\xFF" =~ /\w/'

This string hasn't been upgraded, hasn't been decoded, and prior to unicode_strings, would not have matched.

My take here is that unicode_strings is a *bugfix* (fixing the "Unicode Bug"), and it sounds like you are implying that it is not, and that the correct behavior to learn is that the utf8 flag on a scalar is the *correct* way to know whether Unicode semantics would be applied. This surprises me.

-- 
rjbs
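
To make the distinction concrete, here is a minimal sketch that expands the one-liner into a side-by-side comparison. It assumes perl 5.14 or later, where unicode_strings also governs regex matching; the variable names and the %result hash are only illustrative. It matches the same one-character string against /\w/ with and without the feature, once in its native form and once after utf8::upgrade.

```perl
use strict;
use warnings;

# One character, code point 0xFF (LATIN SMALL LETTER Y WITH DIAERESIS).
# Never decoded, never upgraded: the internal UTF8 flag is off.
my $native = "\xFF";

# The same character, but forced into the internal UTF-8 representation.
my $upgraded = "\xFF";
utf8::upgrade($upgraded);

my %result;
$result{same_string}  = $native eq $upgraded      ? 1 : 0;
$result{native_flag}  = utf8::is_utf8($native)    ? 1 : 0;
$result{upgrade_flag} = utf8::is_utf8($upgraded)  ? 1 : 0;

{
    # No unicode_strings here: the match depends on the internal
    # representation, which is the behaviour known as the Unicode Bug.
    $result{legacy_native}   = $native   =~ /\w/ ? 1 : 0;
    $result{legacy_upgraded} = $upgraded =~ /\w/ ? 1 : 0;
}

{
    # With unicode_strings: the match depends only on the code points.
    use feature 'unicode_strings';
    $result{fixed_native}   = $native   =~ /\w/ ? 1 : 0;
    $result{fixed_upgraded} = $upgraded =~ /\w/ ? 1 : 0;
}

printf "%-16s %d\n", $_, $result{$_} for sort keys %result;
```

If I have this right, the two scalars compare equal with eq, only the upgraded one reports the flag, and the only case that fails to match is the native string without the feature. That is the sense in which the flag predicts semantics only when unicode_strings is off.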