
Re: Pre-RFC: Rename SVf_UTF8 et al.

From: Ricardo Signes
Date: September 3, 2021 15:52
Subject: Re: Pre-RFC: Rename SVf_UTF8 et al.
Message ID: 3535d0b2-0dcb-41d8-ac24-978599266d04@beta.fastmail.com

On Thu, Sep 2, 2021, at 9:20 AM, demerphq wrote:
> No. The flag does not mean "upgraded" it means "unicode semantics, utf8 encoding". Upgrading is one way to get such a string, and it might even be the most common, but the most important and likely to be correct way is explicit decoding.

You wrote a whole lot, but this quote is, I think, at the center of what I have found confusing.

The utf8 flag on a scalar doesn't mean Unicode semantics.  That way lies The Unicode Bug.  Under the unicode_strings feature, recommended and in the version bundle since v5.12 (2010), all strings have unicode semantics and are treated as a sequence of codepoints when performing textish operations.

perl -E 'say "word" if "\xFF" =~ /\w/'

This string hasn't been upgraded, hasn't been decoded, and prior to unicode_strings, would not have matched.
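
To make the one-liner's point concrete, here is a minimal sketch that runs the same match with the feature explicitly off and then on, and also against an upgraded copy of the string. The variable names and the printed labels are mine, purely for illustration. It assumes no locale pragma is in effect and relies only on the core utf8::is_utf8 and utf8::upgrade functions.

```perl
use strict;
use warnings;

# "\xFF" is U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS). The literal is
# neither decoded nor upgraded, so its UTF8 flag starts out off.
my $str = "\xFF";
my $flag_before = utf8::is_utf8($str) ? 1 : 0;

# Same characters, but stored internally as UTF-8 (flag on).
my $upgraded = $str;
utf8::upgrade($upgraded);
my $flag_after = utf8::is_utf8($upgraded) ? 1 : 0;

# The two strings still compare equal as sequences of characters.
my $same = ($str eq $upgraded) ? 1 : 0;

my ($legacy, $legacy_upgraded, $fixed, $fixed_upgraded);

{
    # Without the feature, the internal representation leaks into the
    # result: this is the behavior known as The Unicode Bug.
    no feature 'unicode_strings';
    $legacy          = ($str      =~ /\w/) ? 1 : 0;
    $legacy_upgraded = ($upgraded =~ /\w/) ? 1 : 0;
}

{
    # With the feature, both representations are matched the same way.
    use feature 'unicode_strings';
    $fixed          = ($str      =~ /\w/) ? 1 : 0;
    $fixed_upgraded = ($upgraded =~ /\w/) ? 1 : 0;
}

printf "flag before/after upgrade:    %d / %d\n", $flag_before, $flag_after;
printf "strings compare equal:        %d\n", $same;
printf "no unicode_strings, plain:    %d\n", $legacy;
printf "no unicode_strings, upgraded: %d\n", $legacy_upgraded;
printf "unicode_strings, plain:       %d\n", $fixed;
printf "unicode_strings, upgraded:    %d\n", $fixed_upgraded;
```

The only line that should report 0 for the match is "no unicode_strings, plain". There the flag is effectively deciding the semantics, even though the two strings compare equal.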

My take here is that unicode_strings is a *bugfix* (fixing the "Unicode Bug"), and it sounds like you are implying that it is not, and that the correct behavior to learn is that the utf8 flag on a scalar is the *correct* way to know whether Unicode semantics would be applied.  This surprises me.

-- 
rjbs