
Re: Pre-RFC: Rename SVf_UTF8 et al.

Dan Book
September 2, 2021 15:47
On Thu, Sep 2, 2021 at 10:02 AM demerphq wrote:

> I was rereading this and I thought of something to add here. Part of the
> confusion with Perl strings is that we try to hide the flag. We don't really
> want people to look at it and think about it. Instead we provide a handful
> of verbs which can be used to force the string to the shape we want, or
> throw an error if we can't (or sometimes be a no-op).
> I mean, if I want to be sure I have a latin-1 string then I would do
> something like:
> eval { utf8::downgrade($str); 1 } or warn "Can't downgrade string!";
> And if I want to be sure I have a utf8 string then I would do something like:
> utf8::upgrade($str);
> I wonder if we made accessing the flag state more socially acceptable
> whether people would find this less confusing.

This is fine and I often recommend use of these functions to work around
broken abstractions (in Perl, XS or user code mistakenly using the utf8
flag). The problem is relying on the flag state for things it does not
represent, and propagating such issues.
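
To make the workaround concrete, here is a minimal sketch of forcing a string
into downgraded storage before handing it to code that mishandles upgraded
strings. The helper name as_downgraded is hypothetical and not part of any
module; it relies only on the documented FAIL_OK argument of utf8::downgrade.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a downgraded copy of the string, or undef
# if the string contains ordinals above 255 and cannot be downgraded.
sub as_downgraded {
    my ($str) = @_;    # copy, so the caller's scalar is left untouched
    # A true second argument (FAIL_OK) makes downgrade return false
    # instead of dying when the string cannot be downgraded.
    return utf8::downgrade($str, 1) ? $str : undef;
}

my $upgraded = "caf\xE9";
utf8::upgrade($upgraded);                  # same characters, upgraded storage

my $bytes = as_downgraded($upgraded);      # same characters, downgraded storage
my $wide  = as_downgraded("\x{263A}");     # ordinal above 255: cannot downgrade
```

The string value is the same before and after; only the internal storage
changes, which is why this is a workaround and not a statement about what
the string contains.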

As a side note, latin-1 is a convenient way to refer to downgraded strings
but since we are discussing internals it's important to note that they are
not specifically latin-1 strings, any more than upgraded strings are
specifically Unicode strings. A downgraded string may only consist of
ordinals in the byte range due to being stored that way, but what those
byte ordinals represent (if they even represent bytes) is up to what the
string is used for and whether the unicode_strings feature is in effect.
latin-1 mostly works as a description because the latin-1 code space maps
exactly to the first 256 code points of Unicode (ordinals 0 through 255).
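
As a rough illustration of the point above, the sketch below takes two strings
with the same single ordinal (0xE9) but different internal storage and
compares them, then compares their uc() results with the unicode_strings
feature off and on. The %seen hash and its key names are mine, purely for
illustration, and it assumes perl 5.12 or later, where the feature exists.

```perl
use strict;
use warnings;

# Same ordinal, two different internal representations.
my $down = "\xE9";
my $up   = "\xE9";
utf8::downgrade($down);
utf8::upgrade($up);

my %seen;
$seen{same_value} = ($down eq $up)          ? 1 : 0;   # compared by ordinals
$seen{flag_down}  = utf8::is_utf8($down)    ? 1 : 0;
$seen{flag_up}    = utf8::is_utf8($up)      ? 1 : 0;

{
    # Without the feature, case mapping of ordinals 128..255 depends on
    # how the string happens to be stored.
    no feature 'unicode_strings';
    $seen{legacy_uc_differs} = (uc($down) ne uc($up)) ? 1 : 0;
}
{
    # With the feature, both storage forms get the same semantics.
    use feature 'unicode_strings';
    $seen{unicode_uc_same} = (uc($down) eq uc($up)) ? 1 : 0;
}

printf "%-18s %d\n", $_, $seen{$_} for sort keys %seen;
```

The two strings are equal as far as eq is concerned, so anything that behaves
differently between them is reacting to storage and not to content, which is
the kind of reliance on the flag described earlier.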

