On Thu, Sep 2, 2021 at 10:02 AM demerphq <demerphq@gmail.com> wrote:
>
> I was rereading this and I thought of something to add here. Part of the
> confusion with Perl strings is that we try to hide the flag. We don't really
> want people to look at it and think about it. Instead we provide a handful
> of verbs which can be used to force the string to the shape we want, or
> throw an error if we can't (or sometimes be a no-op).
>
> I mean, if I want to be sure I have a latin-1 string then I would do
> something like:
>
>     eval { utf8::downgrade($str); 1 } or warn "Can't downgrade string!";
>
> And if I want to be sure I have a utf8 string then I would do something like:
>
>     utf8::upgrade($str);
>
> I wonder if we made accessing the flag state more socially acceptable
> whether people would find this less confusing.
>

This is fine, and I often recommend these functions to work around broken
abstractions (Perl, XS, or user code mistakenly relying on the utf8 flag).
The problem is relying on the flag state for things it does not represent,
and propagating such issues.

As a side note, latin-1 is a convenient way to refer to downgraded strings,
but since we are discussing internals it's important to note that they are
not specifically latin-1 strings, any more than upgraded strings are
specifically Unicode strings. A downgraded string can only contain ordinals
in the byte range because of how it is stored, but what those ordinals
represent (if they even represent bytes) depends on what the string is used
for and whether the unicode_strings feature is in effect. latin-1 mostly
works as a description because the latin-1 code space maps exactly to the
first 256 code points of Unicode (U+0000 through U+00FF).

-Dan
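
For anyone who wants to see the distinction concretely, here is a minimal sketch (the variable names are mine, and it assumes a reasonably modern perl, 5.14 or later, with no locale in effect). It shows that upgrading or downgrading changes only the internal storage, not the string's ordinals, and that what a byte-range ordinal means in a regex depends on whether unicode_strings is in effect:

```perl
use strict;
use warnings;

# A string whose ordinals all fit in the byte range (0..255).
my $str = "caf\x{e9}";

# Force the upgraded (utf8-flagged) internal representation.
my $up = $str;
utf8::upgrade($up);

# Force the downgraded (one byte per ordinal) representation. This dies
# if any ordinal is above 255, hence the eval.
my $down = $str;
eval { utf8::downgrade($down); 1 } or warn "Can't downgrade string!";

# The internal flag differs between the two copies...
my $flags_differ = (utf8::is_utf8($up) && !utf8::is_utf8($down)) ? 1 : 0;

# ...but they are the same string: same length, same ordinals.
my $same_string = ($up eq $down && length($up) == length($down)) ? 1 : 0;

# A string containing an ordinal above 255 cannot be downgraded.
my $wide = "\x{263A}";
my $wide_downgrade_ok = eval { utf8::downgrade($wide); 1 } ? 1 : 0;

# Without unicode_strings, the downgraded copy gets native (non-Unicode)
# rules for ordinal 0xE9, while the upgraded copy gets Unicode rules.
my $down_is_word = ($down =~ /\w\z/) ? 1 : 0;
my $up_is_word   = ($up   =~ /\w\z/) ? 1 : 0;

# With unicode_strings in effect, the storage format no longer matters.
my $down_is_word_with_feature;
{
    use feature 'unicode_strings';
    $down_is_word_with_feature = ($down =~ /\w\z/) ? 1 : 0;
}

print "flags differ:               $flags_differ\n";
print "same string:                $same_string\n";
print "wide downgrade ok:          $wide_downgrade_ok\n";
print "downgraded matches \\w:      $down_is_word\n";
print "upgraded matches \\w:        $up_is_word\n";
print "downgraded + feature \\w:    $down_is_word_with_feature\n";
```

The mismatch between the two \w results without the feature is exactly the kind of behavior that comes from code treating the flag as if it described the string's contents.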