Re: RFC: Rename the “UTF8” flag
From: Felipe Gasper
February 3, 2022 12:40
Message ID: B0968D31-E687-4A96-91DA-B9F22DF47835@felipegasper.com
> On Feb 1, 2022, at 09:58, Leon Timmermans <email@example.com> wrote:
> I really don't think things will be more clear if things suddenly have two names. I don't quite see the correctness advantages either. I don't think this new terminology properly differentiates things either.
As I wrote in response to Dave: no one thing will have two names. Different things will have different names, rather than our status quo of applying one name to two different things, one of which happens, as a matter of implementation, to resemble the other.
Re correctness: “heavy” to describe an encoding is proposed as a Perl-specific term to describe a Perl-specific encoding. It is correct by definition because that is its very definition. “UTF-8” is manifoldly incorrect to describe Perl internals. Perl *allows* callers (whether C or Perl) to be incorrect in the same way, but whether they embrace that freedom or not is their affair.
> I'm genuinely puzzled why you would think "heavy" would be a good name for internal-facing stuff; I had assumed you meant things the other way around. Internal-facing stuff is exactly what should care about internal encoding.
No argument there, but that encoding is still available to the internals regardless of its name, right?
> Also, the entire problem with the perlapi is that there is very little distinction between external-facing and internal-facing interfaces; the external ones are also used internally, and often the other way around too. I don't think this distinction that you think you see exists in reality to the extent that you are assuming it exists (unfortunately).
If I’m understanding you correctly, you’re observing that many XS modules use Perl’s API (and internal interfaces) such that those modules need to care about Perl’s internals. No argument there.
What I assert, though, is that a *new* XS module can, using controls available now, abstract over those internals. Thus, it’s inaccurate to say that, at the C level, it is always *necessary* to know Perl’s internal encoding.
>> Would you mind explaining further where/when it is necessary to probe the string-storage abstraction?
> Literally every string-builtin in core. I'm guessing Yves' resistance to this is rooted in his extensive experience in the regex engine for example.
If I’m understanding you correctly, you mean that Perl maintainers need to know the string internals. Again, no argument; the regexp engine definitely needs to know what’s there.
What I meant, though, is: where/when would a *new* XS module absolutely need to care about a string’s internal encoding?
>> In my own experience, FWIW, it is entirely possible when using Perl’s C API to avoid assumptions about how an SV stores its code points internally. Yes, you can call SvUTF8* macros and such, but the safer approach of doing actual encode/decode operations and SvPVbyte/SvPVutf8 preserves the abstraction.
> In XS code, generally yes. In core, absolutely not.
Agreed, of course. But the claim--as I understood it, anyhow--was that anything in C needs to know the internal string encoding. Maybe I misunderstood, and/or miscommunicated myself, as it seems there’s agreement that XS authors should not need that knowledge.
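
For readers following along, here is a minimal Perl-level sketch of the abstraction discussed above. It is only an illustration, not code from the thread: the variable names are made up, and it uses the core utf8:: functions rather than the C API. It shows, under those assumptions, that the internal flag can differ between two copies of a string while the string value, and the octets produced by an explicit encode, stay the same.

```perl
use strict;
use warnings;

# Illustrative names only; nothing here comes from the thread itself.
my $compact = "caf\xe9";    # 4 code points, the last one is U+00E9
my $wide    = $compact;

utf8::downgrade($compact);  # internal storage: one byte per code point
utf8::upgrade($wide);       # internal storage: Perl's variable-width form

# The internal flag differs between the two copies ...
my $flag_compact = utf8::is_utf8($compact) ? 1 : 0;
my $flag_wide    = utf8::is_utf8($wide)    ? 1 : 0;

# ... but as far as Perl code is concerned they are the same string.
my $same_string = ($compact eq $wide) ? 1 : 0;
my $same_length = (length($compact) == length($wide)) ? 1 : 0;

# An explicit encode step yields identical octets either way,
# so the caller never has to ask how the string was stored.
my ($octets_compact, $octets_wide) = ($compact, $wide);
utf8::encode($octets_compact);
utf8::encode($octets_wide);
my $same_octets = ($octets_compact eq $octets_wide) ? 1 : 0;

printf "flags: %d vs %d; same string: %d; same length: %d; same octets: %d\n",
    $flag_compact, $flag_wide, $same_string, $same_length, $same_octets;
```

At the C level, SvPVbyte and SvPVutf8 play a broadly similar role: the caller states which form it wants and does not need to inspect the flag first.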