perl.perl5.porters | Postings from March 2021

Re: SvPVutf8 validity

From: Felipe Gasper
Date: March 22, 2021 14:53
Subject: Re: SvPVutf8 validity
Message ID: 7D093D3E-00DA-42C1-BF20-A1EAC5BA9F8F@felipegasper.com


> On Mar 22, 2021, at 8:59 AM, Tony Cook <tony@develop-help.com> wrote:
> 
> On Mon, Mar 22, 2021 at 07:35:41AM -0400, Felipe Gasper wrote:
>>> On Mar 22, 2021, at 7:23 AM, Tony Cook <tony@develop-help.com> wrote:
>>> A bigger deal would be using "supers" or characters beyond U+10FFFF.
>>> 
>>> But I think this type of issue should be dealt with on input - don't
>>> allow these characters into your strings in the first place, and
>>> SvPVutf8() won't return their encoded forms.
>> 
>> Most discussions I see about character encoding in Perl consider Encode::encode_utf8() to be improper/less-than-ideal because it outputs “lax” UTF-8. (Encode.pm’s own docs, for example.) Given that SvPVutf8 is essentially the same encoding logic, would the same propriety apply?
> 
> The only ways that I'm aware of to get non-Unicode UTF-8 bytes out of
> perl's internal encoding is to:
> 
> a) mark invalid UTF-8 bytes as UTF-8 (possibly via the current :utf8
> layer) (which makes your SV invalid)
> 
> b) put non-Unicode code points in your string.
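
For illustration, the "non-Unicode" cases above (surrogates and code points beyond U+10FFFF) can be detected in a byte buffer after the fact. The following is a minimal sketch, assuming an XS caller wants to check the bytes it got back from SvPVutf8() before handing them to something that expects strict UTF-8. The function name is_strict_utf8 is hypothetical and not part of Perl's API, and the sketch deliberately accepts noncharacters such as U+FFFE, which may or may not match what a given consumer wants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of perlapi): returns true if buf[0..len)
 * is well-formed, shortest-form UTF-8 that encodes only Unicode scalar
 * values, i.e. no surrogates (U+D800..U+DFFF) and nothing above U+10FFFF. */
bool is_strict_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t need;        /* number of continuation bytes expected */

        if (b < 0x80) {                          /* ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {     /* 2-byte sequence */
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {         /* 3-byte sequence */
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if (b >= 0xF0 && b <= 0xF4) {     /* 4-byte sequence */
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            /* stray continuation byte, overlong lead (C0/C1),
             * or a lead byte that can only encode values above U+10FFFF */
            return false;
        }

        if (len - i - 1 < need)
            return false;                        /* truncated sequence */

        for (size_t k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return false;                    /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min)
            return false;                        /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;                        /* surrogate */
        if (cp > 0x10FFFF)
            return false;                        /* beyond Unicode */

        i += need + 1;
    }
    return true;
}
```

In an XS function this could be called on the pointer and length obtained from SvPVutf8(sv, len), with the pointer cast to const unsigned char *. It only reports the problem; what to do on failure (croak, substitute U+FFFD, and so on) is a separate policy choice.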

That makes sense. Should Perl’s documentation, then, have a different message re encoding to “utf8” than Encode.pm does? Encode.pm seems to discourage encode("utf8") with a vigour equal to that for decode("utf8").

Basically: if Encode.pm says it’s bad/discouraged to encode_utf8(), should the same rationale apply to SvPVutf8?

-F

