perl.perl5.porters | Postings from March 2021

Re: SvPVutf8 validity

Felipe Gasper
March 22, 2021 14:53

> On Mar 22, 2021, at 8:59 AM, Tony Cook wrote:
> On Mon, Mar 22, 2021 at 07:35:41AM -0400, Felipe Gasper wrote:
>>> On Mar 22, 2021, at 7:23 AM, Tony Cook wrote:
>>> A bigger deal would be using "supers" or characters beyond U+10FFFF.
>>> But I think this type of issue should be dealt with on input - don't
>>> allow these characters into your strings in the first place, and
>>> SvPVutf8() won't return their encoded forms.
>> Most discussions I see about character encoding in Perl consider Encode::encode_utf8() to be improper/less-than-ideal because it outputs “lax” UTF-8. (’s own docs, for example.) Given that SvPVutf8 is essentially the same encoding logic, would the same propriety apply?
> The only ways that I'm aware of to get non-Unicode UTF-8 bytes out of
> perl's internal encoding is to:
> a) mark invalid UTF-8 bytes as UTF-8 (possibly via the current :utf8
> layer) (which makes your SV invalid)
> b) put non-Unicode code points in your string.

That makes sense. Should Perl’s documentation, then, have a different message re encoding to “utf8” than does? seems to discourage encode("utf8") with a vigour equal to that for decode("utf8").

Basically: if says it’s bad/discouraged to encode_utf8(), should the same rationale apply to SvPVutf8?
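
To make case (b) from the quoted list concrete, here is a minimal standalone C sketch. It is not Perl's implementation and does not call SvPVutf8. The function names (is_unicode_scalar, lax_encode, strict_encode) are hypothetical, and the encoder is assumed to stop at 4-byte sequences for brevity, although Perl's extended encoding can represent larger values. It only shows that the same bit layout, applied with no range check, emits bytes for surrogates and for code points beyond U+10FFFF, while a strict encoder refuses them.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if cp is a Unicode scalar value: at most U+10FFFF and
 * not a surrogate (U+D800..U+DFFF). Returns 0 otherwise. */
int is_unicode_scalar(uint32_t cp)
{
    if (cp > 0x10FFFF) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    return 1;
}

/* Hypothetical "lax" encoder: applies the UTF-8 bit layout with no
 * check for surrogates or the U+10FFFF ceiling. Writes up to 4 bytes
 * to out and returns the byte count, or 0 if cp needs more than
 * 4 bytes (a simplification made for this sketch). */
size_t lax_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x200000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical strict encoder: same layout, but returns 0 without
 * writing anything when cp is not a Unicode scalar value. */
size_t strict_encode(uint32_t cp, unsigned char out[4])
{
    if (!is_unicode_scalar(cp)) return 0;
    return lax_encode(cp, out);
}
```

With these definitions, lax_encode(0xD800, buf) writes the three bytes ED A0 80 and lax_encode(0x110000, buf) writes a four-byte sequence starting F4 90, while strict_encode returns 0 for both inputs. That is the distinction the question turns on: whether the check happens when the string is built or when it is encoded.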
