develooper Front page | perl.perl5.porters | Postings from March 2021

Re: SvPVutf8 validity

From: Tony Cook
Date: March 22, 2021 13:00
Subject: Re: SvPVutf8 validity
Message ID: 20210322125922.GF13843@venus.tony.develop-help.com

On Mon, Mar 22, 2021 at 07:35:41AM -0400, Felipe Gasper wrote:
> > On Mar 22, 2021, at 7:23 AM, Tony Cook <tony@develop-help.com> wrote:
> > A bigger deal would be using "supers" or characters beyond U+10FFFF.
> > 
> > But I think this type of issue should be dealt with on input - don't
> > allow these characters into your strings in the first place, and
> > SvPVutf8() won't return their encoded forms.
> 
> Most discussions I see about character encoding in Perl consider Encode::encode_utf8() to be improper/less-than-ideal because it outputs “lax” UTF-8. (Encode.pm’s own docs, for example.) Given that SvPVutf8 is essentially the same encoding logic, would the same propriety apply?

The only ways I'm aware of to get non-Unicode UTF-8 bytes out of
perl's internal encoding are to:

a) mark invalid UTF-8 bytes as UTF-8 (possibly via the current :utf8
layer), which makes your SV invalid, or

b) put non-Unicode code points in your string.
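
As a rough illustration of both cases, here is a minimal Perl-level sketch.
It assumes the core Encode module; is_strict_unicode() is a hypothetical
helper name, not part of any API, and is just one possible way to reject
such strings on input using Encode's strict "UTF-8" encoding.

```perl
use strict;
use warnings;
no warnings 'utf8';    # silence surrogate / non-Unicode code point warnings
use Encode ();

# Hypothetical helper: true if $str contains only code points that
# Encode's strict "UTF-8" encoding accepts.
sub is_strict_unicode {
    my ($str) = @_;
    return eval {
        Encode::encode('UTF-8', $str, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    } ? 1 : 0;
}

# (b) A code point above U+10FFFF: perl stores it, and the lax
# encoder emits bytes for it.
my $super = chr(0x110000);
my $lax   = $super;
utf8::encode($lax);    # lax, in-place encode to bytes

# (a) Malformed bytes with the UTF-8 flag forced on: the SV is now invalid.
my $broken = "\xC3\x28";
Encode::_utf8_on($broken);
my $broken_ok = utf8::valid($broken) ? 1 : 0;

printf "lax bytes for U+110000: %s\n", uc unpack('H*', $lax);
printf "strict ok for U+110000: %d\n", is_strict_unicode($super);
printf "strict ok for U+00E9:   %d\n", is_strict_unicode("\x{E9}");
printf "flagged malformed bytes valid: %d\n", $broken_ok;
```

A check like this at the point where data enters the program keeps such
code points out of your strings, so a later lax encode has nothing
non-Unicode to emit.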

Tony
