perl.perl5.porters | Postings from March 2021

Re: SvPVutf8 validity

Tony Cook
March 22, 2021 13:00
On Mon, Mar 22, 2021 at 07:35:41AM -0400, Felipe Gasper wrote:
> > On Mar 22, 2021, at 7:23 AM, Tony Cook wrote:
> > A bigger deal would be using "supers" or characters beyond U+10FFFF.
> > 
> > But I think this type of issue should be dealt with on input - don't
> > allow these characters into your strings in the first place, and
> > SvPVutf8() won't return their encoded forms.
> Most discussions I see about character encoding in Perl consider Encode::encode_utf8() to be improper/less-than-ideal because it outputs “lax” UTF-8. (’s own docs, for example.) Given that SvPVutf8 is essentially the same encoding logic, would the same propriety apply?

The only ways that I'm aware of to get non-Unicode UTF-8 bytes out of
perl's internal encoding are to:

a) mark invalid UTF-8 bytes as UTF-8 (possibly via the current :utf8
layer) (which makes your SV invalid)

b) put non-Unicode code points in your string.
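
To make case (b) concrete, here is a minimal C sketch of what "lax"
encoding means at the byte level. It is not perl's implementation:
lax_utf8_encode() and is_unicode_scalar() are hypothetical helpers
written for illustration, and the encoder stops at the 4-byte form
(code points up to 0x1FFFFF), whereas perl's extended encoding goes
further. The point is only that the generic UTF-8 bit layout happily
encodes surrogates and code points above U+10FFFF unless something
checks the range first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of perl's API): apply the generic UTF-8
 * bit layout with no range checks. Returns the number of bytes written,
 * or 0 if cp does not fit in the 4-byte form. */
static size_t lax_utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        /* No surrogate check here: U+D800..U+DFFF are encoded as-is. */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x200000) {
        /* No upper-bound check here: values above U+10FFFF are encoded too. */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical input-side check: 1 if cp is a Unicode scalar value,
 * i.e. at most U+10FFFF and not a surrogate; 0 otherwise. */
static int is_unicode_scalar(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

With this sketch, U+D800 comes out as the bytes ED A0 80 and 0x110000
as F4 90 80 80, neither of which a strict UTF-8 decoder should accept.
Rejecting such code points on input with a check along the lines of
is_unicode_scalar() means the encoder never sees them.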

