develooper Front page | perl.perl5.porters | Postings from March 2021

Re: Perl 7: Fix string leaks?

Felipe Gasper
March 30, 2021 18:53

> On Mar 30, 2021, at 12:16 PM, Salvador Fandiño wrote:
> On 30/3/21 16:39, Felipe Gasper wrote:
>>> On Mar 30, 2021, at 10:26 AM, Salvador Fandiño wrote:
>>> On 30/3/21 12:45, Felipe Gasper wrote:
>>>> The fix here is to switch the typemap to SvPVbyte so that identical Perl strings will yield identical C representation.
>>> Any XS code that is using SvPV to convert SVs to char* is already broken.
>>> IMO, the default typemap could be changed right now.
>> Wouldn’t that break a great many applications which currently pass decoded strings to XSUBs?
> You mean ensuring from the Perl side that any SV has the UTF8 flag set before passing it to some XSUB, right?

Not quite.

From your comments it sounds like you would map `char *` to `SvPVutf8_nolen`. That would break apps that either pre-encode their strings before giving them to XSUBs or that skip character decoding.

Alternatively, if you mean--as I propose--making `char *` map to `SvPVbyte_nolen`, that will break apps that *don’t* pre-encode their strings.
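To make the trade-off concrete, here is a rough pure-Perl sketch of byte semantics. `bytes_for_c` is a hypothetical helper, not part of any typemap; it uses `utf8::downgrade`, which I am assuming is a fair Perl-side stand-in for what SvPVbyte does to an argument. Pre-encoded input passes through untouched, decoded input that fits in a byte comes out as Latin-1 rather than UTF-8, and anything wider dies.

```perl
use strict;
use warnings;

# Hypothetical helper: approximates what an argument would go through
# under byte semantics. utf8::downgrade() dies if any code point
# exceeds 0xFF.
sub bytes_for_c {
    my ($string) = @_;          # copy, so the caller's scalar is untouched
    utf8::downgrade($string);   # dies: "Wide character in subroutine entry"
    return $string;
}

my $pre_encoded = "caf\xc3\xa9";   # caller already encoded to UTF-8
my $decoded     = "caf\x{e9}";     # caller passed characters, not bytes
my $wide        = "\x{263a}";      # no single-byte form exists

print "pre-encoded: ", unpack("H*", bytes_for_c($pre_encoded)), "\n";
print "decoded:     ", unpack("H*", bytes_for_c($decoded)), "\n";

my $ok = eval { bytes_for_c($wide); 1 };
print "wide:        ", ($ok ? "passed\n" : "died: $@");
```

The second line of output is the case that bites apps that don't pre-encode: the C side receives the single byte e9 where it may have been expecting the UTF-8 sequence c3 a9.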

The status quo--SvPV_nolen--kind of serves both use cases, but unreliably so: it’s possible for a decoded string to be downgraded, and it’s possible for an encoded string to be upgraded. In either of those cases, SvPV will probably not yield the desired C string.
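A small sketch of why that is unreliable, in pure Perl. `internal_length` is a hypothetical helper; it uses `bytes::length` to report the size of the scalar's internal buffer, which is the buffer SvPV hands to C. The two scalars compare equal in Perl but are stored differently.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling byte semantics

# Hypothetical helper: length of the internal buffer that SvPV exposes.
sub internal_length { return bytes::length($_[0]) }

my $downgraded = "caf\xe9";     # 4 characters, stored as 4 bytes
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);       # same 4 characters, stored as UTF-8

print "equal in Perl:      ", ($downgraded eq $upgraded ? "yes" : "no"), "\n";
print "UTF8 flag (down/up): ",
    (utf8::is_utf8($downgraded) ? 1 : 0), "/",
    (utf8::is_utf8($upgraded)   ? 1 : 0), "\n";
print "internal bytes (down/up): ",
    internal_length($downgraded), "/", internal_length($upgraded), "\n";
```

An XSUB that takes its `char *` from SvPV and never looks at SvUTF8 would see 4 bytes in one case and 5 in the other, for what Perl considers the same string.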

> No solution is trivial or evident, and would have required investigation from the developer. So, I would expect most people did find about 2 and used it.

A lot of XS modules use SvPV without checking SvUTF8. Alas.

> Also, if you make the default typemap croak if the data can not be encoded, that would make any broken code very easy to detect. Any programmer which have adopted solution 1, would find pretty soon that something is broken in his code.

SvPVbyte will make it easy to see what’s broken, sure, but someone will still need to go in and fix it. That “someone” could be the programmer who hates Perl and keeps hounding management to approve a rewrite in some other, more popular language. If the application suddenly breaks, that case becomes much easier to make.

> Encoding/decoding should be done at the boundaries (syscall, XS, etc.) using sensible defaults and/or allowing the user to set them (as in PerlIO). That would IMO fix most of the problems.

“Encoding at the boundaries” is essentially what Sys::Binmode achieves for POSIX OSes, FWIW.
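For comparison, a minimal sketch of doing the boundary encoding by hand with the core Encode module. `encode_for_boundary` is a hypothetical helper and UTF-8 is an assumed target encoding; this illustrates the general idea only and says nothing about how Sys::Binmode is implemented.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn characters into UTF-8 bytes just before they
# cross into a syscall or XSUB. Dies on characters that cannot be encoded.
sub encode_for_boundary {
    my ($chars) = @_;
    return Encode::encode('UTF-8', $chars,
                          Encode::FB_CROAK | Encode::LEAVE_SRC);
}

my $downgraded = "caf\xe9";
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);       # same characters, different internal storage

my $from_down = encode_for_boundary($downgraded);
my $from_up   = encode_for_boundary($upgraded);

print "from downgraded: ", unpack("H*", $from_down), "\n";
print "from upgraded:   ", unpack("H*", $from_up), "\n";
print "identical bytes: ", ($from_down eq $from_up ? "yes" : "no"), "\n";
```

Once the string is encoded explicitly, how Perl happened to store it beforehand no longer affects the bytes the C side receives.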
