develooper Front page | perl.perl5.porters | Postings from May 2013

Re: [perl #116407] SvPVutf8 != SvPVX, and sv_2pvutf8

Thread Previous | Thread Next
From:
Dave Mitchell
Date:
May 6, 2013 14:59
Subject:
Re: [perl #116407] SvPVutf8 != SvPVX, and sv_2pvutf8
Message ID:
20130506145920.GN2216@iabyn.com
On Wed, Jan 16, 2013 at 10:36:12AM +1100, Tony Cook wrote:
> On Tue, Jan 15, 2013 at 01:24:29PM -0800, bulk88 wrote:
> > I recently found a bug in a core module, where a loop was done over a 
> > string from SvPVutf8(sv, len) to SvEND(sv). The sv is a RO literal SV. 
> > So it looped over uninitialized memory and sometimes segvs.
> > 
> > http://perl5.git.perl.org/perl.git/blob/a8c6ff7b8e8c6037333c21f9b3f6b38b9278df4f:/sv.c#l3060
> > 
> > A mortal copy was made of the RO SV was made and returned as the result 
> > of SvPVutf8.
> > 
> > All that SvPVutf8's docs say is "Like C<SvPV>, but converts sv to utf8 
> > first if necessary."
> > 
> > SvPV says "The SV may cache the stringified version becoming C<SvPOK>." 
> > So that is the disclaimer that SvPVX after SvPV is not the same pointer 
> > as returned from SvPV. I think the docs can be improved for SvPVutf8 and 
> > maybe SvPV to make it more obvious that these 2 do not create POK SVs. I 
> > don't have any proposed wording so no patch is included.
> > 
> > This ticket might be related to 
> > http://perl5.git.perl.org/perl.git/commit/fe46cbda823c09f80e4bc48dd93fafb26cc805f6 
> > and https://rt.perl.org/rt3/Ticket/Display.html?id=108994 .
> 
> Perhaps:
> 
>   Like C<SvPV>, but returns always returns a utf8 string.
> 
> 	char*	SvPVutf8(SV* sv, STRLEN len)
> 
>   This may return a pointer to a temporary instead of C<SvPVX(sv)>.
> 
> The other SvPVutf8* macros should be updated too.

Perhaps we just need to clarify SvPV and rely on the fact that all the
others say "SvPVfoo is just like SvPV, except for foo".

I'd suggest adding the following to SvPV:

    Note that there is no guarantee that that the return value of
    C<SvPV()> is equal to C<SvPVX(sv)> (or even that C<SvPVX(sv)> contains
    valid data), due to the way things like overloading and Copy-on-write
    are handled.  In these cases, the return value may point to a
    temporary buffer or similar. If you absolutely need the SvPVX field to
    be valid (for example, if you intend to write to it), then see
    C<SvPV_force>.

And for SvPV_force, which currently says:

    Like C<SvPV> but will force the SV into containing a string
    (C<SvPOK>), and only a string (C<SvPOK_only>), by hook or by crook.
    You want force if you are going to update the C<SvPVX> directly.
    Processes get magic.

I'd suggest s/You want force/You need force/ (to make it stronger), and
add:

    Note that coercing an arbitrary scalar into a plain PV will
    potentially strip useful data from it. For example if the SV was
    C<SvROK>, then referent will have its reference count decremented, and
    the SV itself may be converted to an C<SvPOK> scalar with a string
    buffer containing  value such as C<"ARRAY(0x1234)">.


Then make the SvPV_force_nomg() doc refer to SvPV_force() rather than
duplicating all the boiler-plate just added.

Note that updating SvPV* docs like this will also help with the goal of
improving the docs vis-a-vis preparing for re-enabling COW.





-- 
A major Starfleet emergency breaks out near the Enterprise, but
fortunately some other ships in the area are able to deal with it to
everyone's satisfaction.
    -- Things That Never Happen in "Star Trek" #13

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About