develooper Front page | perl.perl5.porters | Postings from October 2016

Re: Encode.xs and sv_force_normal

Thread Previous | Thread Next
From:
Father Chrysostomos
Date:
October 16, 2016 19:34
Subject:
Re: Encode.xs and sv_force_normal
Message ID:
9EB62F7E-CEA9-47B8-BD1A-2CD7A6481380@cpan.org

On Monday 10 October 2016 18:59:56 pali@cpan.org wrote:
> On Thursday 29 September 2016 21:22:26 pali@cpan.org wrote:
>> Hi!
>> 
>> Look at cpan/Encode/Encode.xs for XS functions. Every one has different
>> check if first argument can be modified or not. Some functions have
>> SvPOK() check, some SvREADONLY(), SvSMAGICAL(src), SvGMAGICAL() and some
>> SvIsCOW().
>> 
>> After these check every XS function calling either sv_force_normal() or
>> sv_mortalcopy() or sv_newmortal()+sv_copypv(). And some none.
>> 
>> In apidoc is documented macro SvTHINKFIRST() which I understand is that
>> which needs to be called for checking if sv_force_normal() is needed or
>> not.
>> 
>> I believe that for calling sv_utf8_upgrade() is needed to call
>> sv_mortalcopy() first (to prevent modification of input scalar).
>> 
>> Can some expert look at that code if this is enough to make sv argument
>> modifiable?
>> 
>> if (SvTHINKFIRST(sv)) sv_force_normal(sv)
>> 
>> I think because of this problem in Encode (which is in core) there are
>> bugs like these:
>> 
>> https://rt.cpan.org/Public/Bug/Display.html?id=117158
>> https://github.com/dankogai/p5-encode/pull/66
>> 
>> (PS: please CC me as I'm not subscribed to this high volume list)
> 
> Hi! I prepared patch for Encode which should fix those problems with
> readonly, magic, COW, ... scalars. Can you look at it? It is correct now?

Here are some observations concerning your patch.  I do not have time to confirm that my observations are correct.  Sorry.

> From f70e78996c5101a32e773c54048c7b596b744109 Mon Sep 17 00:00:00 2001
> From: Pali <pali@cpan.org>
> Date: Mon, 10 Oct 2016 18:55:56 +0200
> Subject: [PATCH] Fix handling of undef, ref, typeglob, UTF8, COW and magic
>  scalar argument in all XS functions
> 
> Before this patch every function XS function did it differently and not
> every one correctly. Now SvTHINKFIRST() check is used if sv_force_normal()
> is needed to call. SvPV() is called before every SvUTF8() check. Next
> sv_mortalcopy() before sv_utf8_upgrade/downgrade. And SvPV_force() before
> sv_utf8_downgrade(). SvEND() was replaced by calculated length. Function
> encode_method() now does not call get magic on src.

If I understand correctly that encode_method is a private function that is only supposed to handle SVs whose get-magic has already been invoked by the calling functions, then that change sounds correct.

> 
> This fixes bugs:
> https://rt.cpan.org/Public/Bug/Display.html?id=117158
> https://rt.cpan.org/Public/Bug/Display.html?id=85489
> https://github.com/dankogai/p5-encode/pull/66
> ---
>  Encode.xs   |  151 +++++++++++++++++++++++++++--------------------------------
>  MANIFEST    |    1 +
>  t/decode.t  |   54 ++++++++++++++++++++-
>  t/rt85489.t |   48 +++++++++++++++++++
>  t/utf8ref.t |    4 +-
>  5 files changed, 173 insertions(+), 85 deletions(-)
>  create mode 100644 t/rt85489.t
> 
> diff --git a/Encode.xs b/Encode.xs
> index 8c990ea..efaf228 100644
> --- a/Encode.xs
> +++ b/Encode.xs
> @@ -109,7 +109,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src,
>  	      SV *fallback_cb)
>  {
>      STRLEN slen;
> -    U8 *s = (U8 *) SvPV(src, slen);
> +    U8 *s = (U8 *) SvPV_nomg(src, slen);
>      STRLEN tlen  = slen;
>      STRLEN ddone = 0;
>      STRLEN sdone = 0;
> @@ -522,21 +522,27 @@ PREINIT:
>      int check;
>  CODE:
>  {
> -    dSP; ENTER; SAVETMPS;
> -    if (src == &PL_sv_undef || SvROK(src)) src = sv_2mortal(newSV(0));
> +    dSP;
> +    if (!SvOK(src))
> +        XSRETURN_UNDEF;

(This is in Method_decode_xs.)

What if the src is magical (e.g., tied)?  It may be !SvOK now but SvOK after calling SvGETMAGIC, so SvGETMAGIC should come first, in which case any SvPV that fallows should be SvPV_nomg.

(Arguably, any library function [as opposed to a built-in Perl operator] could freely call get-magic multiple times.  Calling it only once should be done for the sake of efficiency, not for the sake of correct behaviour.  However, if a scalar changes its internal type between multiple calls to get-magic and the XS function stores information about the scalar in separate variables, then multiple calls to get-magic can result in buggy behaviour.  It is usually much easier to make sure it is called only once.)

>      check = SvROK(check_sv) ? ENCODE_PERLQQ|ENCODE_LEAVE_SRC : SvIV(check_sv);

Similarly, check_sv could be a tied variable, so SvGETMAGIC should be called before the SvROK check.  And then SvIV should be SvIV_nomg.  (I know, I am pedantic.)

> -    if (!(check & ENCODE_LEAVE_SRC) && SvIsCOW(src)) {
> -        /*
> -         * disassociate from any other scalars before doing
> -         * in-place modifications
> -         */
> +    if (check && !(check & ENCODE_LEAVE_SRC) && SvTHINKFIRST(src))

I do not understand why you add ‘check &&’ here.  Previously, the absence of the ENCODE_LEAVE_SRC flag would cause sv_force_normal to be called.  Now, the absence of said flag causes it only if some other flag is set.

sv_force_normal will cause any RV to become undefined.  It will not be stringified.  (SvTHINKFIRST + sv_force_normal is intended for scalars that are going to be overwritten, which does not necessarily apply here.)

The previous behaviour of Encode was to treat an RV as undef but leave the original scalar intact (regardless of ENCODE_LEAVE_SRC).  The new behaviour, with your patch, is to stringify a copy of the RV passed as an argument if ENCODE_LEAVE_SRC is specified and to undefine the caller’s RV otherwise.  Is this change in behaviour intentional?

You might want to do SvPV_force_nomg here instead of sv_force_normal.  That will forcibly stringify the scalar and dissociate any COWs from other scalars sharing the same string buffer.  But it will change the behaviour with regard to RVs (a change for the better, in my opinion, but that is not my call).

Also, sv_force_normal will croak on read-only variables (SvTHINKFIRST checks SVf_READONLY).  Is that intended behaviour?

>          sv_force_normal(src);
> +    SvPV_nolen(src); /* SvPV() can modify SvUTF8 flag */

This calls get-magic again, which should have been called above.

> +    if (SvUTF8(src)) {
> +        if (!check || check & ENCODE_LEAVE_SRC)

Again, why the !check?

> +            src = sv_mortalcopy(src);
> +        if (!SvPOKp(src)) /* sv_utf8_downgrade() does not work without SvPOKp flag */
> +            SvPV_force_nomg_nolen(src);

If an RV makes it through to this code, then string overloading may be triggered twice.  Do you care?

> +        sv_utf8_downgrade(src, FALSE);
>      }
> -    s = (U8 *) SvPV(src, slen);
> -    e = (U8 *) SvEND(src);
> +    s = (U8 *) SvPV_nomg(src, slen);
> +    e = s+slen;
> +

I have only skimmed the changes made to the other functions, but they all seem very similar.



Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About