develooper Front page | perl.perl5.porters | Postings from March 2012

Re: [perl #107008] UTF8 patches for 5.16

Thread Previous | Thread Next
From:
Brian Fraser
Date:
March 25, 2012 16:51
Subject:
Re: [perl #107008] UTF8 patches for 5.16
Message ID:
CA+nL+nY3HggPV1WMnEvia9FjqFR43REEt=PG35Lz=60XoL9Udw@mail.gmail.com
On Sun, Mar 25, 2012 at 6:07 PM, Father Chrysostomos via RT <
perlbug-followup@perl.org> wrote:

> On Wed Mar 21 18:49:00 2012, Hugmeir wrote:
> > On Wed, Mar 21, 2012 at 10:04 PM, Father Chrysostomos via RT <
> > perlbug-followup@perl.org> wrote:
> >
> > > On Sun Dec 25 14:33:33 2011, sprout wrote:
> > > I still don’t understand why parse_label is causing problems.
> > Zefram,
> > > can you look into that?
> > >
> >
> > For a quick recap, this is how Father C spotted this:
> >
> > $ PERL_DESTRUCT_LEVEL=2 ./perl -Ilib ext/XS-APItest/t/swaplabel.t
> > 1..56
> > ok 1
> > ... truncated ...
> > ok 56
> > panic: free from wrong pool during global destruction.
> >
> > The problem is in these lines of parse_label:
> >
> > lsv = newSV_type(SVt_PV);
> > SvPV_set(lsv, lpv);
> > SvCUR_set(lsv, llen);
> >
> > And it goes away if you do this:
> >
> > lsv = newSV_type(SVt_PV);
> > SvPV_set(lsv, pl_yylval.opval->op_type == OP_CONST ? savepv(lpv) :
> > lpv);
> > SvCUR_set(lsv, llen);
> >
> > But I don't have a clue why! So that's definitely not ready for
> > inclusion.
> > I dimly recall that the normal label parsing (with normal, I mean the
> > one
> > that doesn't use parse_label) takes ownership of the SV's PV slot, but
> > I'm
> > nowhere near knowledgeable enoughto figure out if that applies to
> > parse_label too, or why it only shows up with PERL_DESTRUCT_LEVEL=2.
>
> OK, I’ve figured it out.
>
> In your (Brian Fraser’s) label patch, you make this change in the way
> the label is stored by the tokeniser:
>
> -           pl_yylval.pval = CopLABEL_alloc(PL_tokenbuf);
> +           pl_yylval.opval = (OP*)newSVOP(OP_CONST, 0,
> +                                            newSVpvn_flags(PL_tokenbuf,
> +                                                        len, UTF ?
> SVf_UTF8 : 0));
>
> So it is no longer just an allocated PV with pl_yylval pointing to it,
> but an svop, which, of course, owns the SV.
>
> So, in perly.y, you make this change:
>
>  labfullstmt:   LABEL barestmt
>                        {
> -                         $$ = newSTATEOP(0, PVAL($1), $2);
> +                         $$ = newSTATEOP_flags(0,
> savepv(SvPVX(((SVOP*)$1)->op_sv)),
> +
> SvUTF8(((SVOP*)$1)->op_sv), $2);
>                          TOKEN_GETMAD($1,
>                              $2 ? cLISTOPx($$)->op_first : $$, 'L');
>                        }
>        |       LABEL labfullstmt
>                        {
> -                         $$ = newSTATEOP(0, PVAL($1), $2);
> +                         $$ = newSTATEOP_flags(0,
> savepv(SvPVX(((SVOP*)$1)->op_sv)),
> +
> SvUTF8(((SVOP*)$1)->op_sv), $2);
>                          TOKEN_GETMAD($1, cLISTOPx($$)->op_first, 'L');
>                        }
>
> where the standard parser *copies* (savepv) the strings, so that the
> svop doesn’t free the string used by the new stateop.
>
> In parse_label, you changed it to get the string out of the svop, but
> without adding the copying step:
>
> -           char *lpv = pl_yylval.pval;
> -           STRLEN llen = strlen(lpv);
> +           STRLEN llen;
> +           char *lpv = SvPV(cSVOPx(pl_yylval.opval)->op_sv, llen);
>
> So it is parse_label that needs to change, but not because of anything
> Zefram did.
>


*Thank you* for tracking this down, not to mention all the rest of your
work putting the rest of my mess into shape. I'm currently out with a cold
and covered under a mountain of work, so I'm slightly more useless than
usual; I apologize for leaving you hanging.

I agree with leaving the \N{...} fix out, merely because it's already
fairly late in the release cycle and this isn't a regression.

The UVX patch can be lost forever, really. The dump.c thread has a
discussion of how to best escape things, and if the format for these error
messages ever changes, that'll probably be the way to go.

Problem with the numeric warning patch is that it might break depending on
your locale, maybe. My understanding was that isPRINT_LC(ch) will always
return false for octets above the ASCII range. I'm out of my water here,
though.

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About