upgrading a qr// SV to utf8?

Dave Mitchell
August 3, 2017 15:58
What should sv_utf8_upgrade() do when passed an SVt_REGEXP SV?

E.g. in the following code:

    my $s = "xxx";
    my $qr = qr/X/i;
    use Devel::Peek;
    Dump $qr;
    utf8::upgrade($$qr);
    Dump $qr;
    print "matched\n" if $s =~ $qr;

In 5.16 and earlier, SVt_REGEXP had the POK flag set, and utf8::upgrade()
just set the UTF8 flag on the SV, which was probably wrong (the regex
claims to be utf8, but was compiled when not utf8).

The code above matches.

In 5.18 through to 5.27.2, the SVt_REGEXP SV didn't have the POK flag
set, and utf8::upgrade() converted it into a plain PVMG UTF8 PV string
"(?^i:X)", blessed into class "Regexp".

The code above no longer matches, because $qr is now just a reference to
an object and gets stringified as "Regexp=SCALAR(0x20f6cf8)".

With v5.27.2-30-gdf6b4bd, "give REGEXP SVs the POK flag again",
utf8::upgrade() again just sets the UTF8 flag on the SVt_REGEXP SV.
However the code above still doesn't match, because in S_find_byclass(),
there is this chunk of code, with the FOLDEQ_S2_ALREADY_FOLDED added by
Karl with v5.15.3-403-g77a6d85:

    case EXACTFU:
        if (is_utf8_pat || utf8_target) {
            utf8_fold_flags = is_utf8_pat ? FOLDEQ_S2_ALREADY_FOLDED : 0;
            goto do_exactf_utf8;

Because is_utf8_pat is now determined by the SVf_UTF8 flag on the regexp
SV, this code thinks the exact node has already been folded, so fails to
match "x" against /X/i.
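
To make the failure mode concrete, here is a toy, ASCII-only model of an
"already folded" comparison. It is only a sketch: toy_fold(), toy_fold_eq()
and TOY_S2_ALREADY_FOLDED are hypothetical names invented for illustration,
loosely modelled on the flag above, and are not the real regexec.c code.

```c
#include <ctype.h>

/* Hypothetical flag, loosely modelled on FOLDEQ_S2_ALREADY_FOLDED. */
#define TOY_S2_ALREADY_FOLDED 0x1

/* ASCII-only stand-in for case folding (assumption: real folding is
 * Unicode-aware and far more involved). */
static int toy_fold(int c)
{
    return tolower((unsigned char)c);
}

/* Returns 1 if s1 (the target) equals s2 (the pattern text) under folding.
 * When TOY_S2_ALREADY_FOLDED is set, s2 is trusted to have been folded
 * at compile time and is compared as-is. */
static int toy_fold_eq(const char *s1, const char *s2, unsigned flags)
{
    for (; *s1 && *s2; s1++, s2++) {
        int c1 = toy_fold(*s1);
        int c2 = (flags & TOY_S2_ALREADY_FOLDED)
                     ? (unsigned char)*s2
                     : toy_fold(*s2);
        if (c1 != c2)
            return 0;
    }
    return *s1 == '\0' && *s2 == '\0';
}
```

In this model, comparing "x" with "X" succeeds when the flag is clear, and
fails when the flag is set but the pattern text was never actually folded,
which is the situation a regex compiled as non-utf8 and later flagged as
utf8 ends up in.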

This is not academic; my v5.27.2-30-gdf6b4bd has broken Tk (see RT
#131821). That has some weird code in it to get the string value of an SV:

    static char *
    LangString(SV *sv)
    {
        ...
        if (SvROK(sv)) {
            SV *rv = SvRV(sv);
            if (SvOBJECT(rv)) {
                if (SvTYPE(rv) == SVt_PVHV) {
                    ...
                }
                else if (SvPOK(rv)) {
                    /* ref to string is special cased for some reason ? */
                    if (!SvUTF8(rv))
                        sv_utf8_upgrade(rv);
                    return SvPV_nolen(rv);
                }
            }
        }
        ...
    }

Turning the POK flag back on for REGEXPs is causing LangString() to
call sv_utf8_upgrade(rv), which is setting the UTF8 flag and breaking the
regex.
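
The branch selection can be sketched with a toy model. Everything here
(struct toy_sv, the TOY_* types, toy_takes_upgrade_branch()) is a
hypothetical stand-in invented for illustration, not Perl's or Tk's actual
definitions; it only mirrors the conditions visible in the excerpt.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SV types; not Perl's real svtype values. */
enum toy_type { TOY_PVMG, TOY_PVHV, TOY_REGEXP };

/* Hypothetical stand-in for the handful of SV properties the excerpt tests. */
struct toy_sv {
    enum toy_type type;
    bool is_object; /* models SvOBJECT() */
    bool pok;       /* models SvPOK()    */
    bool utf8;      /* models SvUTF8()   */
};

/* Mirrors the quoted conditions: for a blessed referent that is not a
 * hash, a set POK flag plus a clear UTF8 flag leads to the upgrade call. */
static bool toy_takes_upgrade_branch(const struct toy_sv *rv)
{
    if (!rv->is_object)
        return false;
    if (rv->type == TOY_PVHV)
        return false;
    return rv->pok && !rv->utf8;
}
```

With pok set on a TOY_REGEXP referent the function returns true; with pok
clear (as in 5.18 through 5.27.2) it returns false, so in that model the
upgrade is never reached from this branch.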

A problem shared is a problem doubled.
