
[perl #122835] [PATCH] optimize pp_length for simple PVs

From:
bulk88 via RT
Date:
September 23, 2014 22:59
Subject:
[perl #122835] [PATCH] optimize pp_length for simple PVs
Message ID:
rt-4.0.18-2997-1411513146-1853.122835-15-0@perl.org
See attached patch.

On VC 2003 the machine code of pp_length was 0x100 bytes before the patch and is 0x102 bytes after; yes, it's a slight increase. Previously there were 2 mg_set calls and 2 SETs(TARG) in the machine code. The extra initial IN_BYTES test, plus the XOR and the shift, used up the savings gained by factoring those out.

I included some profiling stats from before and after. The test code was:
--------------------------------------------
my @arr = ("ARGVOUT",
"ZOO",
"ENV",
"ARGV",
"NO",
"STDOUT",
"_",
"SIG",
"INC",
"*",
"FOO",
"STDIN",
"STDERR",
"ARGV",
"INC",
"STDOUT",
"STDZER",
"IOP",
"ZIP",
"ENV",
"CAT",
"ZOOF",
"STDIN",
"ARGVOUT",
"ENV",
"ARGV",
"STDOUT",
"_",
"SIG",
"INC",
"*");
my $cnt = 0;
for(0..1000000) {
foreach(@arr) {
    $cnt+= length($_);
    $c++;
}
}
print "$cnt $c\n";
--------------------------------------------

The profiler is based on QueryPerformanceCounter and on a replacement runloop, and is not publicly available (since upstream won't accept code that breaks every OS except Win32).

Rejected alternatives

--------------------------------------------
OP * S_lencomplex(pTHX_ SV *sv, SV * TARG) {
        STRLEN len;
        dSP;
        SvGETMAGIC(sv);
        //if(svflags & SVs_GMG)
        //    mg_get(sv);
        if (SvOK(sv)) {
            //SETs(TARG);
            if (!IN_BYTES)
    //	if (!in_bytes)
                sv_setiv(TARG, (IV)sv_len_utf8_nomg(sv));
    //	if (!IN_BYTES)
            //    SETi(sv_len_utf8_nomg(sv));
            else
            {
                /* unrolled SvPV_nomg_const(sv,len) */
                if(SvPOK_nog(sv)){
                    len = SvCUR(sv);
                } else  {
                    (void)sv_2pv_flags(sv, &len, 0|SV_CONST_RETURN);
                }
                sv_setiv(TARG, (IV)(len));
                
                
    //	    SETi(len);
            }
        } else {
            
            if (!SvPADTMP(TARG)) {
                sv_setsv_nomg(TARG, &PL_sv_undef);
    //	    SETTARG;
    /*#define SETTARG		STMT_START { SvSETMAGIC(TARG); SETs(TARG); } STMT_END */
            } else {
                SETs(&PL_sv_undef); /* TARG is on the stack at this point and is overwritten;
                                      this branch is the odd one out, so TARG is put on the stack by default */
                goto noMagicForTARG;
            }
        
    //	SETs(&PL_sv_undef);
        }
        SvSETMAGIC(TARG);
        noMagicForTARG:
        //RETURN;
        return NORMAL; /*no putback, SP didn't move in this opcode */
    }


PP(pp_length)
{
    dSP; dTARGET;
    SV * const sv = TOPs;

    U32 in_bytes = IN_BYTES;
    /* simplest case shortcut */
    /* turn off SVf_UTF8 in tmp flags if HINT_BYTES on*/
    U32 svflags = (SvFLAGS(sv) ^ (in_bytes << 26)) & (SVf_POK|SVs_GMG|SVf_UTF8);
    //assert(HINT_BYTES == 0x00000008 && SVf_UTF8 == 0x20000000 && (SVf_UTF8 == HINT_BYTES << 26));
    //U32 svflags = SvFLAGS(sv) & (SVf_POK|SVs_GMG|SVf_UTF8);
    STRLEN len;
    SETs(TARG);
    //DebugBreak();
    if(UNLIKELY(svflags != SVf_POK))
    {
    return S_lencomplex(aTHX_ sv, TARG);
    }
    
    
    
    else {
        use_SvCUR:
        len = SvCUR(sv);
        havelen:
        sv_setiv(TARG, (IV)(len));
    }
    
    SvSETMAGIC(TARG);
    noMagicForTARG:
    //RETURN;
    return NORMAL; /*no putback, SP didn't move in this opcode */
}
----------------------------------------------

This was only 2-3% faster (noise?) on simple PVs. A PGO C compiler would accomplish the same thing, and a separate function would make non-simple PVs slightly slower on paper due to the extra function call.

----------------------------------------------
sub SVf_UTF8        { return 0x20000000};
sub SVs_GMG		{ return 0x00200000};
sub SVf_POK		{ return 0x00000400};
sub HINT_BYTES		{ return 0x00000008};

my $svflags =
0
| SVf_POK
| SVs_GMG
#| SVf_UTF8
| HINT_BYTES
;
if( $svflags == SVf_POK ||  $svflags == (SVf_POK|HINT_BYTES)
       || $svflags == (SVf_POK|SVf_UTF8|HINT_BYTES)) {
    print "Simple PV\n";
} else {
    print "Complex PV\n";
}
----------------------------------------------
U32 svflags = (SvFLAGS(sv)&SVTYPEMASK)|HINT_BYTES; 
//AKA
U32 svflags = (SvFLAGS(sv)&SVTYPEMASK)|IN_BYTES; 
----------------------------------------------

Somehow this did not look simplified enough, but my bitwise math skills aren't good enough to find a better way of writing it using "^/~/-/</>/!!/signed lt and gt" than the shift and ^ above. All 3 constants contain POK and 2 contain HINT_BYTES. Complex PVs take 3 comparisons before being rejected (and moving on to the getmagic check), and code under "use bytes" takes 2 or 3 comparisons before taking the shortcut.
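
For comparison, here is a minimal standalone sketch of the two tests discussed above, using only the four flag values from the Perl snippet. The helper names simple_by_compare and simple_by_xor are hypothetical and are not part of the patch or of perl's headers; the second one just mirrors the shift-and-^ expression from the rejected pp_length.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the Perl snippet above. */
#define SVf_POK    UINT32_C(0x00000400)
#define SVs_GMG    UINT32_C(0x00200000)
#define SVf_UTF8   UINT32_C(0x20000000)
#define HINT_BYTES UINT32_C(0x00000008)

/* Hypothetical helper: the three-comparison test.
 * `combined` is the SV flags restricted to POK|GMG|UTF8, with
 * HINT_BYTES OR'ed in when "use bytes" is in effect. */
int simple_by_compare(uint32_t combined)
{
    return combined == SVf_POK
        || combined == (SVf_POK | HINT_BYTES)
        || combined == (SVf_POK | SVf_UTF8 | HINT_BYTES);
}

/* Hypothetical helper: the shift-and-XOR test.
 * `in_bytes` is either 0 or HINT_BYTES. */
int simple_by_xor(uint32_t sv_flags, uint32_t in_bytes)
{
    uint32_t f;
    /* The trick relies on HINT_BYTES landing exactly on SVf_UTF8. */
    assert((HINT_BYTES << 26) == SVf_UTF8);
    /* XOR toggles the UTF8 bit when in_bytes is set. */
    f = (sv_flags ^ (in_bytes << 26)) & (SVf_POK | SVs_GMG | SVf_UTF8);
    return f == SVf_POK;
}
```

In this model the two tests agree on every combination except one: a non-UTF8 POK string under "use bytes". The three-comparison test accepts it, while the XOR sets the UTF8 bit in the temporary flags and so sends it to the slow path. That costs time rather than correctness, assuming the slow path handles such strings as S_lencomplex does.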

Timing-wise, from what I've stepped through, the biggest gain would come from fixing pad SVs and sv_setiv to avoid the sv_upgrade call on EVERY IV-returning opcode, since pad vars get downgraded to SVt_NULL and their bodies are freed on every block entry/leave. I also need to investigate a shortcut in sv_setiv that, for an SVt_NULL, points the SV body pointer at the sv_u member in the head directly, without calling sv_upgrade.
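
The idea of the sv_setiv shortcut can be illustrated with a toy model. This is only a sketch under loud assumptions: struct toy_sv, the TOY_* constants and all toy_* functions are invented for illustration and do not match perl's real sv.h layout or the real sv_upgrade. It only shows the shape of the fast path, which is to handle the "empty" type inline instead of calling the general upgrade routine.

```c
#include <stddef.h>

/* Toy types; hypothetical, not perl's real SV types. */
enum toy_type { TOY_NULL = 0, TOY_IV = 1 };

struct toy_sv {
    enum toy_type type;
    void *body;     /* NULL for TOY_NULL; points at head_iv for TOY_IV */
    long  head_iv;  /* the integer lives in the head, no separate body */
};

/* Counts calls to the general upgrade routine. */
int toy_upgrade_calls = 0;

/* Stand-in for a general-purpose upgrade routine (the slow path). */
void toy_upgrade(struct toy_sv *sv, enum toy_type want)
{
    toy_upgrade_calls++;
    if (want == TOY_IV)
        sv->body = &sv->head_iv;
    sv->type = want;
}

/* Always goes through the general upgrade routine when needed. */
void toy_setiv_slow(struct toy_sv *sv, long value)
{
    if (sv->type < TOY_IV)
        toy_upgrade(sv, TOY_IV);
    sv->head_iv = value;
}

/* Handles the TOY_NULL case inline, skipping the function call. */
void toy_setiv_fast(struct toy_sv *sv, long value)
{
    if (sv->type == TOY_NULL) {
        sv->body = &sv->head_iv;  /* point the body at the head member */
        sv->type = TOY_IV;
    }
    sv->head_iv = value;
}
```

Both setters leave the toy SV in the same state; the only difference is that the fast one never increments toy_upgrade_calls. Whether the equivalent change pays off in perl itself is exactly what still needs investigating.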

-- 
bulk88 ~ bulk88 at hotmail.com

---
via perlbug:  queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=122835
