[PATCH] Copy on write for $& and $1... (Re: [PATCH] 4%? (was Re: [PROTOPATCH] copy on write $&))

From: Nicholas Clark
Date: February 9, 2003 15:04
Subject: [PATCH] Copy on write for $& and $1... (Re: [PATCH] 4%? (was Re: [PROTOPATCH] copy on write $&))
Message ID: 20030209230008.GF299@Bagpuss.unfortu.net

Following a slight delay:

On Sat, Jan 04, 2003 at 07:08:57PM +0000, Nicholas Clark wrote:
> On Sat, Jan 04, 2003 at 03:53:53PM +0000, Nicholas Clark wrote:
> > Resounding public no-comment. (I had one private response)
> > 
> > On Fri, Jan 03, 2003 at 12:14:19AM +0000, Nicholas Clark wrote:
> > > Hot off my local buildfarm (all 1 of it), here is a patch that implements
> > > copy on write for $& and $1..$n

I know where the erratic benchmarks come from - the default alignment on
x86 gcc appears to be "if a 16 byte boundary is 8 or fewer bytes away,
pad to it, else no padding".

Hence on these default settings, earlier code changing size by a few
bytes can cause later code to go into, or out of, the 8 byte window, and
hence critical loops can either become nicely aligned, or become nastily
misaligned. I'm now benchmarking everything with explicit 8 byte
alignment.
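
To make the alignment effect concrete, here is a minimal sketch of the rule
as described above. It only models the "pad if the 16 byte boundary is 8 or
fewer bytes away" behaviour that gcc appears to show; the function names are
made up for illustration and nothing here is taken from gcc itself.

```c
#include <stddef.h>

/* Distance in bytes from `offset` up to the next 16-byte boundary
 * (0 if `offset` is already on a boundary). */
static size_t bytes_to_boundary(size_t offset)
{
    return (16 - offset % 16) % 16;
}

/* Padding inserted under the rule as described in this mail (an
 * assumption about gcc's default, not a documented guarantee):
 * pad to the boundary only if it is 8 or fewer bytes away. */
static size_t padding_for(size_t offset)
{
    size_t gap = bytes_to_boundary(offset);
    return gap <= 8 ? gap : 0;
}

/* Address at which a loop starting at `offset` actually ends up. */
static size_t placed_at(size_t offset)
{
    return offset + padding_for(offset);
}
```

Under this model a loop that would start at 0x107 stays at 0x107 (the
boundary is 9 bytes away, so no padding), while one that would start at
0x108 is padded up to 0x110. One extra byte of earlier code is enough to
flip a hot loop between the two cases.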


This is quite an aggressive copy on write patch. Normally if a scalar is
copy on write already, or is of type PVIV but POK only, then sv_setsv
will do copy on write. The regexp patch makes the regexp engine also
upgrade lesser string types to PVIV if a copy for $& or $1 is going to be
needed.

Also, following Enache Adrian's patches to fix regexp substitution for his
mmap code, this patch is now able to allow the regexp engine to use copy
on write in the match part of a substitution. The upshot is that if you
do this

$perl = "rocks";
$perl =~ s/(ock)/ule/;

then during the regexp match for qr/(ock)/, $perl is upgraded to PVIV,
and the regexp's copy for $1 is held as a private copy-on-write SV.
(So there's no copy of the buffer at this point.)
During the substitution, the substitution operator realises that TARG
has turned COW "underneath" it, and copes - it now jumps to the
can't-modify-in-place branch, and creates a new SV to build "rules" in.

In the previous patch a flag is passed to the regexp engine telling it not
to copy on write, so that private copy for $1 has to be done the "traditional"
way - malloc a new buffer, and memcpy.

Without copy on write, the regexp engine also mallocs a buffer and copies.
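
For readers who want the difference between the two approaches spelled out,
here is a toy model of sharing a buffer until someone needs to write to it.
It is not perl's SV implementation: the toy_str type, the sharer count and
all function names are invented for this sketch, and perl's own bookkeeping
for COW scalars differs.

```c
#include <stdlib.h>
#include <string.h>

/* Toy string handle: the buffer may be shared between handles. */
typedef struct {
    char *buf;      /* NUL-terminated data, possibly shared */
    int  *sharers;  /* how many handles currently point at buf */
} toy_str;

/* The "traditional" way: malloc a new buffer and memcpy into it. */
static toy_str toy_new(const char *s)
{
    toy_str t;
    size_t n = strlen(s) + 1;
    t.buf = malloc(n);
    t.sharers = malloc(sizeof *t.sharers);
    if (!t.buf || !t.sharers)
        abort();
    memcpy(t.buf, s, n);
    *t.sharers = 1;
    return t;
}

/* The copy-on-write way: no buffer copy, just one more sharer. */
static toy_str toy_share(toy_str *src)
{
    ++*src->sharers;
    return *src;
}

/* Drop one handle; the buffer is freed when the last sharer goes. */
static void toy_free(toy_str *t)
{
    if (--*t->sharers == 0) {
        free(t->buf);
        free(t->sharers);
    }
    t->buf = NULL;
    t->sharers = NULL;
}

/* Call before modifying: if the buffer is shared, detach onto a
 * private copy. Returns 1 if a copy had to be made, 0 otherwise. */
static int toy_make_writable(toy_str *t)
{
    toy_str fresh;
    if (*t->sharers == 1)
        return 0;
    fresh = toy_new(t->buf);
    --*t->sharers;
    *t = fresh;
    return 1;
}
```

In this model the match step corresponds to toy_share() (the capture gets a
handle on the target's buffer with no copy), and the substitution step
corresponds to noticing that the target is shared and building the result
in a fresh buffer. The cost of the malloc and memcpy is only paid if a
write actually happens.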

There are two patches, so I'll attach them. One is the regexp patch, the other
is the inline SvREFCNT_dec patch, which seems to be needed to make the
COW regexps faster. I'm not sure whether regexp.h should have this:

#ifdef PERL_COPY_ON_WRITE
#define RX_MATCH_COPY_FREE(rx) \
	STMT_START {if (rx->saved_copy) { \
	    SV_CHECK_THINKFIRST_COW_DROP(rx->saved_copy); \
	} \
	if (RX_MATCH_COPIED(rx)) { \
	    Safefree(rx->subbeg); \
	    RX_MATCH_COPIED_off(rx); \
	}} STMT_END
#else
#define RX_MATCH_COPY_FREE(rx) \
	STMT_START {if (RX_MATCH_COPIED(rx)) { \
	    Safefree(rx->subbeg); \
	    RX_MATCH_COPIED_off(rx); \
	}} STMT_END
#endif

which is more frugal on memory, but does more work, or just this:

#define RX_MATCH_COPY_FREE(rx) \
	STMT_START {if (RX_MATCH_COPIED(rx)) { \
	    Safefree(rx->subbeg); \
	    RX_MATCH_COPIED_off(rx); \
	}} STMT_END

(the latter is the Amp6- build in the results below)

Perlbench output looks like this:

A,L) perl-5.009
        path        = /export/home/nwc10/18675/perl
        cc          = ccache gcc
        optimize    = -O -malign-loops=3 -malign-jumps=3 -malign-functions=3 -mpreferred-stack-boundary=3 -march=i686
        ccflags     = -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -I/usr/local/include
        usemymalloc = n

B,J) perl-5.009
        path        = /export/home/nwc10/18675-dec/perl
        cc          = ccache gcc
        optimize    = -O -malign-loops=3 -malign-jumps=3 -malign-functions=3 -mpreferred-stack-boundary=3 -march=i686
        ccflags     = -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -I/usr/local/include
        usemymalloc = n

C,I) perl-5.009
        path        = /export/home/nwc10/18675-COW-dec/perl
        cc          = ccache gcc
        optimize    = -O -malign-loops=3 -malign-jumps=3 -malign-functions=3 -mpreferred-stack-boundary=3 -march=i686
        ccflags     = -DPERL_COPY_ON_WRITE -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -I/usr/local/include
        usemymalloc = n

D,H) perl-5.009
        path        = /export/home/nwc10/18675-dec-COW-Amp6-/perl
        cc          = ccache gcc
        optimize    = -O -malign-loops=3 -malign-jumps=3 -malign-functions=3 -mpreferred-stack-boundary=3 -march=i686
        ccflags     = -DPERL_COPY_ON_WRITE -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -I/usr/local/include
        usemymalloc = n

E,G) perl-5.009
        path        = /export/home/nwc10/18675-COW-Amp6/perl
        cc          = ccache gcc
        optimize    = -O -malign-loops=3 -malign-jumps=3 -malign-functions=3 -mpreferred-stack-boundary=3 -march=i686
        ccflags     = -DPERL_COPY_ON_WRITE -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -I/usr/local/include
        usemymalloc = n

F,K) perl-5.009
        path        = /export/home/nwc10/18675-dec-COW-Amp6/perl
        cc          = ccache gcc
        optimize    = -O -malign-loops=3 -malign-jumps=3 -malign-functions=3 -mpreferred-stack-boundary=3 -march=i686
        ccflags     = -DPERL_COPY_ON_WRITE -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -I/usr/local/include
        usemymalloc = n

                        A    L    B    J    C    I    D    H    E    G    F    K
                      ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
arith/mixed           100  100   98   98   98   98  104  104  104  104  104  104
arith/trig            100  100   87   87   92   92   95   94  102  102   93   93
array/copy            100  100  101  101  102  102  101  101  101  101  101  100
array/foreach         100  100   97   97  104  104  102  102  101  101  100  100
array/index           100  100   96   96  100  100  101  101  100  100  100  100
array/pop             100  100  101  101  103  103  102  102   98   98  101  101
array/shift           100  100   99   99  101  101  100  100   98   98  100  100
array/sort-num        100  100  101  101  100  100  100  100   95   95   99   99
array/sort            100  100  100  100  100  100   99   99   96   96  100  100
call/0arg             100  100   96   95  100   99  102  103   97   97  102  102
call/1arg             100  100  102  102  101  102  105  105   95   95  104  104
call/2arg             100  100  101  101   98  100  103  103   96   96  103  103
call/9arg             100  100   90   90   96   96   97   97   94   94   97   97
call/empty            100  100  100  100   98   98   98   98  100  100   98   98
call/fib              100   99   98   98   99   99  101  100   89   89  100  100
call/hash             100  100  100  100  103  103  105  105  103  103  105  105
call/method           100  100   87   87   90   90  100  100   91   91  100  100
call/wantarray        100  100  100  100  101  100  101  101  104  104  102  102
hash/bigcopy          100  100   98   98   97   97   97   97   99   99   98   97
hash/copy             100  100   98   98  100  100  101  101  103  102  100  100
hash/each             100  100   96   96  101  101   92   92   93   93   92   92
hash/foreach-sort     100  100   99   99   98   98   97   97   99   99   98   98
hash/foreach          100  100   95   95   98   98   98   99   99   99   98   98
hash/get              100  100   94   94  100  100   98   98  101  101   97   97
hash/set              100  100   93   93   97   97   99   99  100  100   99   99
loop/for-c            100  100   95   95  105  105  106  106  105  104  106  107
loop/for-range-const  100  100  100  100   99   99   99   99   99   99   99   99
loop/for-range        100  100   91   91  100  100  100   99   99   99   99   99
loop/getline          100  100   93   93   91   91   98   97   94   94  100   99
loop/while-my         100  100   84   84   96   97   95   95   98   97   95   95
loop/while            100  100   89   89   99   99   99   99   97   97   99   99
re/const              100  100   96   96  100  100  100  100   95   95   97   97
re/w                  100  100   94   94   97   97  121  121  117  117  122  123
startup/fewmod        100  100  100  100   99   99  100  100  101  101  100  100
startup/lotsofsub     100  100  101  100   99   99   99   99  103  103  100  100
startup/noprog        100  100   99   99  100  100  100   99  100  100  100   99
string/base64         100  100   99   99   99   99   98   98   95   95   98   98
string/htmlparser     100  100   99   98   94   93   91   91   93   93   91   91
string/index-const    100  100  102  102  101  101  101  101  102  102  102  101
string/index-var      100  100   95   95  100  100  101  100  100  100  100  100
string/ipol           100  100   96   96  102  102  103  103  103  102  102  103
string/tr             100  100  100  100   95   95  103  103   95   96   97   97

AVERAGE               100  100   97   97   99   99  100  100   99   99  100  100

So the COW regexps are the same speed as vanilla perl for both choices
for the macro RX_MATCH_COPY_FREE, but without the SvREFCNT_dec patch COW
is slower.

I think that this is as good as it gets. If someone has a benchmark that does
serious amounts of regexp work, particularly captures, that would be
interesting.

Nicholas Clark
