
Re: use of LIKELY() and UNLIKELY() branch predictors

From: Steffen Mueller
Date: January 31, 2013 10:24
Subject: Re: use of LIKELY() and UNLIKELY() branch predictors
Message ID: 510A4651.4000000@steffen-mueller.net

On 01/30/2013 07:33 PM, Dave Mitchell wrote:
> Yeah. Just to be clear, I was pointing out the difficulties of automatic
> profiling: I expect it would usually be obvious when to apply UNLIKELY etc
> by hand.

That's an interesting point of view. Many a potential big win would be 
based on knowledge of relative frequency of occurrence of SV types. That 
isn't at all obvious. Some things are: SvMAGICAL could almost always be 
wrapped in UNLIKELY -- unless we're already in a branch that treats a 
special case. This shows that those decisions are highly context 
sensitive and thus aren't necessarily all that beginner-friendly.

Furthermore, I think that the potentially biggest wins could be gotten 
from just a couple of places, some of which are strategic. For example, 
Rafael did a fair amount of profiling at work on our main code base 
recently and found that the single largest share of time (the biggest 
individual item, not a majority overall) was spent in UTF8 and other 
string-concat related functions. Thus, modifying SvGROW, for example

# define SvGROW(sv,len) (SvLEN(sv) < (len) ? sv_grow(sv,len) : SvPVX(sv))

to read

# define SvGROW(sv,len) (UNLIKELY(SvLEN(sv) < (len)) ? sv_grow(sv,len) : SvPVX(sv))

and relying on the fact that perl does aggressive geometric growth of 
strings MAY be quite beneficial[1]. Furthermore, for UTF8 handling, 
there are lots of loops over characters, checking

if (UNI_IS_INVARIANT(uv))

which is:

/* Is the representation of the Unicode code point 'c' the same regardless of
 * being encoded in UTF-8 or not? */
#define UNI_IS_INVARIANT(c)		(((UV)c) <  0x80)

Is it a fair assumption to think that most characters we deal with are < 
0x80? For the code I write, I'm pretty sure that yes, most characters 
are UNI_IS_INVARIANT (yay, ASCII). But is it reasonable to discriminate 
this way? If so, that could be a big win.
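To make the question concrete, here is a minimal sketch of such a loop with the ASCII case hinted as the common one. The LIKELY/UNLIKELY definitions (GCC/Clang's __builtin_expect with a no-op fallback), the IS_INVARIANT macro and the function latin1_to_utf8_len are assumptions made up for illustration, not perl's actual code.

```c
#include <stddef.h>

/* Assumed hint macros: __builtin_expect on GCC/Clang, no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(cond)   __builtin_expect(!!(cond), 1)
#  define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define LIKELY(cond)   (cond)
#  define UNLIKELY(cond) (cond)
#endif

/* Same test as UNI_IS_INVARIANT, written for a plain unsigned long. */
#define IS_INVARIANT(c) (((unsigned long)(c)) < 0x80)

/* Hypothetical helper: number of bytes needed to store a Latin-1 string
 * as UTF-8. Bytes below 0x80 stay one byte; all others take two. */
size_t latin1_to_utf8_len(const unsigned char *s, size_t n)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (LIKELY(IS_INVARIANT(s[i])))
            total += 1;   /* hinted hot path: plain ASCII */
        else
            total += 2;   /* hinted cold path: high-bit character */
    }
    return total;
}
```

Whether the hint helps depends entirely on the data: on mostly non-ASCII text it would be wrong most of the time.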

Another fun one: What about UNISKIP?

#define UNISKIP(uv) ( (uv) < 0x80           ? 1 : \
                      (uv) < 0x800          ? 2 : \
                      (uv) < 0x10000        ? 3 : \
                      (uv) < 0x200000       ? 4 : \
                      (uv) < 0x4000000      ? 5 : \
                      (uv) < 0x80000000     ? 6 : \
                      (uv) < UTF8_QUAD_MAX  ? 7 : 13 )
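One way this could look, as a rough sketch: hint only the first comparison and leave the rest of the chain alone. The function name uniskip_hinted is made up, the hint macros are assumed to be built on __builtin_expect, and the value used for UTF8_QUAD_MAX here (2**36) is an assumption for a 64-bit build.

```c
#include <stdint.h>

/* Assumed hint macros: __builtin_expect on GCC/Clang, no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(cond)   __builtin_expect(!!(cond), 1)
#  define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define LIKELY(cond)   (cond)
#  define UNLIKELY(cond) (cond)
#endif

/* Assumed stand-in for UTF8_QUAD_MAX on a 64-bit build. */
#define TOY_UTF8_QUAD_MAX UINT64_C(0x1000000000)

/* Hypothetical function form of UNISKIP with the one-byte case hinted. */
int uniskip_hinted(uint64_t uv)
{
    if (LIKELY(uv < 0x80))
        return 1;                      /* hinted hot path: ASCII */

    /* Everything below is the unhinted remainder of the original chain. */
    return uv < 0x800             ? 2 :
           uv < 0x10000           ? 3 :
           uv < 0x200000          ? 4 :
           uv < 0x4000000         ? 5 :
           uv < 0x80000000        ? 6 :
           uv < TOY_UTF8_QUAD_MAX ? 7 : 13;
}
```

The results are the same as the macro's for every input; only the compiler's idea of the common path changes.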

There are easy cases, too: Anything that does if(...)croak() could be 
considered unlikely because of the relative cost of exceptions. Anything 
taint related could be considered unlikely. Another judgement call: Do 
we want to slightly pessimize the already-slow taint logic and 
potentially speed up normal code execution when taint is off (but not 
compiled out)? I think yes, but it would be perfectly valid to disagree.
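The if(...)croak() case might be sketched like this. toy_croak is a hypothetical stand-in that prints and aborts (the real croak() throws a Perl exception), checked_get is made up for illustration, and the hint macros are assumed to be built on __builtin_expect.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed hint macros: __builtin_expect on GCC/Clang, no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(cond)   __builtin_expect(!!(cond), 1)
#  define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define LIKELY(cond)   (cond)
#  define UNLIKELY(cond) (cond)
#endif

/* Hypothetical stand-in for croak(): report the error and abort. */
static void toy_croak(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    abort();
}

/* Hypothetical accessor: the error check is hinted as the rare case,
 * so the normal return is treated as the expected path. */
int checked_get(const int *items, size_t count, size_t idx)
{
    if (UNLIKELY(idx >= count))
        toy_croak("index out of range");
    return items[idx];
}
```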

Finally, looking at what I think is a very hot function:

SV *Perl_newSV(pTHX_ const STRLEN len)

One could argue that if(len) should be unlikely.

Would it be beneficial to add a separate function that only allocates a 
new SV without checking whether we should reserve string space? It seems 
to me like the majority of SVs aren't born as strings, so that could be 
a similar change as SvREFCNT_dec_NN in that it saves one branch in very 
hot code.
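A toy sketch of that split, using a made-up toy_sv struct rather than perl's real SV: toy_new_sv_bare never looks at a length, while toy_new_sv keeps the check (hinted as unlikely) for callers that want string space. All names here are hypothetical, and the hint macros are assumed to be built on __builtin_expect.

```c
#include <stdlib.h>

/* Assumed hint macros: __builtin_expect on GCC/Clang, no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(cond)   __builtin_expect(!!(cond), 1)
#  define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define LIKELY(cond)   (cond)
#  define UNLIKELY(cond) (cond)
#endif

/* Hypothetical, heavily simplified scalar; not perl's SV layout. */
typedef struct {
    char    *buf;     /* string buffer, NULL if none reserved */
    size_t   len;     /* allocated size of buf in bytes */
    unsigned refcnt;  /* reference count */
} toy_sv;

/* Allocate a scalar without any string space: no branch on a length. */
toy_sv *toy_new_sv_bare(void)
{
    toy_sv *sv = malloc(sizeof *sv);
    if (UNLIKELY(sv == NULL))
        abort();
    sv->buf = NULL;
    sv->len = 0;
    sv->refcnt = 1;
    return sv;
}

/* General entry point: reserves len + 1 bytes only when len is nonzero. */
toy_sv *toy_new_sv(size_t len)
{
    toy_sv *sv = toy_new_sv_bare();
    if (UNLIKELY(len != 0)) {
        sv->buf = malloc(len + 1);
        if (UNLIKELY(sv->buf == NULL))
            abort();
        sv->buf[0] = '\0';
        sv->len = len + 1;
    }
    return sv;
}

/* Release a scalar and its buffer (free(NULL) is a no-op). */
void toy_free_sv(toy_sv *sv)
{
    free(sv->buf);
    free(sv);
}
```

Callers that know they never need string space would call toy_new_sv_bare directly and skip the length test altogether.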

--Steffen

[1] Which makes me wonder whether gcc would make the same assumptions 
about ternaries as it does with if(){}. Presumably yes.
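For reference, a small sketch of the hint inside a ternary, modelled loosely on the SvGROW change above. __builtin_expect is an ordinary expression, so this is at least syntactically fine; how much the generated code changes is up to the compiler. toy_buf, toy_grow, TOY_GROW and the doubling policy are assumptions for illustration, not perl's sv_grow.

```c
#include <stdlib.h>

/* Assumed hint macros: __builtin_expect on GCC/Clang, no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(cond)   __builtin_expect(!!(cond), 1)
#  define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define LIKELY(cond)   (cond)
#  define UNLIKELY(cond) (cond)
#endif

/* Hypothetical growable buffer. */
typedef struct {
    char  *ptr;   /* storage, NULL when empty */
    size_t cap;   /* allocated bytes */
    size_t used;  /* bytes in use */
} toy_buf;

/* Counts how often the slow path runs, for illustration only. */
size_t toy_grow_calls = 0;

/* Slow path: double the capacity (starting at 16) until it fits. */
static char *toy_grow(toy_buf *b, size_t need)
{
    size_t cap = b->cap ? b->cap : 16;
    char *p;

    toy_grow_calls++;
    while (cap < need)
        cap *= 2;
    p = realloc(b->ptr, cap);
    if (p == NULL)
        abort();
    b->ptr = p;
    b->cap = cap;
    return p;
}

/* Hinted ternary: growing is expected to be the rare case. */
#define TOY_GROW(b, need) \
    (UNLIKELY((b)->cap < (need)) ? toy_grow((b), (need)) : (b)->ptr)

/* Append the same byte n times, one byte per TOY_GROW check. */
void toy_append_bytes(toy_buf *b, size_t n, char byte)
{
    size_t i;

    for (i = 0; i < n; i++) {
        char *p = TOY_GROW(b, b->used + 1);
        p[b->used++] = byte;
    }
}
```

With doubling, appending 1000 bytes one at a time to an empty buffer takes the slow path only a handful of times, which is the reason the check looks like a candidate for UNLIKELY.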
