On Mon, Oct 23, 2017 at 11:34:16PM +0200, Lukas Mai wrote:
> I think it would be a good idea to use compiler intrinsics for overflow
> checks where available.
>
> I've pushed a branch that implements this at mauke/overflow:
> https://perl5.git.perl.org/perl.git/shortlog/refs/heads/mauke/overflow
>
> Dave: I've CC'd you directly because you last worked on this code (commit
> 230ee21f3e366901ce5769d324124c522df7ce8a, "faster add, subtract, multiply").
>
> My changes affect pp_add, pp_subtract, and pp_multiply. I think the new code
> is nicer because it's easier to understand than all the low-level bit
> fiddling, and it passes all tests on my machine. However, I haven't done any
> benchmarks to see how it affects performance (if at all).
>
> Things I need help with:
>
> - code review

Note that Reini did something similar in cperl, although he disabled my
short-cut code in the presence of __builtin_mul_overflow etc, which was a
mistake (and is why the nbody benchmark runs about 30% faster on perl
compared with cperl - at least last time I looked).

At a cursory inspection the code looks good (although I haven't looked
closely at the main body (non-shortcut) part of the code).

Note that your '#ifdefs' probably need indenting with '# ifdef' in some
places, since there are already ifdefs surrounding the code.

> - benchmarks (compared to a17768d7c7b82c136fbeacd85db3451973a8007a)

Are you familiar with Porting/bench.pl and t/perf/benchmarks?

I tried running it as follows:

I had 3 executables:

    /tmp/perl-a1776  - just before your 3 commits
    /tmp/perl-of     - the tip of your branch
    /tmp/perl-no-of  - ditto with this diff applied:

    $ diff -u config.h- config.h
    +#if 0
     #define HAS_BUILTIN_ADD_OVERFLOW /**/
     #define HAS_BUILTIN_SUB_OVERFLOW /**/
     #define HAS_BUILTIN_MUL_OVERFLOW /**/
    +#endif

Run the arith subset of the benchmarks against the 3 perls, using 8 CPUs
in parallel, and write the results to a file:

    $ perl Porting/bench.pl -w /tmp/bm_num -j 8 --tests=/expr::arith::/ -v \
        /tmp/perl-a1776 /tmp/perl-no-of /tmp/perl-of

Read the results back and display them, sorting by the number of
conditional branches in the right-most column:

    $ perl Porting/bench.pl -r /tmp/bm_num --sort=COND:-1 > /tmp/bm_num.out

You can similarly do --sort=Ir:-1 etc.

On my hardware, the results show that the no-builtins build has no
slowdown (good!) and the builtins build has a modest improvement in the
number of instruction reads and/or conditional branches (again, good).

Here are a couple of the best results:

expr::arith::add_lex_ii
add two integers and assign to a lexical var

       /tmp/perl-a1776 /tmp/perl-no-of /tmp/perl-of
       --------------- --------------- ------------
    Ir          100.00          100.00       106.19
    Dr          100.00          100.00       100.00
    Dw          100.00          100.00       100.00
  COND          100.00          100.00       100.00
   IND          100.00          100.00       100.00

expr::arith::add_lex_ss
add two short strings and assign to a lexical var

       /tmp/perl-a1776 /tmp/perl-no-of /tmp/perl-of
       --------------- --------------- ------------
    Ir          100.00          100.00       101.96
    Dr          100.00          100.00       100.00
    Dw          100.00          100.00       100.00
  COND          100.00          100.00       104.30
   IND          100.00          100.00       100.00

And here's the average. It includes many non add/sub/mult benchmarks,
which dilutes the numbers.

AVERAGE

       /tmp/perl-a1776 /tmp/perl-no-of /tmp/perl-of
       --------------- --------------- ------------
    Ir          100.00          100.00       101.01
    Dr          100.00          100.00       100.00
    Dw          100.00          100.00       100.00
  COND          100.00          100.00       100.22
   IND          100.00          100.00       100.00

-- 
I before E. Except when it isn't.
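
For readers unfamiliar with the builtins under discussion, here is a minimal sketch of a guarded overflow check with nested preprocessor indentation in the '# ifdef' style mentioned in the review comments. It is not taken from the mauke/overflow branch or from pp_add: checked_add() is a hypothetical helper, it works on plain long rather than IV, and the compiler detection below merely stands in for whatever Configure does to set HAS_BUILTIN_ADD_OVERFLOW in config.h.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: stand-in for Configure's probe. In perl the macro comes
 * from config.h; here it is derived from the compiler for illustration. */
#ifndef HAS_BUILTIN_ADD_OVERFLOW
#  if defined(__has_builtin)
#    if __has_builtin(__builtin_add_overflow)
#      define HAS_BUILTIN_ADD_OVERFLOW
#    endif
#  elif defined(__GNUC__) && __GNUC__ >= 5
#    define HAS_BUILTIN_ADD_OVERFLOW
#  endif
#endif

/* Hypothetical helper: stores a + b in *result and returns true,
 * or returns false (leaving *result unspecified) if the sum overflows. */
static bool checked_add(long a, long b, long *result)
{
#ifdef HAS_BUILTIN_ADD_OVERFLOW
    /* The builtin returns true when the addition overflowed. */
    return !__builtin_add_overflow(a, b, result);
#else
    /* Portable fallback: test the bounds before adding, so that the
     * signed addition itself can never overflow. */
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
        return false;
    *result = a + b;
    return true;
#endif
}
```

A caller would take the fast integer path when checked_add() returns true and fall back to a slower path (for example, floating point) when it returns false. Wrapping the second branch in '#if 0', as in the config.h diff above, is the equivalent of forcing the fallback.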