
Request for comments on commit 129ccace6b45e3574c0b430b1fbcc7f8d0aa8e50, speed up grok_number

From: Karl Williamson
Date: January 18, 2020 20:01
Subject: Request for comments on commit 129ccace6b45e3574c0b430b1fbcc7f8d0aa8e50, speed up grok_number
Message ID: 45097e0d-5ef9-4762-da90-622ff38657a4@khwilliamson.com

I meant that to be a PR.

I sped it up in every way I could think of.  In particular, do you
disagree with any of my choices for the LIKELY/UNLIKELY branch
prediction hints?
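
For readers unfamiliar with such hints, here is a minimal sketch of how
they are typically used, assuming a compiler that provides GCC's
__builtin_expect.  MY_LIKELY, MY_UNLIKELY and parse_digits are
hypothetical names made up for illustration; this is not the
grok_number code from the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hint macros: GCC and Clang provide
 * __builtin_expect, other compilers get a plain pass-through. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_LIKELY(cond)   __builtin_expect(!!(cond), 1)
#  define MY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define MY_LIKELY(cond)   (!!(cond))
#  define MY_UNLIKELY(cond) (!!(cond))
#endif

/* Illustrative only: parse len ASCII decimal digits from s into *out.
 * Returns false on empty input, a non-digit, or uint64_t overflow. */
static bool parse_digits(const char *s, size_t len, uint64_t *out)
{
    uint64_t value = 0;

    if (MY_UNLIKELY(len == 0))
        return false;

    for (size_t i = 0; i < len; i++) {
        /* Characters below '0' wrap around to a large unsigned value. */
        unsigned digit = (unsigned)(unsigned char)s[i] - (unsigned)'0';

        /* Assumption: well-formed numbers dominate, so a non-digit is rare. */
        if (MY_UNLIKELY(digit > 9))
            return false;

        /* Overflow is rarer still: it needs at least 20 digits. */
        if (MY_UNLIKELY(value > (UINT64_MAX - digit) / 10))
            return false;

        value = value * 10 + digit;
    }

    *out = value;
    return true;
}
```

The hints only tell the compiler which side of each test to lay out as
the fall-through path; a wrong guess costs speed, not correctness, which
is why the choice of LIKELY versus UNLIKELY is worth a second opinion.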

Here's a link to the commit diffs:
https://github.com/Perl/perl5/commit/129ccace6b45e3574c0b430b1fbcc7f8d0aa8e50

The cachegrind comparison of this commit (the "switch" column) against
blead is:

Key:
     Ir   Instruction read
     Dr   Data read
     Dw   Data write
     COND conditional branches
     IND  indirect branches
     _m   branch predict miss
     _m1  level 1 cache miss
     _mm  last cache (e.g. L3) miss
     -    indeterminate percentage (e.g. 1/0)

The numbers represent raw counts per loop iteration.

ten_digits
1256908743

        blead switch Ratio %
        ----- ------ -------
     Ir 817.0  800.0   102.1
     Dr 242.0  241.0   100.4
     Dw 141.0  142.0    99.3
   COND 122.0  112.0   108.9
    IND  11.0   11.0   100.0

 COND_m   1.0    1.0   100.0
  IND_m   7.0    7.0   100.0

  Ir_m1   0.0    0.0   100.0
  Dr_m1   0.0    0.0   100.0
  Dw_m1   0.0    0.0   100.0

  Ir_mm   0.0    0.0   100.0
  Dr_mm   0.0    0.0   100.0
  Dw_mm   0.0    0.0   100.0

nine_digits
124578902

        blead switch Ratio %
        ----- ------ -------
     Ir 801.0  737.0   108.7
     Dr 240.0  225.0   106.7
     Dw 140.0  134.0   104.5
   COND 119.0   98.0   121.4
    IND  11.0   11.0   100.0

 COND_m   1.0    1.0   100.0
  IND_m   7.0    7.0   100.0

  Ir_m1   0.0    0.0   100.0
  Dr_m1   0.0    0.0   100.0
  Dw_m1   0.0    0.0   100.0

  Ir_mm   0.0    0.0   100.0
  Dr_mm   0.0    0.0   100.0
  Dw_mm   0.0    0.0   100.0

negative_9_digits
-124578902

        blead switch Ratio %
        ----- ------ -------
     Ir 801.0  744.0   107.7
     Dr 239.0  225.0   106.2
     Dw 141.0  135.0   104.4
   COND 118.0  100.0   118.0
    IND  11.0   11.0   100.0

 COND_m   1.0    1.0   100.0
  IND_m   7.0    7.0   100.0

  Ir_m1   0.0    0.0   100.0
  Dr_m1   0.0    0.0   100.0
  Dw_m1   0.0    0.0   100.0

  Ir_mm   0.0    0.0   100.0
  Dr_mm   0.0    0.0   100.0
  Dw_mm   0.0    0.0   100.0

three_digits
123

        blead switch Ratio %
        ----- ------ -------
     Ir 735.0  687.0   107.0
     Dr 234.0  220.0   106.4
     Dw 134.0  128.0   104.7
   COND 107.0   92.0   116.3
    IND  11.0   12.0    91.7

 COND_m   1.0    1.0   100.0
  IND_m   7.0    7.0   100.0

  Ir_m1   0.0    0.0   100.0
  Dr_m1   0.0    0.0   100.0
  Dw_m1   0.0    0.0   100.0

  Ir_mm   0.0    0.0   100.0
  Dr_mm   0.0    0.0   100.0
  Dw_mm   0.0    0.0   100.0

three_digits_then_garbage
123foo

        blead switch Ratio %
        ----- ------ -------
     Ir 667.0  669.0    99.7
     Dr 206.0  206.0   100.0
     Dw 107.0  108.0    99.1
   COND 100.0   95.0   105.3
    IND  10.0   11.0    90.9

 COND_m   0.0    0.0   100.0
  IND_m   7.0    7.0   100.0

  Ir_m1   0.0    0.0   100.0
  Dr_m1   0.0    0.0   100.0
  Dw_m1   0.0    0.0   100.0

  Ir_mm   0.0    0.0   100.0
  Dr_mm   0.0    0.0   100.0
  Dw_mm   0.0    0.0   100.0
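
As a reading aid, the "Ratio %" column appears to be the blead count
divided by the switch count, times 100.  That formula is inferred from
the rows above rather than taken from the benchmarking tool, and
ratio_percent is a hypothetical name:

```c
/* Inferred convention: baseline count over new count, as a percentage.
 * Values above 100 mean the new code did less work than blead.
 * This sketch does not handle a zero new_count; the tables report the
 * all-zero rows as 100.0. */
static double ratio_percent(double blead_count, double new_count)
{
    return 100.0 * blead_count / new_count;
}
```

For example, the nine_digits COND row (119.0 against 98.0) comes out at
about 121.4, and the three_digits IND row (11.0 against 12.0) at about
91.7, matching the tables.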

The real improvement is somewhat better than these ratios suggest,
because the overhead of going through API-test is counted in both
columns.  The output of perf on most of the same data (thanks to
Sergey Aleynikov) is:

     char* foo = "1256908743";
blead
      5,404,412,751      cycles:u
     22,000,388,916      instructions:u            #    4.07  insn per cycle
      4,400,081,289      branches:u
              5,097      branch-misses:u           #    0.00% of all branches
origin/smoke-me/khw-grok
      5,404,110,883      cycles:u
     20,100,389,643      instructions:u            #    3.72  insn per cycle
      3,200,081,354      branches:u
              4,849      branch-misses:u           #    0.00% of all branches

     char* foo = "124578902";
blead
      4,942,080,391      cycles:u
     20,300,388,876      instructions:u            #    4.11  insn per cycle
      4,000,081,249      branches:u
              4,948      branch-misses:u           #    0.00% of all branches
origin/smoke-me/khw-grok
      3,803,232,576      cycles:u
     14,400,389,502      instructions:u            #    3.79  insn per cycle
      1,900,081,213      branches:u
              4,557      branch-misses:u           #    0.00% of all branches

     char* foo = "-124578902";
blead
      4,903,991,977      cycles:u
     20,300,388,874      instructions:u            #    4.14  insn per cycle
      4,000,081,247      branches:u
              4,939      branch-misses:u           #    0.00% of all branches
origin/smoke-me/khw-grok
      4,002,571,689      cycles:u
     15,300,389,516      instructions:u            #    3.82  insn per cycle
      2,100,081,227      branches:u
              4,381      branch-misses:u           #    0.00% of all branches


     char* foo = "123foo";
blead
      5,103,343,313      cycles:u
     19,500,390,463      instructions:u            #    3.82  insn per cycle
      4,400,081,559      branches:u
              5,429      branch-misses:u           #    0.00% of all branches
origin/smoke-me/khw-grok
      4,704,290,235      cycles:u
     18,600,391,167      instructions:u            #    3.95  insn per cycle
      3,700,081,585      branches:u
             14,056      branch-misses:u           #    0.00% of all branches
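
To relate the perf counters to each other, here is a small sketch that
recomputes the "insn per cycle" figure and a blead-over-branch ratio
from the raw counts.  The struct and function names are hypothetical;
only the counter values come from the output above.

```c
#include <stdint.h>

/* Two of the counters from one perf run; field names are hypothetical. */
struct perf_sample {
    uint64_t cycles;
    uint64_t instructions;
};

/* Instructions retired per cycle, shown by perf as "insn per cycle". */
static double insn_per_cycle(struct perf_sample s)
{
    return (double)s.instructions / (double)s.cycles;
}

/* Baseline count over new count, as a percentage: above 100 means the
 * new code used fewer cycles (or instructions) than the baseline. */
static double ratio_percent_u64(uint64_t baseline, uint64_t candidate)
{
    return 100.0 * (double)baseline / (double)candidate;
}
```

Fed the "124578902" counts, insn_per_cycle gives about 4.11 for blead,
and ratio_percent_u64 on the two cycle counts gives roughly 130.  For
"1256908743" the cycle counts are nearly identical, so the ratio is
about 100 even though the instruction count dropped.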


