perl.perl5.porters | Postings from January 2009

Re: Even faster Unicode character counting

From: David Nicol
Date: January 12, 2009 08:01
Subject: Re: Even faster Unicode character counting
Message ID: 934f64a20901120800v5fba6e80j354a0914ce65a18b@mail.gmail.com

[root@l004010 perl]#  (cd t;./perl harness uni/*.t) 2>1
...
All tests successful.
Files=18, Tests=24327, 32 wallclock secs (14.30 usr  0.49 sys + 15.16 cusr  0.55 csys = 30.50 CPU)
Result: PASS


All the bit-twiddling is inside #ifndef EBCDIC, and the tests pass.
When EBCDIC is not defined, the attached patch skips characters by
table lookup until it reaches a word boundary, then counts
continuation bytes a word at a time.  When an examined word contains
a 0xFF byte, it falls back to the table lookup.
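
To illustrate the general shape of this approach, here is a minimal sketch in C. It is not the attached patch: the names utf8_length_sketch and count_continuation_bytes are hypothetical, it assumes well-formed UTF-8 on a non-EBCDIC platform, it tests bytes directly instead of using a skip table before the word boundary, and it does not model the 0xFF fallback.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): count the bytes of the
 * form 10xxxxxx in one 64-bit word. */
static size_t count_continuation_bytes(uint64_t w)
{
    /* A continuation byte has bit 7 set and bit 6 clear.  Move both
     * tests down to bit 0 of each byte and keep one flag per byte. */
    uint64_t flags = (w >> 7) & (~w >> 6) & UINT64_C(0x0101010101010101);
    /* Multiplying by 0x0101... sums the eight 0/1 flags into the top byte. */
    return (size_t)((flags * UINT64_C(0x0101010101010101)) >> 56);
}

/* Count characters in [s, end), assuming well-formed UTF-8:
 * characters = total bytes - continuation bytes. */
size_t utf8_length_sketch(const unsigned char *s, const unsigned char *end)
{
    const unsigned char *p = s;
    size_t continuation = 0;

    /* One byte at a time until p sits on a word boundary. */
    while (p < end && ((uintptr_t)p % sizeof(uint64_t)) != 0) {
        continuation += ((*p & 0xC0) == 0x80);
        p++;
    }

    /* Whole words: count continuation bytes eight at a time. */
    while ((size_t)(end - p) >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p, sizeof w);   /* avoids aliasing problems */
        continuation += count_continuation_bytes(w);
        p += sizeof w;
    }

    /* Leftover bytes after the last full word. */
    while (p < end) {
        continuation += ((*p & 0xC0) == 0x80);
        p++;
    }

    return (size_t)(end - s) - continuation;
}
```

Because only the number of flagged bytes matters, the word loop gives the same count on little-endian and big-endian machines.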

In the EBCDIC case, the patch just switches from the skip table lookup
and character counting to the hop table lookup and continuation
counting, which should produce exactly the same result in exactly the
same time.  So why change?  Maybe an EBCDIC-compliant acceleration
method will appear, for instance the fast passing-over of bytes
lacking high bits (my previous patch).
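
For readers unfamiliar with the "bytes lacking high bits" idea, the following C sketch shows one way it can look for UTF-8 on a non-EBCDIC platform. It is an illustration only, not the earlier patch: the names word_is_all_ascii and utf8_length_ascii_fastpath are hypothetical, the input is assumed to be well-formed UTF-8, and whether an equivalent test is valid for EBCDIC encodings is not something this sketch addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true when none of the eight bytes in w has its
 * high bit set, i.e. every byte is a one-byte character. */
static int word_is_all_ascii(uint64_t w)
{
    return (w & UINT64_C(0x8080808080808080)) == 0;
}

/* Count characters in [s, end), passing over all-ASCII words in one
 * step and otherwise examining a single byte. */
size_t utf8_length_ascii_fastpath(const unsigned char *s,
                                  const unsigned char *end)
{
    const unsigned char *p = s;
    size_t chars = 0;

    while (p < end) {
        if ((size_t)(end - p) >= sizeof(uint64_t)) {
            uint64_t w;
            memcpy(&w, p, sizeof w);
            if (word_is_all_ascii(w)) {
                chars += sizeof w;   /* eight bytes, eight characters */
                p += sizeof w;
                continue;
            }
        }
        /* Slow path: a byte starts a character unless it is 10xxxxxx. */
        chars += ((*p & 0xC0) != 0x80);
        p++;
    }
    return chars;
}
```

The fast path only pays off on text that is mostly ASCII; on text dense with multi-byte characters it adds one failed word test per byte examined.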

It would also make sense to #ifndef the whole revised function out and
leave the current one in for EBCDIC, which is what Karl Williamson has
asked for, rather than simply leaving out all the bit-twiddling or
including fast skipping of words with all bytes >= 127, but then we
wouldn't get that little optimization.
