Encode UTF-8 optimizations

From: pali
Date: July 9, 2016 23:12
Subject: Encode UTF-8 optimizations
Message ID: 201607100112.45201@pali

Hi! As we know, utf8::encode() does not produce strictly valid UTF-8,
so Encode::encode("UTF-8", ...) should be used instead. Likewise, files
should be opened with the :encoding(UTF-8) layer instead of :utf8.
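
To make the difference concrete, here is a minimal sketch that encodes a
string containing a surrogate code point both ways. The input string is
my own illustration, not something from the benchmark. With the default
CHECK value, Encode substitutes U+FFFD for the invalid code point, while
utf8::encode() passes Perl's lax internal encoding straight through.

```perl
use strict;
use warnings;
no warnings 'utf8';    # the input below is deliberately invalid Unicode
use Encode ();

# U+D800 is a surrogate, which strict UTF-8 does not allow.
my $chars = "A" . chr(0xD800) . "B";

# utf8::encode() works in place and emits Perl's lax internal encoding.
my $lax = $chars;
utf8::encode($lax);

# Encode::encode("UTF-8", ...) validates every code point and, with the
# default CHECK value, substitutes a replacement for invalid ones.
my $strict = Encode::encode("UTF-8", $chars);

printf "lax:    %s\n", unpack("H*", $lax);
printf "strict: %s\n", unpack("H*", $strict);
```

The two hex dumps differ only in the bytes for the middle character.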

But the strict UTF-8 implementation in the Encode module is horribly slow
compared to utf8::encode(). It is implemented in Encode.xs, and for
benchmarking this XS implementation can be called directly:

  use Encode;
  my $output = Encode::utf8::encode_xs({strict_utf8 => 1}, $input);

(without the overhead of the Encode module...)
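
A comparison like this can be sketched with the core Benchmark module.
This is not the exact script behind the numbers in this mail: the input
(160 ASCII characters) and the candidate labels are assumptions, and
encode_xs is only added when the installed Encode still exposes it.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Encode ();

# Assumption: 160 ASCII characters stand in for the 160-byte input.
my $input = "x" x 160;

my %candidates = (
    'utf8::encode' => sub {
        my $copy = $input;    # utf8::encode() modifies its argument
        utf8::encode($copy);
        return $copy;
    },
    'Encode::encode' => sub {
        return Encode::encode("UTF-8", $input);
    },
);

# encode_xs is an internal function and may be missing in other Encode
# releases, so check for it before adding the direct XS calls.
if (Encode::utf8->can('encode_xs')) {
    for my $strict (0, 1) {
        my $obj = { strict_utf8 => $strict };
        $candidates{"encode_xs strict=$strict"} = sub {
            return Encode::utf8::encode_xs($obj, $input);
        };
    }
}

# A negative count runs each candidate for at least that many CPU seconds.
cmpthese(-1, \%candidates);
```

The copy inside the utf8::encode candidate adds a small cost of its own,
so its rate is slightly pessimistic.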

Here are my results on a 160-byte input string:

  Encode::utf8::encode_xs({strict_utf8 => 1}, ...):  8 wallclock secs ( 8.56 usr +  0.00 sys =  8.56 CPU) @ 467289.72/s (n=4000000)
  Encode::utf8::encode_xs({strict_utf8 => 0}, ...):  1 wallclock secs ( 1.66 usr +  0.00 sys =  1.66 CPU) @ 2409638.55/s (n=4000000)
  utf8::encode:  1 wallclock secs ( 0.39 usr +  0.00 sys =  0.39 CPU) @ 10256410.26/s (n=4000000)

I found two bottlenecks (the slow sv_catpv* and utf8n_to_uvuni functions)
and did some optimizations. The final results are:

  Encode::utf8::encode_xs({strict_utf8 => 1}, ...):  2 wallclock secs ( 3.27 usr +  0.00 sys =  3.27 CPU) @ 1223241.59/s (n=4000000)
  Encode::utf8::encode_xs({strict_utf8 => 0}, ...):  1 wallclock secs ( 1.68 usr +  0.00 sys =  1.68 CPU) @ 2380952.38/s (n=4000000)
  utf8::encode:  1 wallclock secs ( 0.40 usr +  0.00 sys =  0.40 CPU) @ 10000000.00/s (n=4000000)
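
For a quick read of these numbers, the speedup per variant can be worked
out from the CPU times in the two listings. The hash keys below are
labels chosen for this sketch only.

```perl
use strict;
use warnings;

# CPU seconds for n=4000000 calls, copied from the two result listings.
my %before = (strict => 8.56, lax => 1.66, 'utf8::encode' => 0.39);
my %after  = (strict => 3.27, lax => 1.68, 'utf8::encode' => 0.40);

my %speedup;
for my $name (sort keys %before) {
    $speedup{$name} = $before{$name} / $after{$name};
    printf "%-12s %.2fx\n", $name, $speedup{$name};
}

# Remaining gap between the strict encoder and utf8::encode() afterwards.
my $gap = $after{strict} / $after{'utf8::encode'};
printf "strict is still %.1fx slower than utf8::encode\n", $gap;
```

So the strict path becomes about 2.6 times faster, the other two are
unchanged within noise, and strict encoding remains roughly 8 times
slower than utf8::encode().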

The patches are on GitHub in this pull request:
https://github.com/dankogai/p5-encode/pull/56

I would appreciate it if somebody could review my patches and tell me
whether this is the right way to optimize...
