[perl #123202] Slow global pattern match in taint mode with input from utf8

From: Tony Cook via RT
Date: February 4, 2015 04:55
Subject: [perl #123202] Slow global pattern match in taint mode with input from utf8
Message ID: rt-4.0.18-30823-1423025698-857.123202-15-0@perl.org

On Thu Nov 13 09:17:13 2014, heinz.knutzen@gmail.com wrote:
> There is a massive slowdown in global pattern match with Perl 5.20.1 in 
> taint mode.
> This is a follow up to bug #120692.
> That has been fixed, but the bug still occurs with taint mode enabled.
> 
> Create test data with this shell command line:
> $ for i in $(seq 1 20000) ; do echo -n ab; done > abab
> 
> $ perlbrew use perl-5.20.1
> $ /usr/bin/time -f '%Us' perl -Ci -e '$in = <>;while ($in =~ m/\Ga+b/g) {}' abab
> 0.02s
> $ /usr/bin/time -f '%Us' perl -T -Ci -e '$in = <>;while ($in =~ m/\Ga+b/g) {}' abab
> 12.14s
> $ perlbrew use perl-5.18.4
> $ /usr/bin/time -f '%Us' perl -Ci -e '$in = <>;while ($in =~ m/\Ga+b/g) {}' abab
> 0.02s
> $ /usr/bin/time -f '%Us' perl -T -Ci -e '$in = <>;while ($in =~ m/\Ga+b/g) {}' abab
> 0.02s
> 
> This slowdown also appears with Perl 5.21.5.
> 
> I had to revert an upgrade of a production system from 5.16.3 to 5.20.1 
> today, because of this bug.

This appears to be caused by:

commit 25fdce4a165b6305e760d4c8d94404ce055657a0
Author: Father Chrysostomos <sprout@cpan.org>
Date:   Tue Jul 23 13:15:34 2013 -0700

    Stop pos() from being confused by changing utf8ness
    
    The value of pos() is stored as a byte offset.  If it is stored on a
    tied variable or a reference (or glob), then the stringification could
    change, resulting in pos() now pointing to a different character off-
    set or pointing to the middle of a character:

Since taint magic is GMAGIC, MgBYTEPOS_set() always sets mg_len to the character offset. That is slow in itself, since it needs to translate the byte offset to a character offset, and the character offset then has to be translated back to a byte offset on the next \G match.

Storing a character offset is reasonable for most types of magic, since the string may change based on the magic, but taint magic just sets a flag, so the conversion is unnecessary there.
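
To illustrate why the round trip hurts in a \G loop, here is a minimal C sketch. It is not perl's code: the function names are hypothetical, it assumes well-formed UTF-8, and it ignores any offset caching perl itself may do. It only shows that each byte-to-character or character-to-byte translation scans the string from the start, so converting in both directions on every match makes the total work grow roughly with the square of the string length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not perl's API): number of characters in the
 * first byte_off bytes of a well-formed UTF-8 buffer. Every byte that
 * is not a continuation byte (10xxxxxx) starts a character.
 * Cost: byte_off bytes examined. */
size_t bytes_to_chars(const unsigned char *s, size_t len, size_t byte_off)
{
    size_t chars = 0;
    assert(byte_off <= len);
    for (size_t i = 0; i < byte_off; i++)
        if ((s[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}

/* Hypothetical inverse: byte offset at which character number char_off
 * starts (clamped to len). Cost: as many bytes as the returned offset. */
size_t chars_to_bytes(const unsigned char *s, size_t len, size_t char_off)
{
    size_t i = 0;
    while (i < len && char_off > 0) {
        i++;                                    /* lead byte */
        while (i < len && (s[i] & 0xC0) == 0x80)
            i++;                                /* continuation bytes */
        char_off--;
    }
    return i;
}

/* Simulate a \G loop whose matches are each match_len bytes long and
 * end on character boundaries, with the position saved as a character
 * offset after every match and restored as a byte offset before the
 * next one. Returns the number of bytes examined by the conversions. */
size_t roundtrip_bytes_scanned(const unsigned char *s, size_t len,
                               size_t match_len)
{
    size_t scanned = 0;
    size_t pos = 0;             /* byte offset where the next match starts */
    assert(match_len > 0);
    while (pos + match_len <= len) {
        pos += match_len;                              /* end of this match */
        size_t stored = bytes_to_chars(s, len, pos);   /* save as chars */
        scanned += pos;
        pos = chars_to_bytes(s, len, stored);          /* restore as bytes */
        scanned += pos;
    }
    return scanned;
}
```

If the position were kept as a byte offset instead, neither scan would be needed, which is the saving available when the only magic on the scalar is taint.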

The attached patch appears to fix the problem, though if someone has a better name for the function...

Tony

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=123202


