perl.perl5.porters | Postings from September 2011

Re: [perl #99870] Capturing matches in regex against large strings (8MB) are slow

From: hv
Date: September 25, 2011 01:21
Subject: Re: [perl #99870] Capturing matches in regex against large strings (8MB) are slow
Message ID: 201109250817.p8P8Hbf25877@crypt.org
:On Sep 24, 2011, at 9:08 PM, "tchrist1 via RT" <perlbug-followup@perl.org> wrote:
:
:> I bet something is preallocating a capture SV
:> that's the same size as the original string.
:> 
:> --tom

Matthew Horsfall <wolfsage@gmail.com> wrote:
:That coincides with other tests that I've done - capture matching against the 8MB string causes an 8MB growth in memory the first few times it's done. 
:
:-- Matthew Horsfall (alh)

I think the problem case is:
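  my $string = 'x' x 1024;
  $string =~ /^(x)/;      # capturing match: the target string is copied
  $string = 'y' x 1024;   # the original target is then overwritten
  print $1;               # prints "x", read from the saved copy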

We copy the target string when we have captures, so that looking at the
captures after the original target string has changed does not lead to
segfaults or subtler badness. (Internally, the regexp engine uses only
offsets within the string to mark possible matches; on a successful
match we know the final offsets are the actual captures.)
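A rough user-level illustration of that mechanism, written in Perl rather than taken from the engine's C code: after a successful match, only start/end offsets per group are needed (Perl exposes them as @- and @+), and the capture text can be read back from a saved copy of the target even after the original is overwritten. The variable names here are illustrative only.

```perl
use strict;
use warnings;

# Illustration only (not the regexp engine's implementation): captures
# are just offsets into the target, read back from a saved copy.
my $string = 'abc' x 4;              # "abcabcabcabc"
$string =~ /b(c)(a)/ or die "no match";

# Snapshot the offsets and a copy of the whole target string.
my @start = @-;
my @end   = @+;
my $saved = $string;

# Clobber the original target string.
$string = 'z' x length $saved;

# Rebuild each capture from its offsets and the saved copy.
for my $n (1 .. $#start) {
    my $text = substr($saved, $start[$n], $end[$n] - $start[$n]);
    print "group $n: '$text' (offsets $start[$n]..$end[$n])\n";
}

# $1 is unaffected by the reassignment, because perl kept its own copy.
print "\$1 is still '$1'\n";
```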

I don't remember if there was a specific reason we needed to copy the
whole string rather than just the <min offset> to <max offset> substring.
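A hypothetical sketch of the <min offset> to <max offset> alternative, again as user-level Perl rather than engine code: keep only the substring spanning every participating capture group and rebase the offsets onto it. The test data and names are made up for illustration, and this says nothing about whether the engine has some other reason to need the whole string.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical illustration: save only the span covering all captures
# (smallest start offset to largest end offset), not the whole target.
my $big = ('x' x 100_000) . 'key=value' . ('y' x 100_000);
$big =~ /(key)=(value)/ or die "no match";

# Only groups that took part in the match have defined offsets.
my $last   = $#-;
my @groups = grep { defined $-[$_] } 1 .. $last;
my $lo = min map { $-[$_] } @groups;
my $hi = max map { $+[$_] } @groups;

# Keep just the spanning substring and store offsets rebased onto it.
my $saved = substr($big, $lo, $hi - $lo);
my %offsets;
for my $n (@groups) {
    $offsets{$n} = [ $-[$n] - $lo, $+[$n] - $lo ];
}

my $full_len = length $big;
$big = '';    # the original target can now change freely

printf "saved %d of %d bytes\n", length $saved, $full_len;
for my $n (@groups) {
    my ($s, $e) = @{ $offsets{$n} };
    print "group $n: ", substr($saved, $s, $e - $s), "\n";
}
```

For a match near the middle of a large string this keeps a few bytes instead of the whole target; the saving disappears when the captures themselves span most of the string.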

Hugo


