
[perl #123071] substitution loop issue with long strings

From: Father Chrysostomos
Date: October 28, 2014 04:40
Subject: [perl #123071] substitution loop issue with long strings
Message ID: rt-4.0.18-8056-1414471219-106.123071-75-0@perl.org

# New Ticket Created by  Father Chrysostomos 
# Please include the string:  [perl #123071]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=123071 >


I’m creating a ticket for this, so it is easier to track.

In <CAH+_n-4R1tO7nxEu6ehRkNZfO+0reMdcU7wGK-YMF3SqHchoOw@mail.gmail.com> Edward Peschko wrote:
> I'm getting the following problem when doing a substitution on a large string:
> 
> Substitution loop at ...
> 
> Is there a way to override this error? As it is, its annoying because
> acts as a de-facto built in limitation on the size of strings that you
> can substitute.
> 
> Is this fixed in the latest versions of perl?

And in <CAH+_n-4-aiGqxHDOdwd9NB-xbGkGfaEMOjsGyeSSZKUEi6R-mw@mail.gmail.com> he wrote:
> Its very easy to reproduce:
> 
> local($/) = undef;
> open(FD, "very_large_file.txt");  # say with the alphabet printed over and over, one per line, 2 GB in total size
> my $line = <FD>;
> close(FD);
> 
> do a substitution where the size of substitution is greater than the
> thing its replacing, ie:
> 
> $line =~ s#a#bbb#sg;
> 
> and you'll get 'Substitution loop at ... line ...'
> 
> And no - the 'substitution loop' description as described in perldiag
> doesn't apply. Any replacement string doesn't work (where it is longer
> than the original). There are only a finite number of 'a's in the
> source string - so my guess is what is happening is perl is keeping
> some counter of substitutions, and that counter is overflowing.

That’s exactly what’s happening.  The sbu_iters and sbu_maxiters members defined in cop.h are of type I32.
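
To make the overflow concrete, here is a rough sketch of the arithmetic in Perl.  It assumes the iteration limit is computed as 2 * length + 10 and then stored in an I32; that formula is an assumption for illustration and is not stated in this message.  The names to_i32 and maxiters_for are hypothetical, and the sketch needs a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Reinterpret the low 32 bits of $n as a signed 32-bit integer,
# mimicking what storing a wider value in an I32 typically does.
sub to_i32 {
    my ($n) = @_;
    return unpack 'l', pack 'L', $n & 0xFFFF_FFFF;
}

# ASSUMPTION: the limit is 2 * length + 10, truncated to I32.
# The real computation in the perl source may differ.
sub maxiters_for {
    my ($len) = @_;
    return to_i32(2 * $len + 10);
}

# Show the limit for a 1 MB, a 1.5 GB and a 2 GiB string.
printf "%-13d -> %d\n", $_, maxiters_for($_)
    for 1_000_000, 1_500_000_000, 2**31;
```

Under that assumption a 1.5 GB string gets a negative limit and a 2 GiB string gets a limit of 10, so any s///g with more matches than that dies with "Substitution loop" even though nothing is looping.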

(And this bug is *old*.  Perl 1 had a fixed limit of 10000.  Perl 4 started calculating the maximum number of iterations based on the string length, fixing the bug, but in such a way that when 64-bit systems came along it resurfaced.  So since Perl 4 the bug is as old as 64-bit systems.)

We could fix this by changing those two struct members to SSize_t.  But if that would enlarge the struct subst/struct blk union defined in cop.h, it might be worthwhile considering skipping the check altogether for long strings.  After all, if a substitution loops, it is because of a bug in perl; and if that bug does occur then it is likely to happen regardless of the length of the string.  (Right?)  So it will be caught even if the check is skipped for long strings.

Now, to work around the bug, you would have to use a while() loop instead of substituting all at once.  But that will still fail in 5.18 and earlier, because it was not until 5.20 that the regular expression engine gained support for strings longer than 2GB.  Another thing you could do is split your string into smaller strings and concatenate them afterwards.  But only you can tell whether that will work for your code.
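
The split-and-concatenate idea might look something like the sketch below.  The helper name substitute_in_chunks and the chunk size are hypothetical, and the approach is only safe when no match can straddle a chunk boundary (true for a one-character pattern such as /a/, not true in general).

```perl
use strict;
use warnings;

# Hypothetical helper: run $code on successive pieces of the string
# referenced by $str_ref and return the concatenated results.
# $code is expected to modify $_ in place, e.g. sub { s/a/bbb/g }.
sub substitute_in_chunks {
    my ($str_ref, $chunk_len, $code) = @_;
    die "chunk length must be positive\n" unless $chunk_len > 0;

    my $out   = '';
    my $total = length $$str_ref;
    for (my $pos = 0; $pos < $total; $pos += $chunk_len) {
        # substr returns a shorter piece at the end of the string.
        my $piece = substr $$str_ref, $pos, $chunk_len;
        $code->() for $piece;    # alias $_ to $piece for the callback
        $out .= $piece;
    }
    return $out;
}

# Small demonstration; for a real multi-gigabyte string a chunk
# length of a few megabytes would be a more sensible choice.
my $line   = "abcdefg\n" x 3;
my $result = substitute_in_chunks(\$line, 5, sub { s/a/bbb/g });
print $result;
```

Each s///g then only ever sees a short string, so the iteration counter stays far below the I32 range.  The result is built as a second copy, so this needs room for both the original and the substituted string in memory.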


-- 

Father Chrysostomos
