perl.perl5.porters | Postings from May 2002

Fwd: Re: Text::ParseWords problem

From: Hugo van der Sanden
Date: May 30, 2002 21:10
Subject: Fwd: Re: Text::ParseWords problem
Message ID: 200205310415.g4V4FG421696@crypt.compulink.co.uk

Chris hasn't had a chance to look at this to see whether it fixes his
original problem, but I think this is probably a good patch, and worth
putting in if it'll still fit.

Chris Nandor <pudge@pobox.com> wrote:
:In the rewrite of Text::ParseWords several years ago (1998), a more 
:complex regex was introduced that is causing problems in MacPerl on very 
:long lines of text (>10_000 characters).  In both MacPerl 5.6.1 and perl 
:on Mac OS X, a particular script is causing segmentation faults; on Mac 
:OS X, this is solved by setting the stack size to 6MB.
:
:Reverting to the simpler Text::ParseWords from 5.004 fixes the problem.
:
:I frankly have little time to look too much into this, but can provide 
:as much help as possible to someone else who might want to look into it.
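
To see the failure mode Chris describes, one can feed parse_line a single very long line. The sketch below is a hypothetical reproduction, not the script or data file from the report: it builds a synthetic comma-separated line of 2_000 fields (alternating quoted and unquoted, well over 10_000 characters) and tokenizes it with Text::ParseWords::parse_line(DELIMITER, KEEP, LINE). Whether it crashes depends on the perl version and the stack size available; with enough stack it should simply return every field.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical test data (not the file from the report): 2_000 comma-separated
# fields alternating between "quoted" and unquoted text, giving one line of
# well over 10_000 characters with no line breaks.
my @fields = map { $_ % 2 ? qq{"field $_"} : "field$_" } 1 .. 2_000;
my $line   = join ',', @fields;

# parse_line(DELIMITER, KEEP, LINE): the delimiter is a regex, and a false
# KEEP strips the quotes from quoted fields.
my @words = parse_line(',', 0, $line);

printf "%d characters, %d fields\n", length($line), scalar(@words);
```

On a perl that copes with the line, the field count printed is 2000, and the first two entries of @words are "field 1" and "field2".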

Sorry to take so long to get round to this - I completely forgot
about it, mea maxima culpa.

I'm not sure whether I'm doing the right thing; if I leave the data file
unaltered (so it is a single line of 666935 chars, including various \r
characters), I get results like this:

latest perl, old ParseWords: gives up with 'Unmatched quote' near the
end after 10.30 seconds.

latest perl, latest ParseWords: completes with '0 fields detected'
after 44.24 seconds.

latest perl, patched ParseWords: completes with '0 fields detected'
after 18.42 seconds.

If I convert the line endings in the data file with s/\r/\n/g, all three
give normal-looking output ending with:
454 lines, longest was 11499 characters.

Timings (same order) are 2.06s, 2.03s, 1.79s.
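
The line-ending conversion above is a single substitution. A minimal sketch, assuming a made-up in-memory string with bare \r terminators stands in for the real data file, converts them and prints a summary in the same shape as the script's last line:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the data file: old Mac OS text uses a bare \r as
# the line terminator, so a Unix perl sees the whole file as one "line".
my $data = join "\r", 'a,b,c', '"x y",z', 'last';

# The conversion described above: turn every \r into \n (on a copy).
(my $converted = $data) =~ s/\r/\n/g;

# Split into lines and find the longest one.
my @lines   = split /\n/, $converted;
my $longest = 0;
for my $l (@lines) {
    $longest = length($l) if length($l) > $longest;
}
printf "%d lines, longest was %d characters.\n", scalar(@lines), $longest;
```

On the three sample lines this prints "3 lines, longest was 7 characters."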

Not sure which of those is relevant; perhaps you could try with the
attached patch, and one of us can submit it if it helps.

Failing that, I'll have to work out how to artificially reduce my
stack size to see if I can induce a segfault.

Hugo
--- lib/Text/ParseWords.pm.old	Fri Jun 29 05:43:02 2001
+++ lib/Text/ParseWords.pm	Wed Apr 10 16:54:46 2002
@@ -56,20 +56,18 @@
 
     while (length($line)) {
 
-	($quote, $quoted, undef, $unquoted, $delim, undef) =
+	($quote, $quoted, $unquoted, $delim) =
 	    $line =~ m/^(["'])                 # a $quote
                         ((?:\\.|(?!\1)[^\\])*)    # and $quoted text
                         \1 		       # followed by the same quote
-                        ([\000-\377]*)	       # and the rest
 		       |                       # --OR--
                        ^((?:\\.|[^\\"'])*?)    # an $unquoted text
 		      (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))  
                                                # plus EOL, delimiter, or quote
-                      ([\000-\377]*)	       # the rest
 		      /x;		       # extended layout
 	return() unless( $quote || length($unquoted) || length($delim));
 
-	$line = $+;
+	$line = substr $line, $+[0];
 
         if ($keep) {
 	    $quoted = "$quote$quoted$quote";
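
The heart of the patch is the switch from capturing "the rest" of the line in an extra ([\000-\377]*) group, fetched back via $+ (the last matched capture group), to slicing it off with substr $line, $+[0], where $+[0] is the offset just past the end of the whole match. A minimal sketch of the two styles on a made-up input, using only the quoted-text branch of the pattern:

```perl
use strict;
use warnings;

my $line = q{"quoted text",unquoted,rest of line};

# Old style: an extra group captures everything after the token, and $+
# (the last successfully matched capture group) hands it back.
my $old_rest;
if ($line =~ /^(["'])((?:\\.|(?!\1)[^\\])*)\1([\000-\377]*)/) {
    $old_rest = $+;
}

# Patched style: match only the token, then take the remainder starting at
# $+[0], the offset just past the end of the whole match.
my $new_rest;
if ($line =~ /^(["'])((?:\\.|(?!\1)[^\\])*)\1/) {
    $new_rest = substr $line, $+[0];
}

print "$old_rest\n$new_rest\n";
```

Both variables end up holding ",unquoted,rest of line". ([\000-\377]* means "any byte, newline included", which a plain .* would not match without /s.) The patched form no longer asks the regex engine to match and capture the whole remainder of the line on every pass through the loop; the timings above (44.24 seconds down to 18.42 seconds on the unconverted file) are consistent with that saving, though this sketch does not show whether it also avoids the stack overflow.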

