
Re: [perl #118593] [PATCH 1/4] Fix docs about English.pm

From: Dave Mitchell
Date: July 24, 2013 14:30
Subject: Re: [perl #118593] [PATCH 1/4] Fix docs about English.pm
Message ID: 20130724143040.GI2177@iabyn.com

On Mon, Jun 24, 2013 at 07:05:14AM -0700, Wallace Reis via RT wrote:
> On Mon Jun 24 06:55:15 2013, davem wrote:
> > Actually that's not strictly true.
> > The full fix for $`,$&,$' performance issues involves the new COW
> > mechanism, which was disabled by default in 5.18 (and enabled by default
> > in 5.19.1).
> > [snip]
> > Even if it had been fixed in 5.18.0, I think that the text in perlvar
> > warning that in older versions the punctuation vars are slow should be
> > kept.
> 
> Right. The English.pm documentation already has a warning about it for
> older versions; it would just require an update to mention 5.19 or 5.20
> instead of 5.18. That is simpler than having such a warning all around in
> other docs which make use of English.pm

I've just pushed the following 2 commits, which hopefully resolve any
issues I had with these changes.


commit 142a37fdb385bb222232b286abdedf9b1daaa746
Author:     David Mitchell <davem@iabyn.com>
AuthorDate: Wed Jul 24 14:18:22 2013 +0100
Commit:     David Mitchell <davem@iabyn.com>
CommitDate: Wed Jul 24 14:42:43 2013 +0100

    English.pm: update perl version where perf fixed
    
    It still said that the performance of $`, $&, $' was fixed in 5.18.
    Update that to 5.20, since COW wasn't enabled by default in 5.18.


Affected files ...
    
    M	lib/English.pm

Differences ...

diff --git a/lib/English.pm b/lib/English.pm
index e4ee10a..6560f5f 100644
--- a/lib/English.pm
+++ b/lib/English.pm
@@ -1,6 +1,6 @@
 package English;
 
-our $VERSION = '1.07';
+our $VERSION = '1.08';
 
 require Exporter;
 @ISA = qw(Exporter);
@@ -34,9 +34,9 @@ See L<perlvar> for a complete list of these.
 
 =head1 PERFORMANCE
 
-NOTE: This was fixed in perl 5.18.  Mentioning these three variables no
+NOTE: This was fixed in perl 5.20.  Mentioning these three variables no
 longer makes a speed difference.  This section still applies if your code
-is to run on perl 5.16 or earlier.
+is to run on perl 5.18 or earlier.
 
 This module can provoke sizeable inefficiencies for regular expressions,
 due to unfortunate implementation details.  If performance matters in





commit 4044502721ac7b89c6d21cf1099a3a518717eeba
Author:     David Mitchell <davem@iabyn.com>
AuthorDate: Wed Jul 24 15:20:22 2013 +0100
Commit:     David Mitchell <davem@iabyn.com>
CommitDate: Wed Jul 24 15:20:22 2013 +0100

    perlvar.pod: add a separate section on $& et al
    
    Add a new separate section explaining the performance issues of $`, $&
    and $'; plus descriptions of the various workarounds like @-, /p and COW,
    and which perl version they were each introduced in.
    
    Then in the entries for each individual var, strip out any commentary
    about performance, and just include a link to the new performance
    section.


Affected files ...
    
    M	pod/perlvar.pod

Differences ...

diff --git a/pod/perlvar.pod b/pod/perlvar.pod
index a278d10..4d869f1 100644
--- a/pod/perlvar.pod
+++ b/pod/perlvar.pod
@@ -801,16 +801,51 @@ we have not made another match:
     $1 is Mutt; $2 is Jeff
     $1 is Wallace; $2 is Grommit
 
-The C<Devel::NYTProf> and C<Devel::FindAmpersand>
-modules can help you find uses of these
-problematic match variables in your code.
+=head3 Performance issues
 
-Since Perl v5.10.0, you can use the C</p> match operator flag and the
-C<${^PREMATCH}>, C<${^MATCH}>, and C<${^POSTMATCH}> variables instead
-so you only suffer the performance penalties.
+Traditionally in Perl, any use of any of the three variables  C<$`>, C<$&>
+or C<$'> (or their C<use English> equivalents) anywhere in the code, caused
+all subsequent successful pattern matches to make a copy of the matched
+string, in case the code might subsequently access one of those variables.
+This imposed a considerable performance penalty across the whole program,
+so generally the use of these variables has been discouraged.
 
-If you are using Perl v5.20.0 or higher, you do not need to worry about
-this, as the three naughty variables are no longer naughty.
+In Perl 5.6.0 the C<@-> and C<@+> dynamic arrays were introduced that
+supply the indices of successful matches. So you could for example do
+this:
+
+    $str =~ /pattern/;
+
+    print $`, $&, $'; # bad: performance hit
+
+    print             # good: no performance hit
+	substr($str, 0,     $-[0]),
+	substr($str, $-[0], $+[0]-$-[0]),
+	substr($str, $+[0]);
+
+In Perl 5.10.0 the C</p> match operator flag and the C<${^PREMATCH}>,
+C<${^MATCH}>, and C<${^POSTMATCH}> variables were introduced, that allowed
+you to suffer the penalties only on patterns marked with C</p>.
+
+In Perl 5.18.0 onwards, perl started noting the presence of each of the
+three variables separately, and only copied that part of the string
+required; so in
+
+    $`; $&; "abcdefgh" =~ /d/
+
+perl would only copy the "abcd" part of the string. That could make a big
+difference in something like
+
+    $str = 'x' x 1_000_000;
+    $&; # whoops
+    $str =~ /x/g # one char copied a million times, not a million chars
+
+In Perl 5.20.0 a new copy-on-write system was enabled by default, which
+finally fixes all performance issues with these three variables, and makes
+them safe to use anywhere.
+
+The C<Devel::NYTProf> and C<Devel::FindAmpersand> modules can help you
+find uses of these problematic match variables in your code.
 
 =over 8
 
@@ -834,12 +869,8 @@ The string matched by the last successful pattern match (not counting
 any matches hidden within a BLOCK or C<eval()> enclosed by the current
 BLOCK).
 
-In Perl v5.18 and earlier, the use of this variable
-anywhere in a program imposes a considerable
-performance penalty on all regular expression matches.  To avoid this
-penalty, you can extract the same substring by using L</@->.  Starting
-with Perl v5.10.0, you can use the C</p> match flag and the C<${^MATCH}>
-variable to do the same thing for particular match operations.
+See L</Performance issues> above for the serious performance implications
+of using this variable (even once) in your code.
 
 This variable is read-only and dynamically-scoped.
 
@@ -850,6 +881,9 @@ X<${^MATCH}>
 
 This is similar to C<$&> (C<$MATCH>) except that it does not incur the
 performance penalty associated with that variable.
+
+See L</Performance issues> above.
+
 In Perl v5.18 and earlier, it is only guaranteed
 to return a defined value when the pattern was compiled or executed with
 the C</p> modifier.  In Perl v5.20, the C</p> modifier does nothing, so
@@ -868,13 +902,8 @@ The string preceding whatever was matched by the last successful
 pattern match, not counting any matches hidden within a BLOCK or C<eval>
 enclosed by the current BLOCK.
 
-In Perl v5.18 and earlier, the use of this variable
-anywhere in a program imposes a considerable
-performance penalty on all regular expression matches.  To avoid this
-penalty, you can extract the same substring by using L</@->.  Starting
-with Perl v5.10.0, you can use the C</p> match flag and the
-C<${^PREMATCH}> variable to do the same thing for particular match
-operations.
+See L</Performance issues> above for the serious performance implications
+of using this variable (even once) in your code.
 
 This variable is read-only and dynamically-scoped.
 
@@ -885,6 +914,9 @@ X<$`> X<${^PREMATCH}>
 
 This is similar to C<$`> ($PREMATCH) except that it does not incur the
 performance penalty associated with that variable.
+
+See L</Performance issues> above.
+
 In Perl v5.18 and earlier, it is only guaranteed
 to return a defined value when the pattern was compiled or executed with
 the C</p> modifier.  In Perl v5.20, the C</p> modifier does nothing, so
@@ -907,13 +939,8 @@ enclosed by the current BLOCK).  Example:
     /def/;
     print "$`:$&:$'\n";  	# prints abc:def:ghi
 
-In Perl v5.18 and earlier, the use of this variable
-anywhere in a program imposes a considerable
-performance penalty on all regular expression matches.
-To avoid this penalty, you can extract the same substring by
-using L</@->.  Starting with Perl v5.10.0, you can use the C</p> match flag
-and the C<${^POSTMATCH}> variable to do the same thing for particular
-match operations.
+See L</Performance issues> above for the serious performance implications
+of using this variable (even once) in your code.
 
 This variable is read-only and dynamically-scoped.
 
@@ -924,6 +951,9 @@ X<${^POSTMATCH}> X<$'> X<$POSTMATCH>
 
 This is similar to C<$'> (C<$POSTMATCH>) except that it does not incur the
 performance penalty associated with that variable.
+
+See L</Performance issues> above.
+
 In Perl v5.18 and earlier, it is only guaranteed
 to return a defined value when the pattern was compiled or executed with
 the C</p> modifier.  In Perl v5.20, the C</p> modifier does nothing, so


-- 
Art is anything that has a label (especially if the label is "untitled 1")
