[PATCH@7014] \G in non-/g is well-defined now ... right?

From: Daniel Chetlin
Date: September 5, 2000 04:58
Subject: [PATCH@7014] \G in non-/g is well-defined now ... right?
Message ID: 20000905045707.A8620@ilmd.chetlin.org

Happened to notice this in perlop. It describes C<\G> in a non-C</g> regex as
being the same as C<\A>, which is no longer true; it now anchors at pos(). I'm
assuming that this is now a supported feature; if it isn't, I'll submit a
different patch that documents the current behavior but doesn't make it
sound canonical.
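
A minimal sketch of the behavior described above, assuming a Perl new enough to anchor C<\G> at pos() without C</g> (5.6.0 or later, per the patch text). The strings and variable names are illustrative only and are not taken from perlop.

```perl
use strict;
use warnings;

my $str = "aaabbbccc";

# A /g match outside list context leaves pos() just past the matched text.
$str =~ /a+/g;
my $pos_after_g = pos($str);

# Without /g, \G still anchors at pos(), so this captures the first "b".
my ($anchored) = $str =~ /\G(.)/;

# The non-/g match leaves pos() where it was.
my $pos_after_plain = pos($str);

# On a string with no prior /g match, pos() is undefined and \G acts like \A.
my $fresh = "xyz";
my ($at_start) = $fresh =~ /\G(.)/;

print "pos after /g: $pos_after_g\n";
print "\\G without /g captured: $anchored\n";
print "pos after non-/g match: $pos_after_plain\n";
print "fresh string captured: $at_start\n";
```

If the second line of output shows C<a> rather than C<b>, the interpreter is presumably treating C<\G> as C<\A>, which is the older behavior the patch removes from the documentation.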

Also, I was a little uncomfortable with my wording in spots, and I'm not sure
that the amount of detail I put in perlfaq6 is necessary, especially in the
first paragraph, but I couldn't come up with a better rewording or a better
place to put it. Suggestions solicited.

Ref:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1998-11/msg01195.html
for the patch that made this change.

Finally, is there somewhere to get at patches earlier than the ones APC
exposes? APC goes back to somewhere in the mid-3000s, but this patch is #2365
and I couldn't find it anywhere save for Xray.

Thanks!

-dlc

--- pod/perlop.pod	2000/09/05 10:56:23	1.1
+++ pod/perlop.pod	2000/09/05 11:40:11	1.2
@@ -851,9 +851,11 @@
 
 You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
 zero-width assertion that matches the exact position where the previous
-C<m//g>, if any, left off.  The C<\G> assertion is not supported without
-the C</g> modifier.  (Currently, without C</g>, C<\G> behaves just like
-C<\A>, but that's accidental and may change in the future.)
+C<m//g>, if any, left off.  Without the C</g> modifier, the C<\G> assertion
+still anchors at pos(), but the match is of course only attempted once.
+Using C<\G> without C</g> on a target string that has not previously had a
+C</g> match applied to it is the same as using the C<\A> assertion to match
+the beginning of the string.
 
 Examples:
 
@@ -861,7 +863,7 @@
     ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
 
     # scalar context
-    $/ = ""; $* = 1;  # $* deprecated in modern perls
+    $/ = "";
     while (defined($paragraph = <>)) {
 	while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
 	    $sentences++;
@@ -879,6 +881,7 @@
         print "3: '";
         print $1 while /(p)/gc; print "', pos=", pos, "\n";
     }
+    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
 
 The last example should print:
 
@@ -888,6 +891,13 @@
     1: '', pos=7
     2: 'q', pos=8
     3: '', pos=8
+    Final: 'q', pos=8
+
+Notice that the final match matched C<q> instead of C<p>, which a match
+without the C<\G> anchor would have done. Also note that the final match
+did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
+final match did indeed match C<p>, it's a good bet that you're running an
+older (pre-5.6.0) Perl.
 
 A useful idiom for C<lex>-like scanners is C</\G.../gc>.  You can
 combine several regexps like this to process a string part-by-part,
--- pod/perlfaq6.pod	2000/09/05 11:25:51	1.1
+++ pod/perlfaq6.pod	2000/09/05 11:39:46	1.2
@@ -527,11 +527,16 @@
 
 =head2 What good is C<\G> in a regular expression?
 
-The notation C<\G> is used in a match or substitution in conjunction the
-C</g> modifier (and ignored if there's no C</g>) to anchor the regular
-expression to the point just past where the last match occurred, i.e. the
-pos() point.  A failed match resets the position of C<\G> unless the
-C</c> modifier is in effect.
+The notation C<\G> is used in a match or substitution in conjunction with
+the C</g> modifier to anchor the regular expression to the point just past
+where the last match occurred, i.e. the pos() point.  A failed match resets
+the position of C<\G> unless the C</c> modifier is in effect. C<\G> can be
+used in a match without the C</g> modifier; it acts the same (i.e. still
+anchors at the pos() point) but of course only matches once and does not
+update pos(), as non-C</g> expressions never do. C<\G> in an expression
+applied to a target string that has never been matched against a C</g>
+expression before or has had its pos() reset is functionally equivalent to
+C<\A>, which matches at the beginning of the string.
 
 For example, suppose you had a line of text quoted in standard mail
 and Usenet notation, (that is, with leading C<< > >> characters), and
