develooper Front page | perl.perl5.porters | Postings from August 2001

[PATCH] ${^RE_STR} variable

Thread Next
From:
Jeff 'japhy/Marillion' Pinyan
Date:
August 31, 2001 15:00
Subject:
[PATCH] ${^RE_STR} variable
Message ID:
Pine.GSO.4.21.0108311653380.7601-100000@crusoe.crusoe.net
I've got a candidate for a new variable.  ${^RE_STR} acts like
"$`$&$'" would, but I don't think it's as terrible.  Its usefulness is
demonstrated in my update to perlvar.pod -- you can use it to generically
extract substrings from the last string matched against, using substr(),
@-, @+, and ${^RE_STR}.

Patch is below sig.  I've tested it, documented it, and (duh) added it.

Let me know if it's just a crappy idea.  I find it terribly useful for
capturing parts of a regex directly to a variable, based on the offsets in
@- and @+, and the total string, stored in ${^RE_STR}.

-- 
Jeff "japhy" Pinyan      japhy@pobox.com      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **


--- gv.c.old	Fri Aug 31 15:58:27 2001
+++ gv.c	Fri Aug 31 16:26:35 2001
@@ -910,6 +910,10 @@
 	if (len > 1 && strNE(name, "\017PEN"))
 	    break;
 	goto magicalize;
+    case '\022':	/* $^RE_STR */
+	if (strNE(name, "\022E_STR"))
+	    break;
+	goto magicalize;
     case '\023':	/* $^S */
 	if (len > 1)
 	    break;
@@ -1739,6 +1743,11 @@
 	if (len == 1
 	    || (len == 4 && strEQ(name, "\027PEN")))
 	{
+	    goto yes;
+	}
+	break;
+    case '\022':   /* $^RE_STR */
+	if (len == 4 && strEQ(name, "\022E_STR")) {
 	    goto yes;
 	}
 	break;
--- mg.c.old	Fri Aug 31 15:58:17 2001
+++ mg.c	Fri Aug 31 16:34:26 2001
@@ -461,6 +461,13 @@
 		goto getparen;
 	}
 	return 0;
+    case '\022': /* ^RE_STR */  /* japhy */
+	if (strEQ(mg->mg_ptr, "\022E_STR")) {
+	    if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+		return rx->sublen;
+	    }
+	    return 0;
+	}
     case '`':
 	if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
 	    if (rx->startp[0] != -1) {
@@ -694,6 +701,14 @@
 	}
 	sv_setsv(sv,&PL_sv_undef);
 	break;
+    case '\022':		/* ^RE_STR */  /* japhy */
+	if (strEQ(mg->mg_ptr, "\022E_STR")) {
+	    if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+		sv_setpvn(sv, rx->subbeg, rx->sublen);
+	    }
+	    else { sv_setsv(sv,&PL_sv_undef); }
+	    break;
+	}
     case '`':
 	if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
 	    if ((s = rx->subbeg) && rx->startp[0] != -1) {
--- t/op/pat.t.old	Fri Aug 31 16:59:45 2001
+++ t/op/pat.t	Fri Aug 31 17:55:45 2001
@@ -6,7 +6,7 @@
 
 $| = 1;
 
-print "1..686\n";
+print "1..690\n";
 
 BEGIN {
     chdir 't' if -d 't';
@@ -2006,3 +2006,29 @@
     print "not " unless length($y) == 2 && $y eq $x;
     print "ok 686\n";
 }
+
+
+### japhy, 8/31/2001
+### ${^RE_STR}, which holds the string you're matching against
+
+"japhy" =~ /./;
+print "not " if ${^RE_STR} ne "japhy";
+print "ok 687\n";
+
+"JAPHY" =~ /no match/;  # on failure, keep last value
+print "not " if ${^RE_STR} ne "japhy";
+print "ok 688\n";
+
+"Tcl Perl Python Ruby" =~ m{
+  \s
+  (?{ local $s = $+[0] })
+  \w+
+  (?{ local $e = $+[0] })
+  (?{ $proglang = substr ${^RE_STR}, $s, $e - $s })
+}x;
+print "not " if $proglang ne "Perl";
+print "ok 689\n";
+print "not " if ${^RE_STR} ne "Tcl Perl Python Ruby";
+print "ok 690\n";
+
+
--- pod/perlretut.pod.old	Fri Aug 31 16:51:16 2001
+++ pod/perlretut.pod	Fri Aug 31 17:55:01 2001
@@ -781,11 +781,17 @@
 lesser extent, because if they are used in one regexp in a program,
 they are generated for <all> regexps in the program.  So if raw
 performance is a goal of your application, they should be avoided.
-If you need them, use C<@-> and C<@+> instead:
+If you need them, use C<@-> and C<@+> instead.  Because the variable you
+are matching against is not always the same variable, and sometimes you are
+matching against a constant string not stored in a variable, the C<${^RE_STR}>
+variable is available -- this holds the value of the string you are currently
+matching against (or of the last successful pattern match).  Using the C<@->
+and C<@+> arrays in conjunction with C<${^RE_STR}> allows you to rewrite the
+C<$`>, C<$&>, and C<$'> variables as:
 
-    $` is the same as substr( $x, 0, $-[0] )
-    $& is the same as substr( $x, $-[0], $+[0]-$-[0] )
-    $' is the same as substr( $x, $+[0] )
+    $` is the same as substr( ${^RE_STR}, 0, $-[0] )
+    $& is the same as substr( ${^RE_STR}, $-[0], $+[0] - $-[0] )
+    $' is the same as substr( ${^RE_STR}, $+[0] )
 
 =head2 Matching repetitions
 
--- pod/perltoc.pod.old	Fri Aug 31 16:46:09 2001
+++ pod/perltoc.pod	Fri Aug 31 16:45:37 2001
@@ -907,7 +907,7 @@
 $COMPILING, $^C, $DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, %^H,
 $INPLACE_EDIT, $^I, $^M, $OSNAME, $^O, $PERLDB, $^P, 0x01, 0x02, 0x04,
 0x08, 0x10, 0x20, 0x40, 0x80, 0x100, 0x200, $LAST_REGEXP_CODE_RESULT, $^R,
-$EXCEPTIONS_BEING_CAUGHT, $^S, $BASETIME, $^T, $PERL_VERSION, $^V,
+${^RE_STR}, $EXCEPTIONS_BEING_CAUGHT, $^S, $BASETIME, $^T, $PERL_VERSION, $^V,
 $WARNING, $^W, ${^WARNING_BITS}, ${^WIDE_SYSTEM_CALLS}, $EXECUTABLE_NAME,
 $^X, $ARGV, @ARGV, @F, @INC, @_, %INC, %ENV, $ENV{expr}, %SIG, $SIG{expr}
 
--- pod/perlvar.pod.old	Fri Aug 31 16:51:52 2001
+++ pod/perlvar.pod	Fri Aug 31 17:54:29 2001
@@ -488,21 +488,22 @@
 of the I<n>th submatch, so C<$-[1]> is the offset where $1
 begins, C<$-[2]> the offset where $2 begins, and so on.
 
-After a match against some variable $var:
+After a successful match against any string, you can use the C<${^RE_STR}>
+variable to extract the following:
 
 =over 5
 
-=item C<$`> is the same as C<substr($var, 0, $-[0])>
+=item C<$`> is the same as C<substr(${^RE_STR}, 0, $-[0])>
 
-=item C<$&> is the same as C<substr($var, $-[0], $+[0] - $-[0])>
+=item C<$&> is the same as C<substr(${^RE_STR}, $-[0], $+[0] - $-[0])>
 
-=item C<$'> is the same as C<substr($var, $+[0])>
+=item C<$'> is the same as C<substr(${^RE_STR}, $+[0])>
 
-=item C<$1> is the same as C<substr($var, $-[1], $+[1] - $-[1])>  
+=item C<$1> is the same as C<substr(${^RE_STR}, $-[1], $+[1] - $-[1])>  
 
-=item C<$2> is the same as C<substr($var, $-[2], $+[2] - $-[2])>
+=item C<$2> is the same as C<substr(${^RE_STR}, $-[2], $+[2] - $-[2])>
 
-=item C<$3> is the same as C<substr $var, $-[3], $+[3] - $-[3])>
+=item C<$3> is the same as C<substr(${^RE_STR}, $-[3], $+[3] - $-[3])>
 
 =back
 
@@ -957,6 +958,12 @@
 
 The result of evaluation of the last successful C<(?{ code })>
 regular expression assertion (see L<perlre>).  May be written to.
+
+=item ${^RE_STR}
+
+Inside a regular expression, it holds the string being matched against.
+Outside a regular expression, it holds the value of the string matched
+against from the last successful pattern match.
 
 =item $EXCEPTIONS_BEING_CAUGHT
 


Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About