Front page | perl.perl5.porters |
Postings from August 2001
[PATCH] ${^RE_STR} variable
Thread Next
From:
Jeff 'japhy/Marillion' Pinyan
Date:
August 31, 2001 15:00
Subject:
[PATCH] ${^RE_STR} variable
Message ID:
Pine.GSO.4.21.0108311653380.7601-100000@crusoe.crusoe.net
I've got a candidate for a new variable. ${^RE_STR} acts like
"$`$&$'" would, but I don't think it's as terrible. Its usefulness is
demonstrated in my update to perlvar.pod -- you can use it to generically
extract substrings from the last string matched against, using substr(),
@-, @+, and ${^RE_STR}.
Patch is below sig. I've tested it, documented it, and (duh) added it.
Let me know if it's just a crappy idea. I find it terribly useful for
capturing parts of a regex directly to a variable, based on the offsets in
@- and @+, and the total string, stored in ${^RE_STR}.
--
Jeff "japhy" Pinyan japhy@pobox.com http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
--- gv.c.old Fri Aug 31 15:58:27 2001
+++ gv.c Fri Aug 31 16:26:35 2001
@@ -910,6 +910,10 @@
if (len > 1 && strNE(name, "\017PEN"))
break;
goto magicalize;
+ case '\022': /* $^RE_STR */
+ if (strNE(name, "\022E_STR"))
+ break;
+ goto magicalize;
case '\023': /* $^S */
if (len > 1)
break;
@@ -1739,6 +1743,11 @@
if (len == 1
|| (len == 4 && strEQ(name, "\027PEN")))
{
+ goto yes;
+ }
+ break;
+ case '\022': /* $^RE_STR */
+ if (len == 4 && strEQ(name, "\022E_STR")) {
goto yes;
}
break;
--- mg.c.old Fri Aug 31 15:58:17 2001
+++ mg.c Fri Aug 31 16:34:26 2001
@@ -461,6 +461,13 @@
goto getparen;
}
return 0;
+ case '\022': /* ^RE_STR */ /* japhy */
+ if (strEQ(mg->mg_ptr, "\022E_STR")) {
+ if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+ return rx->sublen;
+ }
+ return 0;
+ }
case '`':
if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
if (rx->startp[0] != -1) {
@@ -694,6 +701,14 @@
}
sv_setsv(sv,&PL_sv_undef);
break;
+ case '\022': /* ^RE_STR */ /* japhy */
+ if (strEQ(mg->mg_ptr, "\022E_STR")) {
+ if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+ sv_setpvn(sv, rx->subbeg, rx->sublen);
+ }
+ else { sv_setsv(sv,&PL_sv_undef); }
+ break;
+ }
case '`':
if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
if ((s = rx->subbeg) && rx->startp[0] != -1) {
--- t/op/pat.t.old Fri Aug 31 16:59:45 2001
+++ t/op/pat.t Fri Aug 31 17:55:45 2001
@@ -6,7 +6,7 @@
$| = 1;
-print "1..686\n";
+print "1..690\n";
BEGIN {
chdir 't' if -d 't';
@@ -2006,3 +2006,29 @@
print "not " unless length($y) == 2 && $y eq $x;
print "ok 686\n";
}
+
+
+### japhy, 8/31/2001
+### ${^RE_STR}, which holds the string you're matching against
+
+"japhy" =~ /./;
+print "not " if ${^RE_STR} ne "japhy";
+print "ok 687\n";
+
+"JAPHY" =~ /no match/; # on failure, keep last value
+print "not " if ${^RE_STR} ne "japhy";
+print "ok 688\n";
+
+"Tcl Perl Python Ruby" =~ m{
+ \s
+ (?{ local $s = $+[0] })
+ \w+
+ (?{ local $e = $+[0] })
+ (?{ $proglang = substr ${^RE_STR}, $s, $e - $s })
+}x;
+print "not " if $proglang ne "Perl";
+print "ok 689\n";
+print "not " if ${^RE_STR} ne "Tcl Perl Python Ruby";
+print "ok 690\n";
+
+
--- pod/perlretut.pod.old Fri Aug 31 16:51:16 2001
+++ pod/perlretut.pod Fri Aug 31 17:55:01 2001
@@ -781,11 +781,17 @@
lesser extent, because if they are used in one regexp in a program,
they are generated for <all> regexps in the program. So if raw
performance is a goal of your application, they should be avoided.
-If you need them, use C<@-> and C<@+> instead:
+If you need them, use C<@-> and C<@+> instead. Because the variable you
+are matching against is not always the same variable, and sometimes you are
+matching against a constant string not stored in a variable, the C<${^RE_STR}>
+variable is available -- this holds the value of the string you are currently
+matching against (or of the last successful pattern match). Using the C<@->
+and C<@+> arrays in conjunction with C<${^RE_STR}> allows you to rewrite the
+C<$`>, C<$&>, and C<$'> variables as:
- $` is the same as substr( $x, 0, $-[0] )
- $& is the same as substr( $x, $-[0], $+[0]-$-[0] )
- $' is the same as substr( $x, $+[0] )
+ $` is the same as substr( ${^RE_STR}, 0, $-[0] )
+ $& is the same as substr( ${^RE_STR}, $-[0], $+[0] - $-[0] )
+ $' is the same as substr( ${^RE_STR}, $+[0] )
=head2 Matching repetitions
--- pod/perltoc.pod.old Fri Aug 31 16:46:09 2001
+++ pod/perltoc.pod Fri Aug 31 16:45:37 2001
@@ -907,7 +907,7 @@
$COMPILING, $^C, $DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, %^H,
$INPLACE_EDIT, $^I, $^M, $OSNAME, $^O, $PERLDB, $^P, 0x01, 0x02, 0x04,
0x08, 0x10, 0x20, 0x40, 0x80, 0x100, 0x200, $LAST_REGEXP_CODE_RESULT, $^R,
-$EXCEPTIONS_BEING_CAUGHT, $^S, $BASETIME, $^T, $PERL_VERSION, $^V,
+${^RE_STR}, $EXCEPTIONS_BEING_CAUGHT, $^S, $BASETIME, $^T, $PERL_VERSION, $^V,
$WARNING, $^W, ${^WARNING_BITS}, ${^WIDE_SYSTEM_CALLS}, $EXECUTABLE_NAME,
$^X, $ARGV, @ARGV, @F, @INC, @_, %INC, %ENV, $ENV{expr}, %SIG, $SIG{expr}
--- pod/perlvar.pod.old Fri Aug 31 16:51:52 2001
+++ pod/perlvar.pod Fri Aug 31 17:54:29 2001
@@ -488,21 +488,22 @@
of the I<n>th submatch, so C<$-[1]> is the offset where $1
begins, C<$-[2]> the offset where $2 begins, and so on.
-After a match against some variable $var:
+After a successful match against any string, you can use the C<${^RE_STR}>
+variable to extract the following:
=over 5
-=item C<$`> is the same as C<substr($var, 0, $-[0])>
+=item C<$`> is the same as C<substr(${^RE_STR}, 0, $-[0])>
-=item C<$&> is the same as C<substr($var, $-[0], $+[0] - $-[0])>
+=item C<$&> is the same as C<substr(${^RE_STR}, $-[0], $+[0] - $-[0])>
-=item C<$'> is the same as C<substr($var, $+[0])>
+=item C<$'> is the same as C<substr(${^RE_STR}, $+[0])>
-=item C<$1> is the same as C<substr($var, $-[1], $+[1] - $-[1])>
+=item C<$1> is the same as C<substr(${^RE_STR}, $-[1], $+[1] - $-[1])>
-=item C<$2> is the same as C<substr($var, $-[2], $+[2] - $-[2])>
+=item C<$2> is the same as C<substr(${^RE_STR}, $-[2], $+[2] - $-[2])>
-=item C<$3> is the same as C<substr $var, $-[3], $+[3] - $-[3])>
+=item C<$3> is the same as C<substr(${^RE_STR}, $-[3], $+[3] - $-[3])>
=back
@@ -957,6 +958,12 @@
The result of evaluation of the last successful C<(?{ code })>
regular expression assertion (see L<perlre>). May be written to.
+
+=item ${^RE_STR}
+
+Inside a regular expression, it holds the string being matched against.
+Outside a regular expression, it holds the value of the string matched
+against from the last successful pattern match.
=item $EXCEPTIONS_BEING_CAUGHT
Thread Next
-
[PATCH] ${^RE_STR} variable
by Jeff 'japhy/Marillion' Pinyan