Front page | perl.perl5.porters |
Postings from June 2001
[ID 20010626.005] regex (?<name>...) capture-to-var paren, new$^N magic variable
From:
Jeffrey Friedl
Date:
June 26, 2001 13:18
Subject:
[ID 20010626.005] regex (?<name>...) capture-to-var paren, new$^N magic variable
Message ID:
200106262017.NAA09712@ventrue.corp.yahoo.com
This is a bug report for perl from jfriedl@yahoo.com,
generated with the help of perlbug 1.33 running under perl v5.7.1.
-----------------------------------------------------------------
[Please enter your report here]
SUMMARY:
1) Gee, it'd be nice to support (?<name>...) "named capture" parens
within regular expressions.
2) I created a new magic variable $^N, similar to $+.
3) If this is of interest, is $^N a good name?
Feeling envy for highly advanced :-) languages like Java and Visual Basic,
whose regex languages allows named captures a'la
(?<myval>\d+)
(sets $myval to the digits captured), I thought I'd use regex overloading
to convert this syntaxt to
(\d+) (?{ $varname = $+ })
(capture to normal parens, then use $+ to access it and assign to the variable)
This works fine except for a few details, such as being able to have
capturing parens nested within, since $+ refers to the *hightest-numbered*
set of parens used so far, not the most-recently *closed* set of parens
used so far.
So, I created a new magic variable, $^N, that pretty much parallels $+
except it does indeed refer to the most-recently *closed* set of parens.
Now, converting to
(\d+) (?{ $varname = $^N })
works even with nesting. This is very nice.
I can send the patches if wanted. But if wanted, what is a good variable name?
I picked $^N simply because I saw it was free.
For those interested, I've appended my package to allow this, and a short
test program.
There are still "issues" with my overloading -- the variables named are not
checked at runtime ('use strict' doesn't save you from referring to a
nonexistant variable), and the variables are not "protected", so even if
during a match a variable is set, it won't be unset if the match later
fails.
So, it really would be nice if named captures were officially supported. I
spent some hours digging through regcomp and regexec, and succeeded only in
killing massive amounts of neurons )-:, so I don't think I'll be able to
add it.
Jeffrey
Here is a short test script:
---snip------------------------------------------------------------------
use strict;
use warnings;
use Regex::SupportNamedCapture;
my $areacode; ## both lexical
our $exchange; ## and global
my $number; ## variables work fine.
"My number is 408-555-1212." =~ m{
\b
(?<areacode>\d\d\d)
-
(?<exchange>\d\d\d)
-
(?<number>\d\d\d\d \b)
}x;
print "phone number is: ($areacode) $exchange-$number\n";
my $fullnumber;
"My number is 408-555-1212." =~ m{
\b
(?<fullnumber>
(?<areacode>\d\d\d)
-
(?<exchange>\d\d\d)
-
(?<number>\d\d\d\d \b)
)
}x;
if ($fullnumber ne "$areacode-$exchange-$number")
{
print "Bummer, you don't have \$^N support: fullnumber is [$fullnumber]\n";
}
---snip------------------------------------------------------------------
Here is the package:
---snip------------------------------------------------------------------
package Regex::SupportNamedCapture;
##
## This package allows regular expressions to have named captures, a'la
##
## (?<varname>...)
## sets $varname to the result of what's matched by the /.../
##
## If your Perl supports the $^N "most-recently-closed-paren text" variable,
## the /.../ part may itself contain capturing parens. Otherwise, it
## shouldn't.
##
## jfriedl@yahoo.com
## 6/2001
##
use strict;
use warnings;
use re 'eval';
use overload;
sub import { overload::constant 'qr' => \&convert }
##
## Test to see if my proposed $^N is supported.
## Set $GutsResult to $^N if so, set to $+ if not.
##
my $GutsResult = do {
no warnings;
"1" =~ m/(1)/;
if ($^N) {
'$^N';
} else {
'$+';
}
};
our $OpenParens; ## needed for matching nested parens
my $NestedParenGuts = qr{
(?{ local $OpenParens = 0 }) ## counts the number of nested opens waiting to close
(?>
(?>
## stuff not parens, not escaped
[^()\\]+
## escaped stuff
| (?s: \\. )
# another opening paren
| \( (?{ $OpenParens++ })
# a closing paren, if we're expecting any
| (?(?{ $OpenParens }) (?{ $OpenParens-- }) \))
)*
)
}x;
##
## Mimics named capturing parens by converting something like
## (?<varname>...)
## to
## (?: (...) (?{ $varname = $^N }) )
##
##
sub convert
{
my $re = shift; ## regex to mangle
$re =~ s{
(?<! \\ ) # an unescaped...
\(\? # "(?"
< # '<'
(\w+) # $1 - an identifier
> # '>'
($NestedParenGuts) # $2 - regex guts
\) # ')'
}{
my $id = '$' . $1;
my $guts = convert($2);
"(?:($guts)(?{ $id=$GutsResult }))";
}exg;
return $re; ## return mangled regex
}
1;
---snip------------------------------------------------------------------
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=wishlist
---
Site configuration information for perl v5.7.1:
Configured by jfriedl at Sun Jun 17 23:29:39 PDT 2001.
Summary of my perl5 (revision 5.0 version 7 subversion 17) configuration:
Platform:
osname=linux, osvers=2.4.5, archname=i686-linux
uname='linux fummy.telocity.com 2.4.5 #3 smp mon jun 4 22:43:14 pdt 2001 i686 unknown '
config_args='-e -s -O -D optimize=-O2 -g'
hint=previous, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
Compiler:
cc='cc', ccflags ='-Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, usemymalloc=n, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt -lutil
perllibs=-lnsl -ldl -lm -lc -lposix -lcrypt -lutil
libc=/lib/libc-2.2.3.so, so=so, useshrplib=false, libperl=libperl.a
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
DEVEL10654
---
@INC for perl v5.7.1:
/home/jfriedl/lib/perl
/home/jfriedl/lib/perl/yahoo
/usr/local/lib/perl5/5.7.1/i686-linux
/usr/local/lib/perl5/5.7.1
/usr/local/lib/perl5/site_perl/5.7.1/i686-linux
/usr/local/lib/perl5/site_perl/5.7.1
/usr/local/lib/perl5/site_perl/5.6.1/i686-linux
/usr/local/lib/perl5/site_perl/5.6.1
/usr/local/lib/perl5/site_perl/5.6.0/i686-linux
/usr/local/lib/perl5/site_perl/5.6.0
/usr/local/lib/perl5/site_perl
.
---
Environment for perl v5.7.1:
HOME=/home/jfriedl
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH=/usr/local/pgsql/lib:/home/jfriedl/src/rvplayer5.0
LOGDIR (unset)
PATH=/home/jfriedl/bin:/home/jfriedl/common/bin:.:/usr/local/pgsql/bin:/usr/local/bin:/usr/X11R6/bin:/bin:/usr/bin:/usr/sbin:/sbin:/home/jfriedl/src/rvplayer5.0:/usr/local/prod/bin:/usr/local/java/bin
PERLLIB=/home/jfriedl/lib/perl:/home/jfriedl/lib/perl/yahoo
PERL_BADLANG (unset)
SHELL=/bin/tcsh
-
[ID 20010626.005] regex (?<name>...) capture-to-var paren, new$^N magic variable
by Jeffrey Friedl