develooper Front page | perl.perl5.porters | Postings from June 2001

[ID 20010626.005] regex (?<name>...) capture-to-var paren, new$^N magic variable

Jeffrey Friedl
June 26, 2001 13:18
[ID 20010626.005] regex (?<name>...) capture-to-var paren, new$^N magic variable
Message ID:

This is a bug report for perl from,
generated with the help of perlbug 1.33 running under perl v5.7.1.

[Please enter your report here]

    1) Gee, it'd be nice to support (?<name>...) "named capture" parens
       within regular expressions.

    2) I created a new magic variable $^N, similar to $+.

    3) If this is of interest, is $^N a good name?

Feeling envy for highly advanced :-) languages like Java and Visual Basic,
whose regex languages allows named captures a'la


(sets $myval to the digits captured), I thought I'd use regex overloading
to convert this syntaxt to

    (\d+)  (?{ $varname = $+ })

(capture to normal parens, then use $+ to access it and assign to the variable)

This works fine except for a few details, such as being able to have
capturing parens nested within, since $+ refers to the *hightest-numbered*
set of parens used so far, not the most-recently *closed* set of parens
used so far.

So, I created a new magic variable, $^N, that pretty much parallels $+
except it does indeed refer to the most-recently *closed* set of parens.

Now, converting to

    (\d+)  (?{ $varname = $^N })

works even with nesting.  This is very nice.

I can send the patches if wanted. But if wanted, what is a good variable name?
I picked $^N simply because I saw it was free.

For those interested, I've appended my package to allow this, and a short
test program.

There are still "issues" with my overloading -- the variables named are not
checked at runtime ('use strict' doesn't save you from referring to a
nonexistant variable), and the variables are not "protected", so even if
during a match a variable is set, it won't be unset if the match later

So, it really would be nice if named captures were officially supported. I
spent some hours digging through regcomp and regexec, and succeeded only in
killing massive amounts of neurons )-:, so I don't think I'll be able to
add it.


Here is a short test script:

    use strict;
    use warnings;
    use Regex::SupportNamedCapture;

    my  $areacode; ## both lexical
    our $exchange; ## and global
    my  $number;   ## variables work fine.

    "My number is 408-555-1212." =~ m{
	(?<number>\d\d\d\d \b)

    print "phone number is: ($areacode) $exchange-$number\n";

    my $fullnumber;

    "My number is 408-555-1212." =~ m{
	   (?<number>\d\d\d\d \b)

    if ($fullnumber ne "$areacode-$exchange-$number")
       print "Bummer, you don't have \$^N support: fullnumber is [$fullnumber]\n";

Here is the package:

package Regex::SupportNamedCapture;

## This package allows regular expressions to have named captures, a'la
##      (?<varname>...)
## sets $varname to the result of what's matched by the /.../
## If your Perl supports the $^N "most-recently-closed-paren text" variable,
## the /.../ part may itself contain capturing parens. Otherwise, it
## shouldn't.
## 6/2001
use strict;
use warnings;
use re 'eval';
use overload;
sub import { overload::constant 'qr' => \&convert }

## Test to see if my proposed $^N is supported.
## Set $GutsResult to $^N if so, set to $+ if not.
my $GutsResult = do {
    no warnings;
    "1" =~ m/(1)/;
    if ($^N) {
    } else {

our $OpenParens; ## needed for matching nested parens

my $NestedParenGuts = qr{

    (?{ local $OpenParens = 0 }) ## counts the number of nested opens waiting to close

        ## stuff not parens, not escaped

        ## escaped stuff
        | (?s: \\. )

        # another opening paren
        | \( (?{ $OpenParens++ })

        # a closing paren, if we're expecting any
        | (?(?{ $OpenParens }) (?{ $OpenParens-- }) \))

## Mimics named capturing parens by converting something like
##        (?<varname>...)
## to
##        (?:  (...)  (?{ $varname = $^N })  )
sub convert
    my $re = shift;  ## regex to mangle

    $re =~ s{
        (?<! \\ )                 # an unescaped...
        \(\?                      #  "(?"
	  <                       #     '<'
  	     (\w+)                #      $1 - an identifier
	  >                       #     '>'
          ($NestedParenGuts)      #     $2 - regex guts 
        \)                        #  ')'
       my $id   =   '$' . $1;
       my $guts = convert($2);
       "(?:($guts)(?{ $id=$GutsResult }))";

    return $re;  ## return mangled regex


[Please do not change anything below this line]
Site configuration information for perl v5.7.1:

Configured by jfriedl at Sun Jun 17 23:29:39 PDT 2001.

Summary of my perl5 (revision 5.0 version 7 subversion 17) configuration:
    osname=linux, osvers=2.4.5, archname=i686-linux
    uname='linux 2.4.5 #3 smp mon jun 4 22:43:14 pdt 2001 i686 unknown '
    config_args='-e -s -O -D optimize=-O2 -g'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    cc='cc', ccflags ='-Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lposix -lcrypt -lutil
    libc=/lib/, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.7.1:

Environment for perl v5.7.1:
    LANG (unset)
    LANGUAGE (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About