[perl #127617] /n regexp modifier and backreferences to previous groups

Ed Avis
February 26, 2016 12:14
# New Ticket Created by  "Ed Avis" 
# Please include the string:  [perl #127617]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.40 running under perl 5.22.1.

[Please describe your issue here]

The /n regexp modifier, according to perlre, 'will stop $1, $2,
etc... from being filled in'.  However, it has another behaviour which
is not documented and, in my opinion, is not helpful.  It also stops
the group from being referenced by (?-1) and similar within the same
regexp.

So for example, with the current behaviour:

% perl -E '$_ = "aa"; /(a)(?-1)/ or die; say $1 // "undef"'
a
% perl -E '$_ = "aa"; /(a)(?-1)/n or die; say $1 // "undef"'
Reference to nonexistent group in regex...

This applies too if the modifier is set within a part of the regexp:

% perl -E '$_ = "aa"; /(?n:(a)(?-1))/ or die; say $1 // "undef"'
Reference to nonexistent group in regex...
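
The cases above can also be checked from a single script.  The
following is a minimal sketch, assuming a perl new enough to know /n
(5.22 or later); the helper name compile_error is made up for
illustration.  The patterns are compiled from strings at run time so
that a compile error can be trapped with eval instead of aborting the
whole script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the empty string if the pattern source
# compiles, otherwise the error message from the regexp compiler.
sub compile_error {
    my ($src) = @_;
    my $re = eval { qr/$src/ };    # interpolation forces a run-time compile
    return defined $re ? '' : $@;
}

my @sources = (
    '(a)(?-1)',         # plain capturing group
    '(?n:(a)(?-1))',    # /n switched on for part of the pattern
);

for my $src (@sources) {
    my $err = compile_error($src);
    $err =~ s/\s+\z//;             # drop the trailing newline of the message
    printf "%-16s %s\n", $src, $err eq '' ? 'compiles' : "fails: $err";
}
```

The script only reports what the running perl does; it makes no
assumption about which of the two outcomes a given version produces
for the (?n: ... ) case.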

I would prefer it to still allow referring to the group within the
regexp itself, even if the external effect of setting $1, etc does not
happen.  So my preferred behaviour would be

% perl -E '$_ = "aa"; /(?n:(a)(?-1))/ or die; say $1 // "undef"'
undef

Although this would be a change to the current semantics, it is more
closely in line with what perlre currently documents, so might be
considered more of a bug fix than an incompatible change.

Now I will give a bit of background about why I think this would be useful.
Suppose I have a regular expression matching a simple regular
language.  Strings in the language are sequences of one or more 'a'.

    $lang_re = qr/a+/;

I may define this regexp in a library and then use it in client code
which matches a string in the language followed by a digit:

    /\A ($lang_re) ([0-9]) \z/x or die;
    my ($lang_str, $digit) = ($1, $2);
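
As a self-contained illustration of the client code above, here is a
small sketch.  The sub name split_input and the sample string are
assumptions added for the example; the two patterns are the ones
given in the text.

```perl
use strict;
use warnings;

my $lang_re = qr/a+/;          # library side: one or more 'a'

# Client side (the sub name is illustrative): a string in the language
# followed by one digit.  Returns the two pieces, or nothing on failure.
sub split_input {
    my ($input) = @_;
    $input =~ /\A ($lang_re) ([0-9]) \z/x or return;
    return ($1, $2);
}

my ($lang_str, $digit) = split_input('aaa7');
print "$lang_str $digit\n";    # prints "aaa 7"
```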

Now suppose I change the definition of the language so that valid
strings are now either a sequence of 'a' as before, or <X> where
X is a valid string.

    $lang_re = qr/ ( a+ | < (?-1) > ) /x;

(For this trivially simple language there may be other ways to do it
but in general a recursively defined language requires recursive
subpatterns in the regexp.)

The modified $lang_re works but now it has a side effect of setting a
capturing group.  The existing client code that expected to include
$lang_re in a larger regexp and then get ($1, $2) will be broken by
this change.
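
The breakage can be seen in a short sketch.  The sample input is an
assumption added for the example; the patterns are the ones from the
text.  Because $lang_re now contains a capturing group of its own,
the client's ([0-9]) is no longer the second group.

```perl
use strict;
use warnings;

my $lang_re = qr/ ( a+ | < (?-1) > ) /x;    # new, recursive definition

my $input = '<<aa>>7';
$input =~ /\A ($lang_re) ([0-9]) \z/x or die "no match";

# Group 1 is the client's own group, group 2 now belongs to $lang_re,
# and the digit has moved to group 3.
my ($lang_str, $second, $third) = ($1, $2, $3);
print "1=$lang_str 2=$second 3=$third\n";
```

Client code that still reads ($1, $2) gets the language string twice
over rather than the string and the digit.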

To avoid adding a new externally visible capturing group I would like
to use the /n modifier:

    $lang_re = qr/ (?n: ( a+ | < (?-1) > ) ) /x;

The intention is that while $lang_re may use a recursive subpattern
internally, it does not expose a new capturing group to the outside
world.  So it can be used as a building block in a larger pattern
without bumping around the $1,$2,$3 results whenever the
implementation of $lang_re changes.

Although using named captures everywhere mitigates the problem it does
not solve it, since of course there is no guarantee that the names of
capturing groups will be globally unique.  And of course if $lang_re
is provided by a regexp library, the library author cannot know that
all client code is always using named captures rather than $1,$2,$3.
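
A name clash of this kind can be sketched as follows.  The group name
item, the ':' separator and the sample strings are assumptions chosen
for the example.  perlre states that (?&NAME) recurses to the leftmost
group with that name, so once the client pattern reuses the name, the
recursion inside $lang_re should end up in the client's group rather
than the library's.

```perl
use strict;
use warnings;

# Library pattern written with a named group and named recursion.
my $lang_re = qr/ (?<item> a+ | < (?&item) > ) /x;

# On its own the pattern accepts nested strings.
my $alone = '<aa>' =~ /\A $lang_re \z/x ? 1 : 0;

# Client code that happens to pick the same group name.
my $client_re = qr/\A (?<item> [0-9] ) : $lang_re \z/x;

# With the clash, the <...> branch recurses into the client's [0-9].
my $nested = '7:<aa>' =~ $client_re ? 1 : 0;
my $odd    = '7:<5>'  =~ $client_re ? 1 : 0;

print "alone=$alone nested=$nested odd=$odd\n";
```

Under that rule the nested string stops matching once it is embedded
in the client pattern, while a string that is not in the language at
all is accepted.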

I think that changing the semantics of /n, so that it stops
*capturing*, but still allows the group to be referenced with
recursive subpatterns, would make it much more useful and would more
closely match the documentation.

(There may also be room for a regexp modifier X which hides groups
from recursive subpattern matches *outside* the (?X: ... ) but allows
them to be visible *inside*.  This would be a further improvement to
building reusable, composable regexps.  The letter X is just an
example of course.  Possibly this could even be the behaviour
of (?n: ... ).  But I would not want this to distract from the more
important issue of making /n's current behaviour match the docs.)

(FWIW, the real code which prompted this is a regexp library to match a
'simple arithmetic expression', being numbers with operators like +
and - and parentheses.  Such a 'simple expression' is in some sense
safe to evaluate using eval(STRING) to get a number.)
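
To give an idea of what such a library pattern might look like, here
is a rough sketch.  It is not the real code mentioned above: the
grammar (unsigned integers, binary + and -, parentheses, optional
spaces) and the names $expr_re and safe_eval are assumptions made for
the example.

```perl
use strict;
use warnings;

# Hypothetical grammar: integers, binary + and -, parentheses, spaces.
my $expr_re = qr{
    (                                            # one expression
        \s* (?: [0-9]+ | \( (?-1) \) ) \s*       # first term
        (?: [-+] \s* (?: [0-9]+ | \( (?-1) \) ) \s* )*   # operator, term
    )
}x;

# Illustrative wrapper: evaluate the text only if it matches the grammar.
sub safe_eval {
    my ($text) = @_;
    return unless $text =~ /\A $expr_re \z/x;
    return eval $text;
}

print safe_eval('(10-4)+2'), "\n";    # prints 8
```

Like the two-alternative $lang_re, this pattern needs a capturing
group only so that (?-1) has something to recurse into, which is
exactly the group a caller would prefer not to see.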

[Please do not change anything below this line]
Site configuration information for perl 5.22.1:

Configured by Red Hat, Inc. at Mon Dec 14 11:14:02 UTC 2015.

Summary of my perl5 (revision 5 version 22 subversion 1) configuration:
    osname=linux, osvers=4.3.0-1.fc24.x86_64, archname=x86_64-linux-thread-multi
    uname='linux 4.3.0-1.fc24.x86_64 #1 smp mon nov 2 16:27:20 utc 2015 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=none -Dccflags=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Dldflags=-Wl,-z,relro  -Dccdlflags=-Wl,--enable-new-dtags -Wl,-z,relro  -Dlddlflags=-shared -Wl,-z,relro  -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.22.1 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='  -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='5.3.1 20151207 (Red Hat 5.3.1-2)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags ='-Wl,-z,relro  -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib64 /lib64 /usr/lib64 /usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib
    libs=-lpthread -lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lresolv -lnsl -ldl -lm -lcrypt -lutil -lc, so=so, useshrplib=true,
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-z,relro '
    cccdlflags='-fPIC', lddlflags='-shared -Wl,-z,relro  -L/usr/local/lib -fstack-protector-strong'

Locally applied patches:
    Fedora Patch1: Removes date check, Fedora/RHEL specific
    Fedora Patch3: support for libdir64
    Fedora Patch4: use libresolv instead of libbind
    Fedora Patch5: USE_MM_LD_RUN_PATH
    Fedora Patch6: Skip hostname tests, due to builders not being network capable
    Fedora Patch7: Dont run one io test due to random builder failures
    Fedora Patch15: Define SONAME for
    Fedora Patch16: Install to -Dshrpdir value
    Fedora Patch22: Document Math::BigInt::CalcEmu requires Math::BigInt (CPAN RT#85015)
    Fedora Patch26: Make *DBM_File desctructors thread-safe (RT#61912)
    Fedora Patch27: Make PadlistNAMES() lvalue again (CPAN RT#101063)
    Fedora Patch28: Make magic vtable writable as a work-around for Coro (CPAN RT#101063)
    Fedora Patch200: Link XS modules to with EU::CBuilder on Linux
    Fedora Patch201: Link XS modules to with EU::MM on Linux

@INC for perl 5.22.1:

Environment for perl 5.22.1:
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
