[perl #124256] Regex loop for \K in lookbehind
From: Karl Williamson via RT
Date: August 6, 2016 23:44
Subject: [perl #124256] Regex loop for \K in lookbehind
Message ID: rt-4.0.24-14742-1470527076-769.124256-15-0@perl.org
On Thu Dec 10 03:37:21 2015, demerphq wrote:
> While I have not tested it the patch looks fine to me. I have to admit
> to feeling stupid reading it actually. It didn't occur to me to simply
> forbid \K inside of lookaround, and it is a simple solution which
> leaves the option of allowing it in the future if we can figure out
> what it should mean and how it should work. So ++ to you Tony.
>
> Yves
>
> On 28 May 2015 at 08:05, Tony Cook via RT <perlbug-followup@perl.org> wrote:
> > On Tue May 26 22:03:58 2015, tonyc wrote:
> >> On Tue Apr 28 14:27:34 2015, demerphq wrote:
> >> > On 28 April 2015 at 22:28, Karl Williamson <public@khwilliamson.com> wrote:
> >> > > I haven't ever looked at the \K code; I was hoping someone else would
> >> > > look at this.
> >> >
> >> > Ok, then I will try to find time to address this.
> >>
> >> What's the intended behaviour of \K in a look-(ahead|behind)?
> >>
> >> My first thought was that it should be ignored, something like:
> >
> > Which was broken in at least one way.
> >
> > Attached is a "better" patch, depending on how much I've managed
> > to mess up the regexp engine ;)
> >
> > Tony
> >
> > ---
> > via perlbug: queue: perl5 status: open
> > https://rt.perl.org/Ticket/Display.html?id=124256
> >
> > From 40f4dee6393db72c33d89117b6e092e4e349366c Mon Sep 17 00:00:00 2001
> > From: Tony Cook <tony@develop-help.com>
> > Date: Thu, 28 May 2015 16:03:50 +1000
> > Subject: [PATCH] prevent \K working in lookahead/behind assertions (and warn)
> >
> > this is probably incorrect, since I'm clueless about the regexp engine
> >
> > It may be that in_lookbehind and in_lookahead can be combined, since
> > in_lookbehind appears to be only used to maintain its own value.
> > ---
> >  pod/perldiag.pod       |  5 +++++
> >  regcomp.c              | 51 +++++++++++++++++++++++++++++++++++++-----------
> >  t/lib/warnings/regcomp | 12 ++++++++++++
> >  t/re/pat_advanced.t    | 14 +++++++++++++
> >  4 files changed, 71 insertions(+), 11 deletions(-)
> >
> > diff --git a/pod/perldiag.pod b/pod/perldiag.pod
> > index 93ae13b..fb00745 100644
> > --- a/pod/perldiag.pod
> > +++ b/pod/perldiag.pod
> > @@ -6556,6 +6556,11 @@ about the /d modifier.
> > (W misc) You have a \E in a double-quotish string without a C<\U>,
> > C<\L> or C<\Q> preceding it.
> >
> > +=item Useless use of \K in lookbehind/lookahead in regex; marked by S<<-- HERE> in m/%s/
> > +
> > +(W regexp) Your regular expression used C<\K> in a lookahead or
> > +lookbehind assertion, where it has no effect.
> > +
> > =item Useless use of greediness modifier '%c' in regex; marked by S<<-- HERE> in m/%s/
> >
> > (W regexp) You specified something like these:
> > diff --git a/regcomp.c b/regcomp.c
> > index 712c8ed7..945778d 100644
> > --- a/regcomp.c
> > +++ b/regcomp.c
> > @@ -177,6 +177,7 @@ struct RExC_state_t {
> > through */
> > U32 study_chunk_recursed_bytes; /* bytes in bitmap */
> > I32 in_lookbehind;
> > + I32 in_lookahead;
> > I32 contains_locale;
> > I32 contains_i;
> > I32 override_recoding;
> > @@ -255,6 +256,7 @@ struct RExC_state_t {
> > #define RExC_study_chunk_recursed_bytes \
> > (pRExC_state->study_chunk_recursed_bytes)
> > #define RExC_in_lookbehind (pRExC_state->in_lookbehind)
> > +#define RExC_in_lookahead (pRExC_state->in_lookahead)
> > #define RExC_contains_locale (pRExC_state->contains_locale)
> > #define RExC_contains_i (pRExC_state->contains_i)
> > #define RExC_override_recoding (pRExC_state->override_recoding)
> > @@ -6633,6 +6635,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
> > RExC_seen = 0;
> > RExC_maxlen = 0;
> > RExC_in_lookbehind = 0;
> > + RExC_in_lookahead = 0;
> > RExC_seen_zerolen = *exp == '^' ? -1 : 0;
> > RExC_extralen = 0;
> > RExC_override_recoding = 0;
> > @@ -9782,6 +9785,12 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
> >
> > *flagp = 0; /* Tentatively. */
> >
> > + if (RExC_in_lookbehind) {
> > + RExC_in_lookbehind++;
> > + }
> > + if (RExC_in_lookahead) {
> > + RExC_in_lookahead++;
> > + }
> >
> > /* Make an OPEN node, if parenthesized. */
> > if (paren) {
> > @@ -10055,9 +10064,11 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
> > RExC_seen |= REG_LOOKBEHIND_SEEN;
> > RExC_in_lookbehind++;
> > RExC_parse++;
> > - /* FALLTHROUGH */
> > + RExC_seen_zerolen++;
> > + break;
> > case '=': /* (?=...) */
> > - RExC_seen_zerolen++;
> > + RExC_in_lookahead++;
> > + RExC_seen_zerolen++;
> > break;
> > case '!': /* (?!...) */
> > RExC_seen_zerolen++;
> > @@ -10685,6 +10696,9 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
> > if (RExC_in_lookbehind) {
> > RExC_in_lookbehind--;
> > }
> > + if (RExC_in_lookahead) {
> > + RExC_in_lookahead--;
> > + }
> > if (after_freeze > RExC_npar)
> > RExC_npar = after_freeze;
> > return(ret);
> > @@ -11787,15 +11801,30 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
> > *flagp |= SIMPLE;
> > goto finish_meta_pat;
> > case 'K':
> > - RExC_seen_zerolen++;
> > - ret = reg_node(pRExC_state, KEEPS);
> > - *flagp |= SIMPLE;
> > - /* XXX:dmq : disabling in-place substitution seems to
> > - * be necessary here to avoid cases of memory corruption, as
> > - * with: C<$_="x" x 80; s/x\K/y/> -- rgs
> > - */
> > - RExC_seen |= REG_LOOKBEHIND_SEEN;
> > - goto finish_meta_pat;
> > + if (!RExC_in_lookbehind && !RExC_in_lookahead) {
> > + RExC_seen_zerolen++;
> > + ret = reg_node(pRExC_state, KEEPS);
> > + *flagp |= SIMPLE;
> > + /* XXX:dmq : disabling in-place substitution seems to
> > + * be necessary here to avoid cases of memory corruption, as
> > + * with: C<$_="x" x 80; s/x\K/y/> -- rgs
> > + */
> > + RExC_seen |= REG_LOOKBEHIND_SEEN;
> > + }
> > + else {
> > + if (PASS2) {
> > + /* adjust offset so <-- points at the K */
> > + ++RExC_parse;
> > + ckWARNreg(RExC_parse, "Useless use of \\K in lookbehind/lookahead");
> > + --RExC_parse;
> > + }
> > + /* originally I did goto tryagain here, but that failed
> > + * with an Internal urp when a ) immediately followed the \K.
> > + * So return something, even if it's NOTHING.
> > + */
> > + ret = reg_node(pRExC_state, NOTHING);
> > + }
> > + goto finish_meta_pat;
> > case 'Z':
> > ret = reg_node(pRExC_state, SEOL);
> > *flagp |= SIMPLE;
> > diff --git a/t/lib/warnings/regcomp b/t/lib/warnings/regcomp
> > index b9943a0..9d569cd 100644
> > --- a/t/lib/warnings/regcomp
> > +++ b/t/lib/warnings/regcomp
> > @@ -36,3 +36,15 @@ $a = qr/[\c,]/;
> > EXPECT
> > "\c," is more clearly written simply as "l" at - line 9.
> > "\c," is more clearly written simply as "l" at - line 10.
> > +########
> > +# regcomp.c - \K in assertion
> > +use warnings;
> > +$x = "aaaa";
> > +$x =~ /(?<=\Ka)/;
> > +$x =~ /(?=a\Ka)aa/;
> > +no warnings 'regexp';
> > +$x =~ /(?<=\Ka)/;
> > +$x =~ /(?=a\Ka)aa/;
> > +EXPECT
> > +Useless use of \K in lookbehind/lookahead in regex; marked by <-- HERE in m/(?<=\K <-- HERE a)/ at - line 4.
> > +Useless use of \K in lookbehind/lookahead in regex; marked by <-- HERE in m/(?=a\K <-- HERE a)aa/ at - line 5.
> > diff --git a/t/re/pat_advanced.t b/t/re/pat_advanced.t
> > index 891bb66..d8d5823 100644
> > --- a/t/re/pat_advanced.t
> > +++ b/t/re/pat_advanced.t
> > @@ -1498,6 +1498,20 @@ sub run_tests {
> > $x = "abcde";
> > $x =~ s/(.)\K/$1/g;
> > is($x, "aabbccddee", $message);
> > +
> > + no warnings 'regexp';
> > + $x = "aaaa";
> > + $x =~ /(?<=\Ka)/;
> > + is($&, "", "\\K in lookbehind meaningless");
> > +
> > + $x =~ /(?<=(a)\K)/;
> > + is($&, "", "\\K in lookbehind meaningless (with nesting)");
> > +
> > + $x =~ /(?=a\Ka)aa/;
> > + is($&, "aa", "\\K in lookahead meaningless");
> > +
> > + $x =~ /(?=(a)\Ka)aa/;
> > + is($&, "aa", "\\K in lookahead meaningless (nesting)");
> > }
> >
> > {
> > --
> > 1.7.10.4
> >
> >
Tony, it looks to me like you can apply this patch, possibly adding a watchdog-timer test as well.
--
Karl Williamson
---
via perlbug: queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=124256
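
The watchdog-timer test suggested in the reply might be sketched roughly as follows. This is only an illustration, assuming a Unix-like system with fork and alarm. run_with_watchdog is a hypothetical helper, not part of the patch, and the patterns are simply the ones used in the patch's own tests (whether they reproduce the loop from the original ticket is not confirmed here). The match runs in a child process because Perl defers signal handlers until the current opcode finishes, so a $SIG{ALRM} handler in the same process may never run while the regex engine itself is looping.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code in a child process and kill the child if it
# has not finished within $timeout seconds.
# Returns "finished" or "timed out".
sub run_with_watchdog {
    my ($code, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: the eval means a fatal regex compile error (as raised by
        # perls that reject \K in lookaround) still counts as finishing.
        eval { $code->(); 1 };
        POSIX::_exit(0);
    }

    # Parent: the alarm interrupts waitpid, the handler kills the child,
    # and waitpid then reaps it.
    local $SIG{ALRM} = sub { kill 'KILL', $pid };
    alarm $timeout;
    waitpid($pid, 0);
    alarm 0;

    # The low 7 bits of $? hold the signal that terminated the child, if any.
    return ($? & 127) ? "timed out" : "finished";
}

# Patterns taken from the tests in the patch; they are compiled at run time
# so that any compile error is raised inside the child's eval.
for my $pattern ('(?<=\Ka)', '(?=a\Ka)aa') {
    my $status = run_with_watchdog(sub {
        no warnings 'regexp';
        my $x = "aaaa";
        my @matches = $x =~ /$pattern/g;
    }, 10);
    print "$pattern: $status\n";
}
```

Within the core test suite itself, the existing watchdog() helper in t/test.pl would probably be the more natural choice than a hand-rolled fork.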