develooper Front page | perl.perl5.porters | Postings from March 2014

[perl #114804] correction to documentation

Thread Next
From:
Karl Williamson via RT
Date:
March 24, 2014 19:00
Subject:
[perl #114804] correction to documentation
Message ID:
rt-4.0.18-29161-1395687612-690.114804-15-0@perl.org
On Sun Jan 20 09:59:02 2013, pure.mathmo@googlemail.com wrote:
> Or you could say this:
> 
> Given A and A' which are matches for S{min,max}, we can think of them as
> matches for S{n} and S{n'}, respectively, for some n, n' between min and
> max. We may assume n <= n'. Then simply compare A and A' as matches for the
> regex S{n}(?:S{n'-n})? , or S{n}(?:S{n'-n})?? in the case of lazy match.
> 
> One slight drawback of this wording is that it might not be obvious that
> it's even a transitive relation, since the value of n in the regex depends
> on which two matches you're comparing.
> 
> On 19 January 2013 13:05, David Knipe <pure.mathmo@googlemail.com> wrote:
> 
> > I don't think there is an easy way to do this. It's all very well writing
> > vague things or simple things that work in most cases, such as the
> > description of lazy vs greedy quantifiers in
> > http://perldoc.perl.org/perlre.html#Regular-Expressions . But there
> > should be somewhere where it's done properly. And it seems to me that the
> > #Combining-RE-Pieces section presents itself as the place where it's done
> > properly.
> >
> > It doesn't matter if the bullet points for quantifiers are much longer
> > than the other bullet points.
> >
> > So as far as I can tell we'll have to settle for something like what I
> > originally wrote, unless someone can find a version of yves orton's
> > description that works for lazy matches too.
> >
> > David
> >
> >
> > On 15 January 2013 03:32, James E Keenan via RT <perlbug-followup@perl.org
> > > wrote:
> >
> >> On Tue Sep 18 17:16:41 2012, pure.mathmo@googlemail.com wrote:
> >> > I see two problems with this:
> >> >   (1) For the lazy match, it still gives the same result as the
> >> > documentation file suggests.
> >> >   (2) It could be taken to mean that the interpreter tries out all
> >> > possible
> >> > combinations, which could drastically slow things down. For instance,
> >> > if S
> >> > is just a single character, it could (in some hypothetical worst-case
> >> > scenario) have to try 2**n combinations just to match S{0,n}. If n=3,
> >> > the
> >> > combinations would be: (S)(S)(S), (S)(S)(), (S)()(S), (S)()(),
> >> > ()(S)(S),
> >> > etc. Presumably, it is in fact intelligent enough not to try (S)(S)(),
> >> > (S)()(S) and ()(S)(S) separately, in which case the number of things
> >> > to try
> >> > is only linear.
> >> >
> >> > On 18 September 2012 13:03, yves orton via RT <perlbug-
> >> > followup@perl.org>wrote:
> >> >
> >> > >
> >> > > Cant we define it in terms of the ? quantifier? Something like this:
> >> > >
> >> > > S{0,1}= S?
> >> > > S{1,2}= SS?
> >> > > S{1,3}= SS?S?
> >> > > S{2,4}= SSS?S?
> >> > >
> >> > > Yves
> >> > >
> >>
> >> David:  Can you state how/where the documentation ought to be changed?
> >> (In effect, provide a patch.)  That may help move the discussion of this
> >> issue forward.
> >>
> >> Thank you very much.
> >> Jim Keenan
> >>
> >
> >

I investigated this and believe the documentation is already correct, but could stand some additional text to clarify (proposed patch attached).  I ran the code that's in the patch with -Dr, and got the following output.
Compiling REx " (abc|a|b) {2} | (abc|a|b) {1} "
Final program:
   1: BRANCH (19)
   2:   CURLYX[0] {2,2} (18)
   4:     OPEN1 (6)
   6:       TRIE-EXACT[ab] (15)
            <abc> 
            <a> 
            <b> 
  15:     CLOSE1 (17)
  17:   WHILEM (0)
  18:   NOTHING (37)
  19: BRANCH (FAIL)
  20:   CURLYX[1] {1,1} (36)
  22:     OPEN2 (24)
  24:       TRIE-EXACT[ab] (33)
            <abc> 
            <a> 
            <b> 
  33:     CLOSE2 (35)
  35:   WHILEM (0)
  36:   NOTHING (37)
  37: END (0)
minlen 1 
Enabling $` $& $' support (0x7).

EXECUTING...

Matching REx " (abc|a|b) {2} | (abc|a|b) {1} " against "abc"
   0 <> <abc>                |  1:BRANCH(19)
   0 <> <abc>                |  2:  CURLYX[0] {2,2}(18)
   0 <> <abc>                | 17:    WHILEM(0)
                                      whilem: matched 0 out of 2..2
   0 <> <abc>                |  4:      OPEN1(6)
   0 <> <abc>                |  6:      TRIE-EXACT[ab](15)
   0 <> <abc>                |          State:    1 Accepted: N Charid:  1 CP:  61 After State:    2
   1 <a> <bc>                |          State:    2 Accepted: Y Charid:  2 CP:  62 After State:    3
   2 <ab> <c>                |          State:    3 Accepted: Y Charid:  3 CP:  63 After State:    4
   3 <abc> <>                |          State:    4 Accepted: Y Charid:  0 CP:   0 After State:    0
                                        got 2 possible matches
                                        TRIE matched word #1, continuing
   3 <abc> <>                | 15:        CLOSE1(17)
   3 <abc> <>                | 17:        WHILEM(0)
                                          whilem: matched 1 out of 2..2
   3 <abc> <>                |  4:          OPEN1(6)
   3 <abc> <>                |  6:          TRIE-EXACT[ab](15)
                                            failed to match trie start class...
                                          failed...
                                        TRIE matched word #2, continuing
                                        only one match left, short-circuiting: #2 <a>
   1 <a> <bc>                | 15:      CLOSE1(17)
   1 <a> <bc>                | 17:      WHILEM(0)
                                        whilem: matched 1 out of 2..2
   1 <a> <bc>                |  4:        OPEN1(6)
   1 <a> <bc>                |  6:        TRIE-EXACT[ab](15)
   1 <a> <bc>                |            State:    1 Accepted: N Charid:  2 CP:  62 After State:    5
   2 <ab> <c>                |            State:    5 Accepted: Y Charid:  0 CP:   0 After State:    0
                                          got 1 possible matches
                                          TRIE matched word #3, continuing
                                          only one match left, short-circuiting: #3 <b>
   2 <ab> <c>                | 15:        CLOSE1(17)
   2 <ab> <c>                | 17:        WHILEM(0)
                                          whilem: matched 2 out of 2..2
   2 <ab> <c>                | 18:          NOTHING(37)
   2 <ab> <c>                | 37:          END(0)
Match successful!
ab

Freeing REx: " (abc|a|b) {2} | (abc|a|b) {1} "

What the documentation claims is that for any S, S{1,2} is the same as S{2} | S{1}
And, contrary to the ticket's premise that is exactly what is happening here, as shown by the execution trace.  S{2} is the same as SS.  The first S matches 'a', and the second, 'b'.  Thus S{2} matches, and there is no need to try the S{1} alternative.  So the documentation is correct, but the results may not be intuitive.

-- 
Karl Williamson

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=114804

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About