Front page | perl.perl5.porters |
Postings from March 2014
[perl #114804] correction to documentation
Thread Next
From:
Karl Williamson via RT
Date:
March 24, 2014 19:00
Subject:
[perl #114804] correction to documentation
Message ID:
rt-4.0.18-29161-1395687612-690.114804-15-0@perl.org
On Sun Jan 20 09:59:02 2013, pure.mathmo@googlemail.com wrote:
> Or you could say this:
>
> Given A and A' which are matches for S{min,max}, we can think of them as
> matches for S{n} and S{n'}, respectively, for some n, n' between min and
> max. We may assume n <= n'. Then simply compare A and A' as matches for the
> regex S{n}(?:S{n'-n})? , or S{n}(?:S{n'-n})?? in the case of lazy match.
>
> One slight drawback of this wording is that it might not be obvious that
> it's even a transitive relation, since the value of n in the regex depends
> on which two matches you're comparing.
>
> On 19 January 2013 13:05, David Knipe <pure.mathmo@googlemail.com> wrote:
>
> > I don't think there is an easy way to do this. It's all very well writing
> > vague things or simple things that work in most cases, such as the
> > description of lazy vs greedy quantifiers in
> > http://perldoc.perl.org/perlre.html#Regular-Expressions . But there
> > should be somewhere where it's done properly. And it seems to me that the
> > #Combining-RE-Pieces section presents itself as the place where it's done
> > properly.
> >
> > It doesn't matter if the bullet points for quantifiers are much longer
> > than the other bullet points.
> >
> > So as far as I can tell we'll have to settle for something like what I
> > originally wrote, unless someone can find a version of yves orton's
> > description that works for lazy matches too.
> >
> > David
> >
> >
> > On 15 January 2013 03:32, James E Keenan via RT <perlbug-followup@perl.org
> > > wrote:
> >
> >> On Tue Sep 18 17:16:41 2012, pure.mathmo@googlemail.com wrote:
> >> > I see two problems with this:
> >> > (1) For the lazy match, it still gives the same result as the
> >> > documentation file suggests.
> >> > (2) It could be taken to mean that the interpreter tries out all
> >> > possible
> >> > combinations, which could drastically slow things down. For instance,
> >> > if S
> >> > is just a single character, it could (in some hypothetical worst-case
> >> > scenario) have to try 2**n combinations just to match S{0,n}. If n=3,
> >> > the
> >> > combinations would be: (S)(S)(S), (S)(S)(), (S)()(S), (S)()(),
> >> > ()(S)(S),
> >> > etc. Presumably, it is in fact intelligent enough not to try (S)(S)(),
> >> > (S)()(S) and ()(S)(S) separately, in which case the number of things
> >> > to try
> >> > is only linear.
> >> >
> >> > On 18 September 2012 13:03, yves orton via RT <perlbug-
> >> > followup@perl.org>wrote:
> >> >
> >> > >
> >> > > Cant we define it in terms of the ? quantifier? Something like this:
> >> > >
> >> > > S{0,1}= S?
> >> > > S{1,2}= SS?
> >> > > S{1,3}= SS?S?
> >> > > S{2,4}= SSS?S?
> >> > >
> >> > > Yves
> >> > >
> >>
> >> David: Can you state how/where the documentation ought to be changed?
> >> (In effect, provide a patch.) That may help move the discussion of this
> >> issue forward.
> >>
> >> Thank you very much.
> >> Jim Keenan
> >>
> >
> >
I investigated this and believe the documentation is already correct, but could stand some additional text to clarify (proposed patch attached). I ran the code that's in the patch with -Dr, and got the following output.
Compiling REx " (abc|a|b) {2} | (abc|a|b) {1} "
Final program:
1: BRANCH (19)
2: CURLYX[0] {2,2} (18)
4: OPEN1 (6)
6: TRIE-EXACT[ab] (15)
<abc>
<a>
<b>
15: CLOSE1 (17)
17: WHILEM (0)
18: NOTHING (37)
19: BRANCH (FAIL)
20: CURLYX[1] {1,1} (36)
22: OPEN2 (24)
24: TRIE-EXACT[ab] (33)
<abc>
<a>
<b>
33: CLOSE2 (35)
35: WHILEM (0)
36: NOTHING (37)
37: END (0)
minlen 1
Enabling $` $& $' support (0x7).
EXECUTING...
Matching REx " (abc|a|b) {2} | (abc|a|b) {1} " against "abc"
0 <> <abc> | 1:BRANCH(19)
0 <> <abc> | 2: CURLYX[0] {2,2}(18)
0 <> <abc> | 17: WHILEM(0)
whilem: matched 0 out of 2..2
0 <> <abc> | 4: OPEN1(6)
0 <> <abc> | 6: TRIE-EXACT[ab](15)
0 <> <abc> | State: 1 Accepted: N Charid: 1 CP: 61 After State: 2
1 <a> <bc> | State: 2 Accepted: Y Charid: 2 CP: 62 After State: 3
2 <ab> <c> | State: 3 Accepted: Y Charid: 3 CP: 63 After State: 4
3 <abc> <> | State: 4 Accepted: Y Charid: 0 CP: 0 After State: 0
got 2 possible matches
TRIE matched word #1, continuing
3 <abc> <> | 15: CLOSE1(17)
3 <abc> <> | 17: WHILEM(0)
whilem: matched 1 out of 2..2
3 <abc> <> | 4: OPEN1(6)
3 <abc> <> | 6: TRIE-EXACT[ab](15)
failed to match trie start class...
failed...
TRIE matched word #2, continuing
only one match left, short-circuiting: #2 <a>
1 <a> <bc> | 15: CLOSE1(17)
1 <a> <bc> | 17: WHILEM(0)
whilem: matched 1 out of 2..2
1 <a> <bc> | 4: OPEN1(6)
1 <a> <bc> | 6: TRIE-EXACT[ab](15)
1 <a> <bc> | State: 1 Accepted: N Charid: 2 CP: 62 After State: 5
2 <ab> <c> | State: 5 Accepted: Y Charid: 0 CP: 0 After State: 0
got 1 possible matches
TRIE matched word #3, continuing
only one match left, short-circuiting: #3 <b>
2 <ab> <c> | 15: CLOSE1(17)
2 <ab> <c> | 17: WHILEM(0)
whilem: matched 2 out of 2..2
2 <ab> <c> | 18: NOTHING(37)
2 <ab> <c> | 37: END(0)
Match successful!
ab
Freeing REx: " (abc|a|b) {2} | (abc|a|b) {1} "
What the documentation claims is that for any S, S{1,2} is the same as S{2} | S{1}
And, contrary to the ticket's premise that is exactly what is happening here, as shown by the execution trace. S{2} is the same as SS. The first S matches 'a', and the second, 'b'. Thus S{2} matches, and there is no need to try the S{1} alternative. So the documentation is correct, but the results may not be intuitive.
--
Karl Williamson
---
via perlbug: queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=114804
Thread Next
-
[perl #114804] correction to documentation
by Karl Williamson via RT