perl.perl5.porters | Postings from February 2015

[perl #123820] documentation error in perlrecharclass

From: James E Keenan via RT
Date: February 14, 2015 14:29
Subject: [perl #123820] documentation error in perlrecharclass
Message ID: rt-4.0.18-6849-1423924142-1542.123820-15-0@perl.org
On Fri Feb 13 21:13:32 2015, demerphq wrote:
> On 14 February 2015 at 07:58, James E Keenan via RT
> <perlbug-followup@perl.org> wrote:
> > On Fri Feb 13 11:28:10 2015, saint.snit@gmail.com wrote:
> >>
> >> This is a bug report for perl from saint.snit@gmail.com,
> >> generated with the help of perlbug 1.39 running under perl 5.18.2.
> >>
> >>
> >> -----------------------------------------------------------------
> >> The "perlrecharclass" documentation -- both that shipped with perl
> >> 5.18.2 and that appearing at http://perldoc.perl.org/perlrecharclass.html
> >> -- contains an error.
> >>
> >> It claims that the regular expression /[[]]/ "contains a character
> >> class containing just ], and the character class is followed by a ]".
> >> This does not appear to be an accurate description of this regular
> >> expression: the leading character class appears to contain just [.
> >>
> >
> > I believe the analysis is correct.
> >
> > Here is the way the documentation appears in perl-5.10.1 (some
> > whitespace trimmed):
> >
> > #####
> > "[]"  =~ /[[]]/      #  Match, the pattern contains a character class
> >                      #  containing just ], and the character class is
> >                      #  followed by a ].
> > #####
> >
> 
> It looks like this is a typo. It should say "containing just [".
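A minimal Perl sketch of the corrected example: /[[]]/ should match "[]" but not "]]". It also assumes, per perlrecharclass's rule that a ] placed first in a bracketed class is literal, that the pattern the old wording actually described would be /[]]]/. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Corrected reading: a class holding just '[' followed by a literal ']',
# so the pattern matches the two-character string "[]".
my $doc_pattern = qr/[[]]/;

# What the old wording described: a ']' that comes first in a bracketed
# class is literal, so this is a class holding just ']' followed by ']'.
my $old_wording = qr/[]]]/;

for my $s ('[]', ']]') {
    printf "%-4s /[[]]/: %-8s /[]]]/: %s\n", $s,
        ($s =~ $doc_pattern ? 'match' : 'no match'),
        ($s =~ $old_wording ? 'match' : 'no match');
}
```

Each pattern matches exactly one of the two probe strings, which is the whole difference between the old and corrected wording.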
> 
> 
> > Let's stipulate that the final ']' is outside the character class.
> > Then I ought to be able to rewrite the pattern to capture the
> > contents of the character class, like so:
> >
> [snip]
> > This suggests that the character class holds a single open-bracket
> > '[' -- not a single close-bracket ']'.  This in turn suggests that
> > the documentation is indeed wrong.
> >
> 
> Interesting approach. For future reference the way I would analyse it
> is as follows:
> 
> $ perl -Mre=debug -e'/[[]]/'
> Compiling REx "[[]]"
> Final program:
>    1: EXACT <[]> (5)
>    5: END (0)
> anchored "[]" at 0 (checking anchored isall) minlen 2
> Freeing REx: "[[]]"
> 
> This shows that the original pattern is exactly equivalent to
> m/ \[ \] /x (using /x mode for legibility), which means it can't be
> what the documentation says.
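The equivalence can also be spot-checked from Perl itself. This is a small sketch that assumes an arbitrary set of probe strings chosen for illustration; agreement on these probes is supporting evidence, while the re=debug output remains the direct view of the compiled program.

```perl
use strict;
use warnings;

# The bracketed form from the docs and the escaped literal that the
# re=debug output says it compiles to.
my $bracketed = qr/[[]]/;
my $escaped   = qr/ \[ \] /x;

# Arbitrary probe strings, chosen for illustration only.
my @probes = ('', '[', ']', '[]', ']]', '[[', '][', 'a[]b', '[[]]');

# Keep any probe on which the two patterns give different answers.
my @disagreements =
    grep { ($_ =~ $bracketed ? 1 : 0) != ($_ =~ $escaped ? 1 : 0) } @probes;

if (@disagreements) {
    print "patterns disagree on: @disagreements\n";
} else {
    print "both patterns agree on all ", scalar(@probes), " probes\n";
}
```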
> 
> And you can drill deeper and see exactly what happens, like this
> (comments starting with # are mine):
> 
> $ perl -Mre=Debug,COMPILE -e'/[[]]/'
> Assembling pattern from 1 elements
> Compiling REx "[[]]"
> Starting first pass (sizing)
> > [[]]<         |   1|  reg
> |    |    brnc
> |    |      piec
> |    |        atom
> > []]<          |    |          clas
> 
> #At this point we have consumed the first open square bracket as the
> beginning of a char class.
> 
> > ]<            |   3|      piec
> 
> #At this point we have consumed the second open square bracket as an
> element of the char-class, and also the first close square bracket, as
> the close of the char-class definition, and we have one more close
> square bracket left to parse,
> 
> |    |        atom
> 
> #Which we parse as an "atom", in this case a literal.
> 
> Required size 5 nodes
> Starting second pass (creation)
> > [[]]<         |   1|  reg
> |    |    brnc
> |    |      piec
> |    |        atom
> > []]<          |    |          clas
> > ]<            |   3|      piec
> |    |        atom
> > <             |   5|      tail~ EXACT <[> (1) -> EXACT
> 
> #Here we can see that the charclass containing a single item has been
> converted into the literal item (EXACT)
> 
> |   6|  lsbr~ tying lastbr EXACT <[> (1) to ender END (5) offset 4
> |    |    tail~ EXACT <[> (1)
> |    |        ~ EXACT <]> (3) -> END
> first:>  1: EXACT <[> (3)
> first at 1
> Peep> 1: EXACT <[> (3)
> join> 1: EXACT <[> (3)
> merg> 3: EXACT <]> (5)
> finl> 1: EXACT <[]> (5)
> 
> #And here we can see that the two EXACT nodes, one containing '[' and
> the other containing ']' are joined together into a single EXACT node
> which contains '[]'
> 
> minlen: 2 r->minlen:0
> Final program:
>    1: EXACT <[]> (5)
>    3: OPTIMIZED (2 nodes)
> 
> #This "OPTIMIZED" node is the remainder of the second EXACT that was
> left over after merging.
> 
>    5: END (0)
> anchored "[]" at 0 (checking anchored isall) minlen 2
> r->extflags: CHECK_ALL USE_INTUIT_NOML USE_INTUIT_ML
> Freeing REx: "[[]]"
> 
> #And this says that to match, the string must be at least 2 chars long
> and must contain the string '[]', and that the internals need not run
> the regex engine at all, but will simply use FBM (Boyer-Moore)
> substring matching instead.
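A rough way to observe the "minlen 2" and "isall" claims from the outside is to compare the regex result with a plain substring search via Perl's built-in index(). index() is only a stand-in here: this sketch does not call perl's internal FBM routines, and the helper names and probe strings are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# The compiled program reduces to the literal "[]" (minlen 2, "isall"),
# so a plain substring search should give the same answer as the regex.
sub regex_says  { my ($s) = @_; return $s =~ /[[]]/ ? 1 : 0 }
sub substr_says { my ($s) = @_; return index($s, '[]') >= 0 ? 1 : 0 }

# Arbitrary probe strings, chosen for illustration only.
my @probes = ('', '[', ']', '[]', 'x[]', '[x]', '][][', '[[]]');

for my $s (@probes) {
    my ($r, $i) = (regex_says($s), substr_says($s));
    printf "%-8s regex=%d index=%d%s\n", "'$s'", $r, $i,
        ($r == $i ? '' : '  <-- mismatch');
}

# minlen 2: no string shorter than two characters should ever match.
my @short_matches = grep { length($_) < 2 && regex_says($_) } @probes;
print "matches shorter than 2 chars: ", scalar(@short_matches), "\n";
```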
> 
> 
> > If others agree, I will patch pod/perlrecharclass.pod.
> 
> I agree the analysis is sound, and I /think/ the original
> documentation was just a typo, but that does not rule out a subtle
> regression, i.e. that older perls really did parse the pattern as
> documented. So to be sure, it would be good to test this on 5.8.x: if
> it also reduces to EXACT <[]>, then we are good to go; if not, then
> this is a regression. My money is on it NOT being a regression (if I
> were a betting man, anyway).
> 
> Yves

I have patched pod/perlrecharclass.pod in commit 52f4d632547391b1db71d16e631dd023dcd6a9b0.

Yves, I was able to use your two diagnostics as far back as perl-5.10.1 and get similar results.  The first diagnostic worked on perl-5.8.9; the second did not.

#####
$ perl -v | head -2 | tail -1
This is perl, v5.8.9 built for x86_64-linux

$ perl -Mre=Debug,COMPILE -e'/[[]]/'
Unknown "re" subpragma 'Debug' (known ones are: 'debug', 'debugcolor', 'eval', 'taint') at -e line 0
Unknown "re" subpragma 'COMPILE' (known ones are: 'debug', 'debugcolor', 'eval', 'taint') at -e line 0
#####


-- 
James E Keenan (jkeenan@cpan.org)

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=123820


