
Re: [perl #77414] bug report

From: demerphq
Date: August 25, 2010 04:18
Subject: Re: [perl #77414] bug report
Message ID: AANLkTi=P6iMHA-N4nipgsyW2eO04k7mDMy-mG=5QHPOB@mail.gmail.com

On 24 August 2010 18:42, Dave U . Random <perlbug-followup@perl.org> wrote:
> # New Ticket Created by  Dave U . Random
> # Please include the string:  [perl #77414]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=77414 >
>
>
> Subject: Inconsistent re backtracking behaviour for regular expressions like /\s*\p{Dash}/, /\s*\p{Dash}{1}/, and /\s*-/ matching input like '- ' and missing match for the first variant.
> Message-Id: <5.12.1_3760_1282664416@my-PC>
> Reply-To: loomisk@trash-mail.com
> To: perlbug@perl.org
>
>
> This is a bug report for perl from loomisk@trash-mail.com,
> generated with the help of perlbug 1.39 running under perl 5.12.1.
>
> Hello everyone!
>
> Using the following code only lines 2 and 3 match:
>
> print '- ' =~ /\s*\p{Dash}/;    # Version 1, no match
> print '- ' =~ /\s*\p{Dash}{1}/; # Version 2, match
> print '- ' =~ /\s*-/;           # Version 3, match
>
> Debugging the regex makes clear why:
>
> Compiling REx "\s*\p{Dash}"
> synthetic stclass "ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash]".
> Final program:
>   1: STAR (3)
>   2:   SPACE (0)
>   3: ANYOF[{unicode}+utf8::Dash] (15)
>  15: END (0)
> stclass ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash] minlen 1

The synthetic start class is wrong: its ASCII part contains only \s
and its Unicode part contains only \p{Dash}.

I'm pretty sure that is wrong. The output should at least look like:

ANYOF[\11\12\14\15 ][{unicode}+utf8::IsSpacePerl +utf8::Dash]

And I'm actually confused about why it does not look like:

ANYOF[\11\12\14\15\55 ][{unicode}+utf8::IsSpacePerl +utf8::Dash]

or perhaps

ANYOF[\11\12\14\15 \-][{unicode}+utf8::IsSpacePerl +utf8::Dash]
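
To spell out the reasoning: since \s* can match zero times, a match of
/\s*\p{Dash}/ can begin with either a space or a dash, so the start
class has to be the union of the two. A rough sketch of that idea in
plain Perl (can_start_match is a made-up helper for illustration, not
anything in the regex engine):

```perl
use strict;
use warnings;

# Hypothetical helper: true if $ch could be the first character of a
# match for /\s*\p{Dash}/. Because \s* may match zero times, a character
# from either class qualifies.
sub can_start_match {
    my ($ch) = @_;
    return $ch =~ /\A[\s\p{Dash}]\z/ ? 1 : 0;
}

for my $ch ('-', ' ', "\t", 'a') {
    my $label = $ch eq "\t" ? '\t' : $ch;
    printf "'%s' => %s\n", $label, can_start_match($ch) ? 'yes' : 'no';
}
```

A start class that answers "no" for '-' in the ASCII range would reject
the string "- " before the real matcher gets a fair chance, which is
what the dump above suggests is happening.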

> Matching REx "\s*\p{Dash}" against "- "
> Matching stclass ANYOF[\11\12\14\15 ][{unicode_all}+utf8::Dash] against "- " (2 chars)
>   1 <-> < >                 |  1:STAR(3)
>                                  SPACE can match 1 times out of 2147483647...
>   2 <- > <>                 |  3:  ANYOF[{unicode}+utf8::Dash](15)
>                                    failed...
>   1 <-> < >                 |  3:  ANYOF[{unicode}\-...+utf8::Dash](15)
>                                    failed...
>                                  failed...
> Contradicts stclass... [regexec_flags]
> Match failed
> Freeing REx: "\s*\p{Dash}"
>
> The first \s* matches the second character of "- " and the \p{Dash} fails, since the regex does not backtrack beyond the last space. But there should be a match for this re and input data...
>
> Version 3 obviously matches because of some internal optimization (searching for plain "-"),

Actually version three matches because it constructs the correct
synthetic start class:

Compiling REx "\s*-"
synthetic stclass "ANYOF[\11\12\14\15 \-][{unicode_all}]".
Final program:
   1: STAR (3)
   2:   SPACE (0)
   3: EXACT <-> (5)
   5: END (0)
floating "-" at 0..2147483647 (checking floating) stclass ANYOF[\11\12\14\15 \-][{unicode_all}] minlen 1

Notice the presence of \- in the ANYOF.

> Version 2 should normally be the exact equivalent to 1, but this one backtracks and matches correctly.

Version two matches because it doesn't create any start class
optimisations (which is odd).

Compiling REx "\s*\p{Dash}{1}"
Final program:
   1: STAR (3)
   2:   SPACE (0)
   3: CURLY {1,1} (17)
   5:   ANYOF[{unicode}+utf8::Dash] (0)
  17: END (0)
minlen 1

Notice there is no stclass ANYOF here at all.
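
For anyone who wants to compare the three dumps side by side, something
along these lines should print the compiled program (and the stclass
line, where there is one) for each variant on STDERR. It only assumes
the standard re pragma; the exact output will differ between perl
versions:

```perl
use strict;
use warnings;

my @compiled;
{
    # re 'debug' is lexically scoped: only patterns compiled inside this
    # block have their program and optimiser data written to STDERR.
    use re 'debug';
    @compiled = (
        qr/\s*\p{Dash}/,       # Version 1
        qr/\s*\p{Dash}{1}/,    # Version 2
        qr/\s*-/,              # Version 3
    );
}

printf "compiled %d patterns\n", scalar @compiled;
```

Look for the "synthetic stclass" and "stclass ... minlen" lines in each
dump to see which variants got a start class.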

IMO this last is also a bug. A) the engine should optimise:

   3: CURLY {1,1} (17)
   5:   ANYOF[{unicode}+utf8::Dash] (0)

into

   3:   ANYOF[{unicode}+utf8::Dash] (15)

that is, there should be no CURLY loop here; and B) the optimiser
should treat 1 and 3 identically, although with similar results to 2.
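
A quick way to check whether a given perl build is affected is to run
all three variants from the report against the same input and see if
they agree. This is only a sketch of such a check, with the variable
names invented here:

```perl
use strict;
use warnings;

my $input    = '- ';
my @variants = (
    [ 'Version 1' => qr/\s*\p{Dash}/    ],
    [ 'Version 2' => qr/\s*\p{Dash}{1}/ ],
    [ 'Version 3' => qr/\s*-/           ],
);

my %matched;
for my $variant (@variants) {
    my ($name, $re) = @$variant;
    $matched{$name} = ($input =~ $re) ? 1 : 0;
    printf "%s: %s\n", $name, $matched{$name} ? 'match' : 'no match';
}

# All three patterns should accept "- ". In the report above, Version 1
# was the odd one out on perl 5.12.1.
my %distinct = map { $_ => 1 } values %matched;
print scalar(keys %distinct) == 1 ? "consistent\n" : "INCONSISTENT\n";
```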

It is lovely when bugs cancel each other out... Just the thing to keep
us on our toes... :-)

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
