perl.perl5.porters | Postings from January 2012

Re: [perl #109206] regexes: . different from [^\n]

From: demerphq
Date: January 28, 2012 04:22
Subject: Re: [perl #109206] regexes: . different from [^\n]
Message ID: CANgJU+UBRUZL1sQoOVbtsa59dvMVBAnZJxE7+awrit+Ug2EDbw@mail.gmail.com

On 27 January 2012 21:51, Lukas Mai <l.mai@web.de> wrote:
> On 2012-01-27 Father Chrysostomos via RT wrote:
>
>> On Fri Jan 27 07:33:32 2012, demerphq wrote:
>> >  /* turn .* into ^.* with an implied $*=1 */
>> >
>> > I have to admit I have not checked to see what the heck $*=1 means.
>>
>> $* doesn’t do anything anymore, unless you are using Classic::Perl.
>>
>> $* = 1 puts /m on every match in 5.8, bugs aside.

Ah, thanks. Pity the comment doesn't say "with an implied /m" instead.

>> What the comment means exactly by implied $*=1 I don’t know.  Is it
>> referring to /^/ meaning /^/m in split?  But that couldn’t be right.
>
> It means that a regexp that starts with .* is implicitly anchored
> because if it doesn't match at offset 0, it won't match at offsets 1,
> 2, 3 ... either. /m is implied because (since .* won't cross newlines)
> there can be multiple possible match locations if the string contains
> \n. Which means you have to check every embedded \n for a match.

Yes, right.
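
To make the point above concrete, here is a minimal sketch. The sample string and the helper name match_start are made up for illustration; nothing here comes from the regex engine source. It only shows that, for a string without a trailing newline, a leading .* and an explicit ^.* under /m start matching at the same offset, namely just after the embedded newline.

```perl
use strict;
use warnings;

# Hypothetical sample string: no "foo" on the first line, one on the second.
my $str = "abc\nxfoo";

# Hypothetical helper: returns the offset where the match starts,
# or -1 if the pattern does not match at all.
sub match_start {
    my ($string, $re) = @_;
    return $string =~ $re ? $-[0] : -1;
}

# Unanchored pattern with a leading .* (which cannot cross "\n").
my $plain = match_start($str, qr/.*foo/);

# The same pattern with the anchor and /m written out explicitly.
my $anchored = match_start($str, qr/^.*foo/m);

print "plain: $plain, anchored: $anchored\n";   # plain: 4, anchored: 4
```

Offsets 1, 2 and 3 cannot succeed once offset 0 has failed, so the only other candidate start is the position right after the "\n".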

> (Conversely, if /s is active, leading .* should generate an implicit ^
> with /m off (a.k.a. \A).)
>
> AFAICS this optimization is valid except when the target string ends
> with a newline. In that case .* could (and should) match, but /^/m
> won't. That is, "\n" =~ /^/mg only matches once.

One might argue that this is the bug: /^/m probably should match both
before and after the newline.
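
A small sketch of the trailing-newline case being discussed, assuming a helper named count_matches (made up here) that counts /g matches. The count for .* is printed but deliberately not annotated, since it depends on whether the perl in use applies the implicit anchoring.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times a pattern matches under /g,
# using the usual "= () =" list-assignment idiom.
sub count_matches {
    my ($string, $re) = @_;
    my $count = () = $string =~ /$re/g;
    return $count;
}

# The target from the discussion: a string consisting of a single newline.
my $target = "\n";

# /^/m matches at offset 0 but not after the final newline.
my $caret_m = count_matches($target, qr/^/m);

# [^\n]* matches the empty string both before and after the newline.
my $negclass = count_matches($target, qr/[^\n]*/);

# .* is the case under discussion; the result may vary between perl versions.
my $dotstar = count_matches($target, qr/.*/);

print "/^/m:      $caret_m\n";    # 1
print "/[^\\n]*/: $negclass\n";   # 2
print "/.*/:      $dotstar\n";
```

If .* reports the same count as /^/m rather than the same count as [^\n]*, that is the discrepancy this ticket is about.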

>
> So ... I guess the regex code should behave differently if the /^/m is
> implicit and \n is the last character in the target string?

The thing is, the optimization is enabled before we ever see the string
at all, so it cannot depend on the contents of the string.

So we either have to figure out how to make it match properly or
simply disable it.

> (And maybe there's a missed optimization opportunity here because I
> don't see why this special case shouldn't trigger for [^\n]* at the
> beginning of a pattern.)

Because it isn't easy to introspect the contents of a charclass.


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
