perl.perl5.porters | Postings from January 2012

Re: [perl #109206] regexes: . different from [^\n]

Lukas Mai
January 27, 2012 13:31
On 2012-01-27 Father Chrysostomos via RT wrote:

> On Fri Jan 27 07:33:32 2012, demerphq wrote:
> >  /* turn .* into ^.* with an implied $*=1 */
> > 
> > I have to admit I have not checked to see what the heck $*=1 means.
> $* doesn’t do anything anymore, unless you are using Classic::Perl.
> $* = 1 puts /m on every match in 5.8, bugs aside.
> What the comment means exactly by implied $*=1 I don’t know.  Is it
> referring to /^/ meaning /^/m in split?  But that couldn’t be right.

It means that a regexp that starts with .* is implicitly anchored:
if it doesn't match at offset 0, it won't match at offsets 1, 2, 3 ...
of the same line either, because anything matchable from there could
also be reached by letting .* consume more from offset 0. /m is implied
because .* won't cross newlines, so a string containing \n has several
possible match locations: offset 0 and the position after each embedded
\n. Each of those has to be tried.
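
A minimal sketch of that equivalence, assuming a made-up sample string and
a hypothetical helper match_start (neither is from the perl source): the
leftmost match of a pattern with a leading .* should start at the same
offset as the same pattern with an explicit ^ under /m.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $str = "foo\nbar\nbaz";

# Start offset of the first match of $re in $s, or -1 if there is none.
sub match_start {
    my ($s, $re) = @_;
    return $s =~ $re ? $-[0] : -1;
}

my $plain    = match_start($str, qr/.*z/);    # leading .* without /s
my $anchored = match_start($str, qr/^.*z/m);  # explicit ^ under /m

print "plain=$plain anchored=$anchored\n";
```

Both offsets point at the start of the last line, i.e. just after the
second \n, which is one of the positions where /^/m can match.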

(Conversely, if /s is active, leading .* should generate an implicit ^
with /m off (a.k.a. \A).)
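
The /s case can be sketched the same way (sample string again made up
for illustration): since . now matches \n as well, a leading .* can
reach any later position from offset 0, so a successful match starts
there, just as with an explicit \A.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $str = "foo\nbar\nbaz";

# Under /s, .* may cross newlines, so the match begins at offset 0.
my $dot_s = $str =~ /.*z/s   ? $-[0] : -1;
my $abs   = $str =~ /\A.*z/s ? $-[0] : -1;

print "dot_s=$dot_s abs=$abs\n";
```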

AFAICS this optimization is valid except when the target string ends
with a newline. In that case .* could (and should) match, but /^/m
won't. That is, "\n" =~ /^/mg only matches once.
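
A small sketch of the trailing-newline case, using a hypothetical helper
count_matches. The [^\n]* count serves as the reference for what .*
ought to produce; the .* count is only printed, on the assumption that
it may differ between perl versions because of the bug under discussion.

```perl
use strict;
use warnings;

# Number of successful /g matches of $re against $s.
sub count_matches {
    my ($s, $re) = @_;
    my $n = () = $s =~ /$re/g;   # list assignment in scalar context counts matches
    return $n;
}

my $bol_count    = count_matches("\n",   qr/^/m);      # /^/m on a lone newline
my $bol_embedded = count_matches("a\nb", qr/^/m);      # newline not at the end
my $negclass     = count_matches("\n",   qr/[^\n]*/);  # empty match before and after \n

print "/^/m on \"\\n\":     $bol_count\n";
print "/^/m on \"a\\nb\":   $bol_embedded\n";
print "[^\\n]* on \"\\n\":  $negclass\n";
print ".* on \"\\n\":      ", count_matches("\n", qr/.*/), "\n";
```

The position after a final \n is a valid place for an empty [^\n]* match
but not for /^/m, which is exactly where the implicit anchor loses a match.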

So ... I guess the regex code should behave differently if the /^/m is
implicit and \n is the last character in the target string?

(And maybe there's a missed optimization opportunity here because I
don't see why this special case shouldn't trigger for [^\n]* at the
beginning of a pattern.)
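
To illustrate why the same reasoning would apply to [^\n]*, here is a
sketch comparing match start offsets with and without an explicit ^
under /m. The sample strings and the helper start_of are made up for
illustration, and this only checks behaviour, not what the optimizer
actually does.

```perl
use strict;
use warnings;

# Start offset of the first match of $re in $s, or -1 if there is none.
sub start_of {
    my ($s, $re) = @_;
    return $s =~ $re ? $-[0] : -1;
}

# Hypothetical sample strings, including one with a trailing newline
# and one that cannot match at all.
my @samples = ("foo\nbar\nbaz", "z", "a\nzz\n", "no match here\n");

my $all_agree = 1;
for my $s (@samples) {
    my $neg = start_of($s, qr/[^\n]*z/);     # leading negated class
    my $anc = start_of($s, qr/^[^\n]*z/m);   # same, explicitly anchored
    $all_agree = 0 if $neg != $anc;
    print "neg=$neg anc=$anc\n";
}
```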
