
Re: [perl #133756] //g flag on regex with UTF-8 source causes regexoptimiser to wrongly reject a match

From: Karl Williamson
Date: January 9, 2019 17:28
Subject: Re: [perl #133756] //g flag on regex with UTF-8 source causes regexoptimiser to wrongly reject a match
Message ID: 6365eff8-c439-830c-57c7-b8880cb0477d@khwilliamson.com

On 1/9/19 10:14 AM, Nicholas Clark wrote:
> On Wed, Jan 09, 2019 at 09:49:59AM -0700, Karl Williamson wrote:
> 
>> My gvim syntax highlighter immediately showed that \x100 is \x10 followed by
>> a "0".  Without that, I would have expected that $char contained a single
>> character: \x{100}.  The /g would cause the second character, the "0"
>> (U+0030) to be attempted to be matched.  I haven't investigated further,
>> because my guess is that is what is going on here.  If you say there is more
>> to it, then I'll investigate further.
> 
> Thanks for the rapid response. I think that you might be right, but will
> investigate further tomorrow at work with a fresh head.
> 
> This would mean that I've failed to correctly reduce the original problem
> to a representative test case. (The problem might *still* be PEBKAC, but
> the original discrepancy in the much larger input and generated regex looked
> like a bug.)

I looked at it a little more, and it seems weird that a comment would 
cause this variance in behavior.  OTOH, eval of UTF-8 is buggy, and 
that's why we have the unicode_eval feature.
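
For reference, the \x100 point quoted above can be checked with a minimal sketch like the following. It only illustrates how the unbraced escape is parsed; it is not the test case from the ticket, and the helper name code_points is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the ticket): list the code points of a string.
sub code_points {
    my ($str) = @_;
    return map { sprintf('U+%04X', ord($_)) } split //, $str;
}

# Unbraced \x takes at most two hex digits, so "\x100" is
# chr(0x10) followed by a literal "0" (U+0030).
my $unbraced = "\x100";

# The braced form is a single character, U+0100.
my $braced = "\x{100}";

printf "unbraced: length %d (%s)\n",
    length($unbraced), join(' ', code_points($unbraced));
printf "braced:   length %d (%s)\n",
    length($braced), join(' ', code_points($braced));

# Under /g in list context, a one-character pattern is tried again after
# the first match, so the trailing "0" becomes a second match attempt.
my @hits = $unbraced =~ /./gs;
printf "matches under /g: %d\n", scalar @hits;
```

This prints a length of 2 (U+0010 U+0030) for the unbraced form, a length of 1 (U+0100) for the braced form, and 2 matches under /g.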

> 
> Nicholas Clark
> 
