
Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400

From: Tom Christiansen
Date: November 16, 2008 17:41
Subject: Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400
Message ID: 11732.1226885980@chthon

In-Reply-To: Message from demerphq <demerphq@gmail.com>
   of "Sat, 15 Nov 2008 00:23:34 +0100."

> 2008/11/14 Tom Christiansen <tchrist@perl.com>:

>>  Replying to Chip Salzenberg's message of "Wed, 12 Nov 2008 18:18:57 PST"
>>  and to Karl Williamson's of "Thu, 13 Nov 2008 11:38:48 MST":

>>  SUMMARY:
>> 
>>   *  There exist in octal character notation both implementation bugs
>>      and built-in, by-design bugs, particularly when used in
>>      regular expressions.
>>
>>   *  A few of these we've brought on ourselves, because we relaxed
>>      the octal-char definition in ways that the designers of these
>>      things never did, and so some of our troubles with them are our
>>      own fault.
>>
>>   *  The implementation bugs we can fix, if we're careful and
>>      consistent, but design bugs we cannot.
>>
>>   *  Nor can we eliminate the notation altogether, due to the
>>      existing massive code base that relies upon it.

> Yes, this is absolutely clear (now). I misspoke when I suggested this.

>>   *  The best we can do is generate, under certain circumstances, a
>>      warning related to an ambiguous \XXX being interpreted as either
>>      a backreference or a character.

> As you have said, \g{} makes these much less important.
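A minimal sketch of the ambiguity under discussion and of the unambiguous spellings, assuming a Perl of 5.14 or later (for \o{}). The variable names are illustrative only. With fewer than ten capture groups, \10 is read as an octal escape; \g{N} is always a backreference, and \o{NNN} or a leading zero is always an octal character.

```perl
use strict;
use warnings;

# \g{1} is always a backreference to group 1.
my $g_ref    = "abab" =~ /^(ab)\g{1}$/ ? 1 : 0;

# \g{1}0 is group 1 followed by a literal "0"; no octal reading is possible.
my $ref_zero = "aa0" =~ /^(a)\g{1}0$/ ? 1 : 0;

# Only one group exists here, so \10 is taken as octal 010 (backspace),
# not as a reference to a tenth group.
my $octal_10 = "a\x08" =~ /^(a)\10$/ ? 1 : 0;

# \o{33} and \033 are always the octal character 033 (ESC).
my $o_brace  = "\e" =~ /^\o{33}$/ ? 1 : 0;
my $lead_0   = "\e" =~ /^\033$/   ? 1 : 0;

print join(" ", $g_ref, $ref_zero, $octal_10, $o_brace, $lead_0), "\n";
```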

>>  glenn>>> The [below] items could be added to the language
>>  glenn>>> immediately, during the deprecation cycle for \nnn octal
>>  glenn>>> notation [...]

>>  tchrist>> I find the notion of rendering illegal the existing octal
>>  tchrist>> syntax of "\33" is an *EXTRAÖRDINARILY* bad idea, a
>>  tchrist>> position I am prepared to defend at laborious length--and,
>>  tchrist>> if necessary, appeal to the Decider-in-Chief [...]

>>  chip> I am happy to mark my return to p5p by singing in harmony 
>>  chip> with Tom C.

> Don't worry, both of you. Just pointing out how much could break
> snapped some sense into my head. Mea culpa and all that.

> I'll have to take a look at gre as it sounds like it is right
> along the lines of what we need. Afaiu we can't go to full DFA
> construction in perl, at least not for every pattern, simply
> because our patterns support recursive constructs, which afaik
> cannot be represented as DFAs.
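For illustration, here is a small sketch of the kind of recursive construct meant, assuming Perl 5.10 or later. The pattern name $balanced is hypothetical. The (?1) construct re-enters capture group 1, so the pattern accepts arbitrarily deep balanced parentheses, which is not a regular language and so has no finite-automaton equivalent.

```perl
use strict;
use warnings;

# Group 1 matches "(", then any mix of non-paren runs or nested
# occurrences of group 1 itself via (?1), then ")".
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;

my @inputs  = ("(a(b)c)", "(a(b)c", "()", "(()())(");
my @results = map { /$balanced/ ? 1 : 0 } @inputs;

printf "%-10s %s\n", $inputs[$_], $results[$_] ? "balanced" : "not balanced"
    for 0 .. $#inputs;
```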

I don't think they can.  But an adaptive mechanism for those patterns that

>>  Andrew said, sure, it's a bit messy, or untidy, but if you're
>>  looking for pristine perfection, you're looking for the wrong thing.
>>  Or something like that.

> Especially in Perl. :-)

>>  One last thing: Andrew, upon being told about the TRIE regex
>>  optimization, suggests we might look into splay trees for
>>  this instead.  He thinks they have properties that might make
>>  them even faster/smaller, but says we'd have to benchmark the
>>  two carefully, because it was just an informed hunch.

> Hmm, maybe it's worth researching that a bit. The trie
> logic could definitely be improved. We use compressed
> transition tables when we probably shouldn't, making each
> transition significantly more expensive than it should be --
> mostly because of the concern that Unicode could make the
> number of transitions grow explosively large.
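As a rough picture of what a trie over literal alternatives looks like, here is a toy hash-of-hashes version. It is only a sketch and not how the regex engine stores its tries: the names trie_build and trie_longest are hypothetical, the transitions are uncompressed, and it returns the longest stored word, whereas Perl alternation prefers the leftmost alternative.

```perl
use strict;
use warnings;

# Build a trie: one nested hash per character. The multi-character key
# 'end' marks a complete word and cannot collide with one-character keys.
sub trie_build {
    my @words = @_;
    my %root;
    for my $word (@words) {
        my $node = \%root;
        for my $ch (split //, $word) {
            $node->{$ch} = {} unless exists $node->{$ch};
            $node = $node->{$ch};
        }
        $node->{end} = 1;
    }
    return \%root;
}

# Walk the trie along the start of $text; each step is one hash lookup.
# Returns the longest stored word that prefixes $text, or undef.
sub trie_longest {
    my ($trie, $text) = @_;
    my ($node, $len, $best) = ($trie, 0, undef);
    for my $ch (split //, $text) {
        $node = $node->{$ch} or last;
        $len++;
        $best = $len if $node->{end};
    }
    return defined $best ? substr($text, 0, $best) : undef;
}

my $trie = trie_build(qw(foo foobar fob));
my $hit1 = trie_longest($trie, "foobaz");
my $hit2 = trie_longest($trie, "foobar!");
my $miss = trie_longest($trie, "fx");
```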

I see, I think.

>>  One thing I found especially amusing was this change log
>>  comment:
>>
>>     Fix for a serious bug that affected REs using many []
>>     (including REG_ICASE REs because of the way they are
>>     implemented), *sometimes*, depending on memory-allocation
>>     patterns.
>>
>>  Sound familiar, anybody :-)  [HINT: think of /(\337)\1/i ]

> I'm probably too stupid to get this one. Feel up to spelling it out to
> me offlist?

The problem is one of tricky folding, which you know plenty about
already.  ß => SS, etc.
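A small sketch of why this folding is tricky, assuming Perl 5.16 or later (for fc and the unicode_strings feature, both enabled by "use v5.16"). The variable names are illustrative. The point is that case-folding \337 (U+00DF) changes the string's length, so a case-insensitive comparison cannot proceed character by character.

```perl
use v5.16;      # enables strict, fc, and unicode_strings
use warnings;

my $sharp_s = "\x{DF}";          # same character as octal \337

my $folded  = fc($sharp_s);      # full case fold: "ss"
my $upper   = uc($sharp_s);      # full uppercase: "SS"

# One character in, two characters out.
my $growth  = length($folded) - length($sharp_s);

printf "fold=%s upper=%s growth=%d\n", $folded, $upper, $growth;
```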

( NEVER cut yourself down; there are always plenty of others 
  who are only too happy to do that for you, and you shouldn't
  help them. )

Oh, the other thing you forgot was $/.  It's how perl -0777 is
equiv to undef $/, "because \777 is an illegal octal octet".  
That's also precedent for restricting it to \377.
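The $/ behaviour can be sketched as follows, using an in-memory filehandle so the example needs no external file. The helper name read_records is hypothetical. Setting $/ to undef, which is what -0777 does on the command line, makes a single read return the whole input.

```perl
use strict;
use warnings;

# Read all records from $text using $sep as the input record separator.
sub read_records {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "in-memory open failed: $!";
    local $/ = $sep;             # undef means "no separator": slurp mode
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $data  = "one\ntwo\nthree\n";
my @lines = read_records($data, "\n");    # line at a time
my @slurp = read_records($data, undef);   # whole input as one record

printf "%d line records, %d slurp record\n", scalar @lines, scalar @slurp;
```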

> Sigh. So much to learn. So little time. The latter sounds
> interesting, I haven't looked but I wonder how it handles
> recursive patterns.

Doubtful, but haven't looked.

> I apologise for the shouting.

> Well, at the time I made the suggestion (about the regex engine)
> that we do so (in the regex engine) I was not thinking clearly.
> Again I apologize.

Accepted, twice.

> I was just mad because your mail was truncated by gmail. 

Truncates, eh?  I just bounce instead.

--tom
