Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400
From:
Tom Christiansen
Date:
November 16, 2008 17:41
Subject:
Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400
Message ID:
11732.1226885980@chthon
In-Reply-To Message from demerphq <demerphq@gmail.com>
of "Sat, 15 Nov 2008 00:23:34 +0100."
> 2008/11/14 Tom Christiansen <tchrist@perl.com>:
>> Replying to Chip Salzenberg's message of "Wed, 12 Nov 2008 18:18:57 PST"
>> and to Karl Williamson's of "Thu, 13 Nov 2008 11:38:48 MST":
>> SUMMARY:
>>
>> * There exist in octal character notation both implementation bugs
>> as well as built-in, by-design bugs, particularly when used in
>> regular expressions.
>>
>> * A few of these we've brought on ourselves, because we relaxed
>> the octal-char definition in ways that the designers of these
>> things never did, and so some of our troubles with them are our
>> own fault.
>>
>> * The implementation bugs we can fix, if we're careful and
>> consistent, but design bugs we cannot.
>>
>> * Nor can we eliminate the notation altogether, due to the
>> existing massive code base that relies upon it.
> Yes, this is absolutely clear (now). I misspoke when I suggested this.
>> * The best we can do is generate, under certain circumstances, a
>> warning related to an ambiguous \XXX being interpreted as either
>> a backreference or a character.
> As you have said, \g{} makes these much less important.
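[Editorial illustration of the point above, not part of the original exchange.] A minimal sketch of how the ambiguity can be sidestepped, assuming Perl 5.10 or later so that \g{N} is available: \g{1} can only be a backreference, \x{100} names the code point that \400 was meant to denote, and a leading zero as in \033 marks an octal escape. The variable names are made up for the example.

```perl
use strict;
use warnings;

# \g{1} can only mean "whatever group 1 captured", never an octal character.
my $backref_ok = ("abab" =~ /(ab)\g{1}/) ? 1 : 0;

# \x{100} names code point 0x100 (octal 0400) without going through \400.
my $wide_ok = (chr(0400) =~ /\x{100}/) ? 1 : 0;

# A leading zero marks an octal escape: \033 is ESC, the same character as "\e".
my $octal_ok = ("\e" =~ /\033/) ? 1 : 0;

print "backref=$backref_ok wide=$wide_ok octal=$octal_ok\n";
```

None of the three patterns relies on counting capture groups to decide what a bare \NNN means, which is the case the proposed warning is aimed at.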
>> glenn>>> The [below] items could be added to the language
>> glenn>>> immediately, during the deprecation cycle for \nnn octal
>> glenn>>> notation [...]
>> tchrist>> I find the notion of rendering illegal the existing octal
>> tchrist>> syntax of "\33" is an *EXTRAÖRDINARILY* bad idea, a
>> tchrist>> position I am prepared to defend at laborious length--and, if
>> tchrist>> necessary, appeal to the Decider-in-Chief [...]
>> chip> I am happy to mark my return to p5p by singing in harmony
>> chip> with Tom C.
> Don't worry both of you. Just pointing out how much could break
> snapped some sense into my head. Mea-culpa and all that.
> I'll have to take a look at gre as it sounds like it is right
> along the lines of what we need. Afaiu we can't go to full DFA
> construction in perl, at least not for every pattern, simply
> because our patterns support recursive constructs, which afaik
> cannot be represented as DFAs.
I don't think they can. But an adaptive mechanism for those patterns that
>> Andrew said, sure, it's a bit messy, or untidy, but if you're
>> looking for pristine perfection, you're looking for the wrong thing.
>> Or something like that.
> Especially in Perl. :-)
>> One last thing: Andrew, upon being told about the TRIE regex
>> optimization, suggests we might look into splay trees for
>> this instead. He thinks they have properties that might make
>> them even faster/smaller, but says we'd have to benchmark the
>> two carefully, because it was just an informed hunch.
> Hmm, maybe it's worth researching into that a bit. The trie
> logic could definitely be improved. We use compressed
> transition tables when we probably shouldn't, making each
> transition significantly more expensive than it should be --
> mostly because of the concern of Unicode being able to make the
> number of transitions grow explosively large.
I see, I think.
>> One thing I found especially amusing was this change log
>> comment:
>>
>> Fix for a serious bug that affected REs using many []
>> (including REG_ICASE REs because of the way they are
>> implemented), *sometimes*, depending on memory-allocation
>> patterns.
>>
>> Sound familiar, anybody :-) [HINT: think of /(\337)\1/i ]
> I'm probably too stupid to get this one. Feel up to spelling it out to
> me offlist?
The problem is one of tricky folding, which you know plenty about
already. ß => SS, etc.
( NEVER cut yourself down; there are always plenty of others
who are only too happy to do that for you, and you shouldn't
help them. )
Oh, the other thing you forgot was $/. It's how perl -0777 is
equiv to undef $/, "because \777 is an illegal octal octet".
That's also precedent for restricting it to \377.
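[Editorial illustration, not part of the original message.] A small sketch of the slurp behaviour referred to above, assuming an in-memory filehandle so that it runs without any file on disk. With $/ undefined, one readline returns the whole input; the perlrun documentation describes -0777 as having the same effect because no legal byte has that value. The names $text, $fh and $all are made up for the example.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open my $fh, '<', \$text or die "open failed: $!";

# local $/ leaves the record separator undefined inside this block,
# so a single readline returns everything up to end of file.
my $all = do { local $/; <$fh> };
close $fh;

print length($all), " characters read in one record\n";
```

On the command line the counterpart would be something like perl -0777 -ne 'print length' somefile, where each input file arrives as a single record.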
> Sigh. So much to learn. So little time. The latter sounds
> interesting, I haven't looked but i wonder how it handles
> recursive patterns.
Doubtful, but haven't looked.
> I apologise for the shouting.
> Well at the time i made the suggestion (about the regex engine)
> that we do so (in the regex engine) I was not thinking clearly.
> Again I apologize.
Accepted, twice.
> I was just mad because your mail was truncated by gmail.
Truncates, eh? I just bounce instead.
--tom