perl.perl6.internals | Postings from January 2002

Re: parrot rx engine

From: Graham Barr
Date: January 31, 2002 11:33
Subject: Re: parrot rx engine
Message ID: 20020131193229.C69229@pobox.com

On Thu, Jan 31, 2002 at 11:18:58AM -0800, Hong Zhang wrote:
> > Because parts of an rx can be case-insensitive while other parts
> > are case-sensitive, we will probably need two sorts of ops anyway
> > (or a way to tell the op to be case-insensitive).  And you will
> > only be able to do the case folding when the whole rx is 
> > case-insensitive.
> 
> I don't like your suggestion. I think we should have one set of
> ops, but two input strings: one is the original, the other is case-
> folded. Rx chooses the right one depending on the current 
> case-sensitivity. Two regex opcodes will be used for this purpose:
> op-case-sensitive-start and op-case-insensitive-start. Each opcode
> will switch the string's begin, end, position, etc.
> 
> > It also means creating a copy of the input string, which is something
> > the current rx engine in perl5 tries to avoid. And while I will agree
> > that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i,
> > that is normally only the case for small-ish strings.
> 
> I don't think the perl5 approach is the best choice. Unicode case folding
> is much, much more expensive than malloc/free. And we can always use a
> per-thread free list; unless the regex is nested or the string is very
> big, we don't need to allocate any memory.

But as you say, case folding is expensive. And with this approach you
are going to case-fold every string that is matched against an rx
that has any case-insensitive part, even a small one.

The case-folding should be done in the rx itself, at compile time if possible.
Then it is done only once per rx rather than once per input string, which
will save a lot of time if the rx is used repeatedly, for example in a loop.
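
To make the idea concrete, here is a minimal sketch in Python (not Parrot
ops, and not how perl5 does it). It assumes literal-only patterns and
simple one-to-one case mappings; the names compile_rx, match_at and search
are hypothetical, made up for illustration. The folding cost is paid once
in compile_rx, and the input string is never copied or folded.

```python
# Hypothetical sketch: fold the pattern once at compile time,
# then match against the original, unmodified input string.

def compile_rx(segments):
    """Compile (literal, case_insensitive) pairs into per-position sets.

    Each position becomes the set of characters accepted there.
    Assumption: only one-to-one case mappings matter. Multi-character
    foldings (e.g. German sharp s) are NOT handled by this sketch.
    """
    program = []
    for literal, insensitive in segments:
        for ch in literal:
            if insensitive:
                # Folding work happens here, once per rx.
                program.append(frozenset({ch, ch.lower(), ch.upper()}))
            else:
                program.append(frozenset({ch}))
    return program


def match_at(program, text, start):
    """Return True if the compiled program matches text at offset start."""
    if start + len(program) > len(text):
        return False
    return all(text[start + i] in allowed
               for i, allowed in enumerate(program))


def search(program, text):
    """Return the first matching offset in text, or -1 if none."""
    for start in range(len(text) - len(program) + 1):
        if match_at(program, text, start):
            return start
    return -1


# Roughly /(?i:foo)Bar/: "foo" is case-insensitive, "Bar" is not.
rx = compile_rx([("foo", True), ("Bar", False)])

# The rx is compiled once and reused for every input; no input is folded.
results = [search(rx, line) for line in ["xxFOOBar", "fooBAR", "FoOBar!"]]
```

Mixed sensitivity within one rx falls out naturally here, because each
position carries its own set and no switching between two copies of the
input is needed.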

Graham.

