develooper Front page | perl.perl6.internals | Postings from January 2002

Re: on parrot strings

Thread Previous | Thread Next
From:
Steve Fink
Date:
January 18, 2002 13:40
Subject:
Re: on parrot strings
Message ID:
20020118214026.GB24501@foxglove.digital-integrity.com
On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote:
> ints, or 176 bytes. Searching for membership in an inversion list is
> O(N log N) (binary search).  "Encoding the whole range" is a non-issue
> bordering on a joke: two ints, or 8 bytes.

[Clarification from a noncombatant] You meant O(log N).

I like the inversion list idea. But its speed is proportional to the
toothiness of the character class, and while I have good intuition for
what that means in 7-bit US-ASCII, I have no idea how bad it gets for
other languages. "Vowels"? "Capital letters"? Would anyone ever want
to select all Chinese characters with a particular radical?

That's just lookup. We should also consider other character class
operations: union, subtraction, intersection. They're pretty
straightforward and fast (O(N)) for inversion lists. (Yes, all these
operations can be postponed until lookup time, regardless of the
underlying represention, in which case the time of union(C1,C2) is
just the time of C1 + time of C2 + time of an 'or'.)

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About