develooper Front page | perl.perl6.internals | Postings from February 2001

Re: PDD 2, vtables

February 9, 2001 03:56
Jarkko Hietaniemi wrote:
> > Umm, one way or another I suspect UTF-8 will be in there.
> I suspect so too but very grudgingly.  As Dan said dealing with
> variable length data is a major pain.  UTF-8 is certainly a much
> better designed VLD than most but it's still a pain.

I guess that's why strings should be abstracted, and accessed only through an
API by everything outside the string-handling functions themselves.

The string API should be smart enough to convert data from one encoding to
another whenever that is more convenient. For example, if the compiler sees
a sub with several calls to "substr" inside a loop, all acting on the same
string, it would probably set things up so that the sub tells the string to
morph into a representation that is cheap to index. If there's only one
"substr" outside of a loop, it probably wouldn't bother, since the cost of
the conversion would be bigger than the cost of counting characters through
a variable-length string.
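
A minimal sketch of that trade-off, in Python rather than in anything Perl would actually use. The class name MorphingString and its methods are hypothetical and only illustrate the idea: the string stays in UTF-8 until someone asks it to morph, and "substr" takes whichever path is available.

```python
class MorphingString:
    """Hypothetical string that stores UTF-8 but can morph to fixed width."""

    def __init__(self, text: str) -> None:
        self._utf8 = text.encode("utf-8")  # compact, variable-length form
        self._fixed = None                 # list of code points, built on demand

    def morph_to_fixed_width(self) -> None:
        """Pay the conversion cost once so later lookups are direct indexing."""
        if self._fixed is None:
            self._fixed = [ord(ch) for ch in self._utf8.decode("utf-8")]

    def _byte_offset(self, char_index: int) -> int:
        """Walk the UTF-8 bytes, counting characters, to find a byte offset."""
        seen = 0
        for pos, byte in enumerate(self._utf8):
            if (byte & 0xC0) != 0x80:  # not a continuation byte: a character starts here
                if seen == char_index:
                    return pos
                seen += 1
        return len(self._utf8)

    def substr(self, start: int, length: int) -> str:
        if self._fixed is not None:
            # Fixed-width path: plain slicing by character index.
            return "".join(map(chr, self._fixed[start:start + length]))
        # Variable-length path: every call has to count from the beginning.
        begin = self._byte_offset(start)
        end = self._byte_offset(start + length)
        return self._utf8[begin:end].decode("utf-8")
```

A compiler that spots many "substr" calls in a loop would, under this sketch, emit one morph_to_fixed_width() call before the loop; for a single call it would skip the morph and accept one linear walk.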

On the other hand, for a string that is matched against regexps, it doesn't
matter much if it has variable-length characters, since regexps normally read
the whole string anyway, and indexing characters isn't much of a concern.
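
To illustrate why a front-to-back scan is indifferent to character width, here is a rough sketch that decodes UTF-8 one code point at a time. The function names are made up for this example, and the decoder assumes the input is already valid UTF-8 (no error handling).

```python
def iter_code_points(data: bytes):
    """Yield code points from valid UTF-8, reading strictly left to right."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: one byte
            size, cp = 1, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: two bytes
            size, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: three bytes
            size, cp = 3, lead & 0x0F
        else:                         # 11110xxx: four bytes
            size, cp = 4, lead & 0x07
        for cont in data[i + 1:i + size]:
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits per continuation byte
        yield cp
        i += size


def find_first(data: bytes, target: int) -> int:
    """Return the character index of the first `target` code point, or -1."""
    for index, cp in enumerate(iter_code_points(data)):
        if cp == target:
            return index
    return -1
```

The scan never needs to jump to "character N"; it only ever asks for the next character, which is cheap in any encoding.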

It would be nice if the user had some control over this, for example by saying
"I don't care that this string will be used by substr, leave it in UTF-8 since
it's too big and I don't want to waste memory!", or "This string isn't too
big, so I should convert it to bloated UTF-32 at once!", or even "use less

And I believe 8-bit ASCII will always be an option, for those who don't care
about extended characters and want the best of both worlds in speed and
memory usage.
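
For a rough feel of the memory side, this small sketch compares storage sizes across encodings. It uses Python's "latin-1" codec as a stand-in for a one-byte-per-character encoding, which is an assumption of the example and not something specified above; storage_bytes is a hypothetical helper.

```python
def storage_bytes(text: str) -> dict:
    """Return the encoded size of `text` in bytes for three encodings."""
    sizes = {
        "utf-8": len(text.encode("utf-8")),       # 1 to 4 bytes per character
        "utf-32": len(text.encode("utf-32-le")),  # always 4 bytes per character, no BOM
    }
    try:
        sizes["latin-1"] = len(text.encode("latin-1"))  # always 1 byte per character
    except UnicodeEncodeError:
        sizes["latin-1"] = None  # text contains characters an 8-bit encoding cannot hold
    return sizes
```

For plain ASCII text the one-byte form and UTF-8 are the same size and UTF-32 is four times larger; once a character falls outside the 8-bit range, the one-byte form is simply unavailable.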

- Branden
