Re: Future Perl development

From: Nicholas Clark
Date: February 7, 2007 08:24
Subject: Re: Future Perl development
Message ID: 20070207162418.GP5748@plum.flirble.org

On Wed, Feb 07, 2007 at 01:09:23PM +0100, Mark Overmeer wrote:
> * Nicholas Clark (nick@ccl4.org) [070207 11:52]:
> > Jarkko's view, based on the battle scars from the dragons in the regexp
> > engine, was that fixed width 32 bit was better than anything 16 bit variable
> > width. Doing the latter properly *still* requires dealing with surrogates.
> > 
> > The *best* solution might well be fixed 7/8/16/32, using the smallest that
> > fits.
> > 
> > But I don't see this coming soon.
> 
> And for 7/8bit you would like to keep track of the character-set used
> in the string, such that you can automatically convert to unicode when
> need.

It's simpler to always convert to Unicode on the way in, and to $whatever on
the way out. After all, (as I understand it) one of the features of Unicode
is that it is a superset of all existing encodings. Hence why some of its
choices for what gets distinct code points can seem rather cranky.
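
To make the "convert on the way in, convert on the way out" idea concrete,
here is a minimal Python sketch. It is only an illustration and not how perl
handles this internally; the function names and the choice of Latin-1, KOI8-R
and UTF-8 are assumptions made up for the example.

```python
def read_text(raw: bytes, source_encoding: str) -> str:
    """Decode external bytes to Unicode once, at the input boundary."""
    return raw.decode(source_encoding)


def write_text(text: str, target_encoding: str) -> bytes:
    """Encode Unicode to whatever the consumer wants, at the output boundary."""
    return text.encode(target_encoding)


# Two inputs arriving in different legacy 8-bit encodings (hypothetical data).
latin1_in = "café".encode("latin-1")
koi8_in = "мир".encode("koi8-r")

# Inside the program everything is Unicode, so the strings combine directly
# and nothing needs to remember which character set each one came from.
combined = read_text(latin1_in, "latin-1") + " " + read_text(koi8_in, "koi8-r")

# Only the output step has to pick an encoding.
out = write_text(combined, "utf-8")
```

The point of the sketch is that per-string character-set tracking disappears:
the encoding is a property of the I/O boundary, not of the string.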

I think that Parrot was doing this - convert to Unicode, then store in the
shortest fixed width representation that holds all the code points used in
that string. It's then easy to concatenate strings, without needing to pivot
between encodings each time.
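
The "smallest fixed width that fits" scheme can be sketched as follows. This
is a Python illustration of the idea only, not Parrot's actual data structure;
the FixedWidthString class and the helper names are hypothetical, and the
width list 7/8/16/32 is taken from the discussion above.

```python
from dataclasses import dataclass

WIDTHS = (7, 8, 16, 32)  # candidate fixed widths, in bits


@dataclass(frozen=True)
class FixedWidthString:
    bits: int     # width of every unit in this string
    units: tuple  # one code point per unit, so indexing stays O(1)


def smallest_width(code_points) -> int:
    """Return the narrowest width that holds the largest code point."""
    highest = max(code_points, default=0)
    for bits in WIDTHS:
        if highest < (1 << bits):
            return bits
    raise ValueError("code point does not fit in 32 bits")


def from_unicode(text: str) -> FixedWidthString:
    cps = tuple(ord(ch) for ch in text)
    return FixedWidthString(smallest_width(cps), cps)


def concat(a: FixedWidthString, b: FixedWidthString) -> FixedWidthString:
    """Both sides are already Unicode, so concatenation only has to widen
    to the larger of the two widths; no re-encoding step is involved."""
    return FixedWidthString(max(a.bits, b.bits), a.units + b.units)
```

For example, from_unicode("hello") needs 7 bits, from_unicode("café") needs 8,
and concatenating either with a string containing U+1D11E yields a 32-bit
string. Because every unit is a whole code point, there are no surrogates to
deal with at any width.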


>        And filenames defined inside your program to the charset used on
> a particular file-system.  And... implicit conversions where we require

IIRC Jarkko has again looked at that recently, and most operating systems
have no sane API to find out what is being used on a particular mounted
filing system.

Nicholas Clark


