Re: Future Perl development

From: Nicholas Clark
Date: February 7, 2007 03:52
Subject: Re: Future Perl development
Message ID: 20070207115248.GA5748@plum.flirble.org

On Wed, Feb 07, 2007 at 10:44:55AM +0100, demerphq wrote:
> On 2/7/07, Marvin Humphrey <marvin@rectangular.com> wrote:
> >
> >
> >However, all that encode/decode overhead would kill the performance
> >of these libraries, rendering them far less useful.  It would be nice
> >if Perl's internal encoding was always, officially UTF-8 -- then
> >there wouldn't be a conflict.  But I imagine that might be very hard
> >to pull off on EBCDIC systems, so maybe it's better this way -- I get
> >to choose not to support EBCDIC systems (along with systems that
> >don't use IEEE 754 floats, and systems where chars are bigger than a
> >byte).
> 
> I for one would argue that if we were going to go to a single internal
> encoding, UTF-8 would be the wrong one. UTF-16 would be much
> better. It would allow us to take advantage of the large amount of
> UTF-16 code out there, ranging from DFA regexp engines to other
> algorithms and libraries. On Win32 the OS natively does UTF-16, so much
> of the work would be done by the OS. I'd bet that this was also a
> reason why other languages choose to use UTF-16. In fact I wouldn't be
> surprised if we were the primary language using UTF-8 internally at
> all.

Jarkko's view, based on the battle scars from the dragons in the regexp
engine, was that fixed-width 32-bit was better than anything 16-bit and
variable-width. Doing the latter properly *still* requires dealing with
surrogates.
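
To illustrate the surrogate point, here is a minimal sketch in Python (not
Perl, and not anything from the Perl core). The function names are
hypothetical, and the input is assumed to be a valid code point outside the
surrogate range. It shows why UTF-16 is still variable width: anything above
U+FFFF needs two 16-bit units, so counting or indexing characters has to
skip over low surrogates.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units (as ints) for one Unicode code point."""
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single 16-bit unit is enough.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


def count_code_points(units):
    """Count code points in well-formed UTF-16 code units."""
    # Every unit starts a new code point except a low surrogate.
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


euro = utf16_code_units(0x20AC)     # U+20AC EURO SIGN
clef = utf16_code_units(0x1D11E)    # U+1D11E MUSICAL SYMBOL G CLEF
print([hex(u) for u in euro])       # ['0x20ac']
print([hex(u) for u in clef])       # ['0xd834', '0xdd1e']
print(count_code_points(euro + clef))  # 2 code points in 3 units
```

The number of units and the number of characters diverge as soon as one
supplementary-plane character appears, which is the bookkeeping a fixed
32-bit representation avoids.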

The *best* solution might well be fixed-width 7/8/16/32-bit, using the
smallest that fits.
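
A rough sketch of what "smallest that fits" could look like, again in Python
purely for illustration. The helper names and the little-endian layout are
assumptions made for this example, not a proposal for Perl's internals. The
width is picked per string from its largest code point, and once it is
fixed, the n-th character sits at a computable offset.

```python
def smallest_width_bits(text):
    """Return 7, 8, 16 or 32: the narrowest fixed width holding every code point."""
    top = max(map(ord, text), default=0)
    for bits in (7, 8, 16, 32):
        if top < (1 << bits):
            return bits
    raise ValueError("code point out of range")  # unreachable for Python strings


def pack_fixed(text):
    """Pack text into fixed-width units; returns (bits, unit size in bytes, data)."""
    bits = smallest_width_bits(text)
    size = {7: 1, 8: 1, 16: 2, 32: 4}[bits]   # 7-bit data still occupies one byte
    data = b"".join(ord(ch).to_bytes(size, "little") for ch in text)
    return bits, size, data


def code_point_at(data, size, index):
    """Constant-time lookup of the index-th character in packed data."""
    start = index * size
    return int.from_bytes(data[start:start + size], "little")


for sample in ("plain", "caf\u00e9", "5 \u20ac", "clef \U0001D11E"):
    bits, size, data = pack_fixed(sample)
    print(bits, len(data), hex(code_point_at(data, size, len(sample) - 1)))
```

The trade-off is that appending a single wide character to a narrow string
forces the whole buffer to be re-packed at the larger width.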

But I don't see this coming soon.

> The problem here is not our internal encoding, which should be opaque,
> but rather our lack of support for an explicitly byte-oriented storage
> and our heritage of treating strings as character buffers, even though
> they aren't really.

Yes.

And the fact that UTF-8 peeked through the cracks left, right and centre
in 5.6.0 really didn't help the opaqueness.

Nicholas Clark


