develooper Front page | perl.perl5.porters | Postings from February 2007

Re: Future Perl development

From: Jan Dubois
Date: February 7, 2007 15:35
Subject: Re: Future Perl development
Message ID: ajnks298ndgiro46dboi8njgn5a6llil6a@4ax.com

On Wed, 7 Feb 2007 10:44:55 +0100, demerphq <demerphq@gmail.com> wrote:

>I for one would argue that if we were going to go to a single internal
>encoding that utf8 would be the wrong one. Utf-16 would be much
>better. It would allow us to take advantage of the large amount of
>utf-16 code out there, ranging from DFA regexp engines to other
>algorithms and libraries. On Win32 the OS natively does utf-16 so much

Some people argue that only Windows XP and later do UTF-16, and that
Windows 2000 is just UCS-2 because it doesn't know about surrogate pairs.
But this is really only a shell/display issue; the file system level
doesn't care about surrogates anyway.
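
To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal
sketch in Python (not Perl, and not anything Windows itself runs). The
helper names utf16_code_units and is_surrogate are made up for this
illustration. It shows that a character inside the Basic Multilingual
Plane takes one 16-bit code unit, while a character above U+FFFF takes two
units from the surrogate range, which is the part a UCS-2-only component
would not interpret.

```python
def utf16_code_units(ch: str) -> list[int]:
    """Return the 16-bit UTF-16 code units for a single character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    """True if the code unit lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= unit <= 0xDFFF


# U+00E9 is inside the BMP: a single code unit, identical in UCS-2 and UTF-16.
bmp = utf16_code_units("\u00e9")

# U+1D11E (MUSICAL SYMBOL G CLEF) is above U+FFFF: UTF-16 needs a surrogate pair.
astral = utf16_code_units("\U0001D11E")

print([hex(u) for u in bmp])
print([hex(u) for u in astral])
print(all(is_surrogate(u) for u in astral))
```

Code that only stores or compares 16-bit units, as a file system might,
can pass both values through untouched; only code that needs to know where
one character ends and the next begins has to understand the pair.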

>of the work would be done by the OS. I'd bet that this was also a
>reason why other languages choose to use utf-16. In fact I wouldn't be
>surprised if we were the primary language using utf8 internally at
>all.

This is a couple of years old and no longer up to date, but yes, it looks
like Perl doesn't have much company...

    http://unicode.org/notes/tn12/tn12-1.html

>I mean heck, utf8 was a kludge worked out on a napkin to make it
>possible to store unicode filenames in a unix style filesystem. (utf8
>has the property that no encoding of a high codepoint contains any
>special character used by a unix filesystem) WTF would we use a kludge
>as our primary internal representation when there are better
>representations to use? Especially when you consider the performance
>impact of doing so (use unicode and watch the regex engine get much
>sloooooweeeeeerrrrrrr.)
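
The filesystem property mentioned in the quote can be checked directly.
The following is a minimal sketch in Python, assuming that the "special
characters" in question are the NUL byte and the "/" path separator; the
names UNIX_SPECIAL and special_bytes are made up for this illustration.
Every byte of a multi-byte UTF-8 sequence has its high bit set, so neither
byte can appear by accident, whereas UTF-16 can produce both.

```python
# Bytes with special meaning in a Unix path (an assumption for this sketch):
# NUL terminates the string, "/" separates path components.
UNIX_SPECIAL = {0x00, 0x2F}


def special_bytes(text: str, encoding: str) -> list[int]:
    """Return the bytes of the encoded text that collide with UNIX_SPECIAL."""
    return [b for b in text.encode(encoding) if b in UNIX_SPECIAL]


# Three non-ASCII characters: U+012F, U+2F00 and U+1D11E.
sample = "\u012f\u2f00\U0001D11E"

utf8_hits = special_bytes(sample, "utf-8")
utf16_hits = special_bytes(sample, "utf-16-be")

print("utf-8 collisions: ", [hex(b) for b in utf8_hits])
print("utf-16 collisions:", [hex(b) for b in utf16_hits])
```

This only illustrates the byte-level property; it says nothing about
which encoding is faster for regex matching or other string operations.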

This is probably the main reason some big enterprise users stick with
Perl 5.6.1.  I've seen several companies approach ActiveState, desperate
to get help in moving to 5.8 while maintaining their application
performance.  Unfortunately there is not much you can do to help them
beyond the "avoid using Unicode strings, and downgrade every time a
module returns stuff in Unicode" advice.

Cheers,
-Jan



