perl.perl6.internals.unicode | Postings from February 2001

Re: string encoding

From: Hong Zhang
Date: February 16, 2001 18:38
Subject: Re: string encoding
Message ID: 059d01c0988c$000e3b70$2d031dc0@wora

I would like to wrap up my argument.

I recommend using UTF-8 as the sole string encoding.
If we end up with multiple encodings, this argument is
moot.

The benefits of UTF-8 are that it is more compact, needs fewer
encoding conversions, and is friendlier to C APIs. UTF-16 is a
variable-length encoding too, once surrogates are considered.
UTF-32 is way too big.
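
As a rough illustration of the size and C API points, the following Python sketch compares byte lengths across the three encodings. The sample strings and the helper names `encoded_sizes` and `has_nul_bytes` are assumptions chosen for illustration. Note that the compactness advantage depends on the text: ASCII-heavy text is smallest in UTF-8, while text made up mostly of CJK characters takes three bytes per character in UTF-8 against two in UTF-16.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16, and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def has_nul_bytes(text, encoding):
    """True if the encoded form contains a zero byte, which would
    terminate the string early in NUL-terminated C string functions."""
    return b"\x00" in text.encode(encoding)


# Illustrative samples (assumptions, not from the original message).
samples = {
    "ascii": "hello, world",        # 12 ASCII characters
    "cjk": "\u65e5\u672c\u8a9e",    # 3 characters inside the BMP
    "astral": "\U00010400",         # 1 character outside the BMP
}

for name, text in samples.items():
    print(name, encoded_sizes(text))

# The single astral character needs a surrogate pair (two 16-bit units)
# in UTF-16, so UTF-16 is not fixed-width either.
print(encoded_sizes(samples["astral"])["utf-16"] // 2, "UTF-16 code units")

# ASCII text in UTF-16 contains zero bytes; in UTF-8 it does not.
print(has_nul_bytes("hello", "utf-16-le"), has_nul_bytes("hello", "utf-8"))
```

The `-le` variants are used only so that Python does not prepend a byte order mark, which would otherwise inflate the counts.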

The main disadvantage of UTF-8 is O(n) random access, which I
personally believe is not very important, since most text
processing requires a linear scan of the text anyway. Multi-byte
encodings have been widely used in Asian countries for years,
and this does not seem to have been a significant problem.
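
The O(n) cost of random access can be sketched as follows: to reach the character at a given index, a reader has to step over every earlier character, using each lead byte to learn how long its sequence is. This is a minimal sketch that assumes well-formed UTF-8 input and a non-negative index; the function names `utf8_seq_len`, `byte_offset_of`, `char_at`, and `iter_chars` are hypothetical helpers, not part of any Perl API.

```python
def utf8_seq_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1              # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2              # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3              # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4              # 11110xxx
    raise ValueError("not a UTF-8 lead byte: 0x%02x" % lead)


def byte_offset_of(data, index):
    """Byte offset of the character at position index: O(index) work."""
    offset = 0
    for _ in range(index):
        if offset >= len(data):
            raise IndexError("character index out of range")
        offset += utf8_seq_len(data[offset])
    if offset >= len(data):
        raise IndexError("character index out of range")
    return offset


def char_at(data, index):
    """Random access: must scan from the start of the buffer."""
    start = byte_offset_of(data, index)
    return data[start:start + utf8_seq_len(data[start])].decode("utf-8")


def iter_chars(data):
    """Linear scan: each character costs O(1), the whole pass costs O(n)."""
    offset = 0
    while offset < len(data):
        n = utf8_seq_len(data[offset])
        yield data[offset:offset + n].decode("utf-8")
        offset += n


data = "a\u00e9\u65e5\U00010400z".encode("utf-8")  # 1+2+3+4+1 bytes
print(char_at(data, 3) == "\U00010400")
print(len(list(iter_chars(data))))
```

A loop that calls `char_at` for every index would be quadratic, whereas `iter_chars` visits the same characters in a single pass, which is the access pattern the argument above relies on.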

If Perl intends to have superior support for Unicode, i18n, and
l10n, the benefits of UTF-16 will fade away pretty quickly.

Overall, both UTF-8 and UTF-16 are acceptable. But I believe
UTF-8 is a slightly better choice.

Hong

