Front page | perl.perl6.internals.unicode |
Postings from February 2001
Re: string encoding
From:
Hong Zhang
Date:
February 16, 2001 18:38
Subject:
Re: string encoding
Message ID:
059d01c0988c$000e3b70$2d031dc0@wora
I would like to wrap up my argument.
I recommend using UTF-8 as the sole string encoding.
If we end up with multiple encodings, there is absolutely
no point to this argument.
The benefits of UTF-8 are that it is more compact, needs fewer
encoding conversions, and is friendlier to C APIs. UTF-16 is a
variable-length encoding too, once surrogates are considered.
UTF-32 is way too big.
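As a rough illustration of the size and C API points, here is a minimal Python sketch (not Perl internals code). The helper names encoded_sizes and contains_nul are made up for this example. It counts the bytes each encoding needs for a few sample strings and checks for embedded zero bytes, which are what break NUL-terminated C strings. The little-endian codecs are used so that no byte order mark is counted.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding.

    The "-le" codecs are used so that no byte order mark is added.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def contains_nul(data):
    """True if the byte string has a zero byte, which would end a C string early."""
    return b"\x00" in data


if __name__ == "__main__":
    samples = {
        "ASCII": "hello",
        "CJK ideograph": "\u4e2d",
        "outside the BMP": "\U0001F600",  # needs a surrogate pair in UTF-16
    }
    for label, text in samples.items():
        print(label, encoded_sizes(text))
    print("UTF-8 has NUL bytes: ", contains_nul("hello".encode("utf-8")))
    print("UTF-16 has NUL bytes:", contains_nul("hello".encode("utf-16-le")))
```

The sizes depend on the text. ASCII is smallest in UTF-8, a character outside the Basic Multilingual Plane takes four bytes in all three encodings, and a CJK ideograph takes three bytes in UTF-8 against two in UTF-16.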
The main disadvantage of UTF-8 is O(n) random access, which I
personally believe is not very important, since most text
processing requires a linear scan of the text anyway. Multi-byte
encodings have been widely used in Asian countries for years, and
this does not seem to have been a significant problem.
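To make the O(n) cost concrete, here is a minimal Python sketch of indexing into UTF-8 by code point. The function names utf8_char_offset and utf8_char_at are hypothetical, and the code assumes the input is already valid UTF-8. It walks the bytes from the start and counts every byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx).

```python
def utf8_char_offset(data, index):
    """Return the byte offset of the code point at position `index`.

    Assumes `data` is valid UTF-8. The scan is O(n) because every byte
    before the target has to be inspected.
    """
    if index < 0:
        raise IndexError("negative index")
    seen = 0
    for offset, byte in enumerate(data):
        # Anything that is not a continuation byte starts a new code point.
        if (byte & 0xC0) != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


def utf8_char_at(data, index):
    """Return the code point at position `index` as a one-character str."""
    start = utf8_char_offset(data, index)
    end = start + 1
    # Extend over the continuation bytes that belong to this code point.
    while end < len(data) and (data[end] & 0xC0) == 0x80:
        end += 1
    return data[start:end].decode("utf-8")


if __name__ == "__main__":
    data = "a\u00e9\u4e2d\U0001F600z".encode("utf-8")
    for i in range(5):
        print(i, utf8_char_offset(data, i), utf8_char_at(data, i))
```

A forward scan over the whole string visits each byte once, so code that processes text from start to end does not pay this cost per character.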
If Perl intends to have superior support for Unicode, i18n, and
l10n, the benefits of UTF-16 will fade away pretty quickly.
Overall, both UTF-8 and UTF-16 are acceptable. But I believe
UTF-8 is a slightly better choice.
Hong