perl.perl6.internals | Postings from February 2001
string encoding
From: Hong Zhang
Date: February 15, 2001 14:22
Subject: string encoding
Message ID: 040f01c0979e$fb42a650$2d031dc0@wora
Hi, All,
I want to share some of my thoughts on string encoding.
Personally, I like the UTF-8 encoding. Its variable-length
characters can be handled by a special (virtual) function
like:
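class String {
public:
    // Decodes the character starting at *index and advances
    // *index past it (UV is Perl's unsigned integer type).
    virtual UV iterate(/*inout*/ int* index);
};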
So a typical string iteration loop would look like:
for (i = 0; i < size;) {
    UV ch = s->iterate(&i);
    /* do what u want */
}
instead of
for (i = 0; i < size; i++) {
    uint32 ch = s->charAt(i);
    /* be my guest */
}
The new style may look strange, but it is not very difficult
to use. It also hides the internal representation.
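As a sketch of how the proposed iterate() might work for UTF-8, here is a minimal C++ version. The names Utf8String and count_code_points, the size() method, the const qualifiers, and the 32-bit UV typedef are assumptions for illustration, not part of any Perl 6 design. The decoder assumes well-formed UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Stand-in for Perl's UV (unsigned integer value) type; assumed 32 bits here.
typedef std::uint32_t UV;

// Abstract string interface in the spirit of the proposal: callers
// iterate by position and never see the internal encoding.
class String {
public:
    virtual ~String() = default;
    virtual int size() const = 0;  // length in storage units (bytes for UTF-8)
    virtual UV iterate(/*inout*/ int* index) const = 0;
};

// Hypothetical UTF-8 implementation. Assumes well-formed input;
// continuation bytes and overlong forms are not validated.
class Utf8String : public String {
public:
    explicit Utf8String(std::string bytes) : bytes_(std::move(bytes)) {}

    int size() const override { return static_cast<int>(bytes_.size()); }

    // Decodes the code point starting at byte offset *index and
    // advances *index to the first byte of the next code point.
    UV iterate(int* index) const override {
        assert(*index >= 0 && *index < size());
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(bytes_.data()) + *index;
        UV ch;
        int len;
        if (p[0] < 0x80)      { ch = p[0];        len = 1; }  // 0xxxxxxx
        else if (p[0] < 0xE0) { ch = p[0] & 0x1F; len = 2; }  // 110xxxxx
        else if (p[0] < 0xF0) { ch = p[0] & 0x0F; len = 3; }  // 1110xxxx
        else                  { ch = p[0] & 0x07; len = 4; }  // 11110xxx
        for (int k = 1; k < len; ++k)
            ch = (ch << 6) | (p[k] & 0x3F);  // 6 payload bits per continuation byte
        *index += len;
        return ch;
    }

private:
    std::string bytes_;
};

// The iteration loop from the post: i advances by a variable amount.
int count_code_points(const String& s) {
    int n = 0;
    for (int i = 0; i < s.size();) {
        UV ch = s.iterate(&i);
        (void)ch;  // a real caller would use ch here
        ++n;
    }
    return n;
}
```

For example, count_code_points(Utf8String("h\xC3\xA9llo")) returns 5 even though the string occupies 6 bytes, because iterate() advances i by two for the é. The caller's loop never needs to know that.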
The UTF-32 suggestion largely ignores internationalization.
Many user-perceived characters are composed of more than one
Unicode code point. If you consider Unicode normalization,
canonical forms, Hangul conjoining jamo, Indic clusters,
combining characters, virama, collation, and locale, UTF-32
will not help you much, if at all.
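To make the combining-character point concrete, the sketch below (C++11; the names are hypothetical) spells two characters in two canonically equivalent ways each. Every pair is one character to a reader, yet the two spellings are UTF-32 strings of different lengths. Fixed-width storage alone therefore neither makes code point counts match what users see nor makes equivalent strings compare equal; that still takes normalization.

```cpp
#include <cstddef>

// Number of code points in a UTF-32 literal, excluding the terminating NUL.
template <std::size_t N>
constexpr std::size_t code_points(const char32_t (&)[N]) { return N - 1; }

// "é" spelled two canonically equivalent ways.
constexpr char32_t e_precomposed[] = U"\u00E9";   // U+00E9
constexpr char32_t e_decomposed[]  = U"e\u0301";  // U+0065 + U+0301 combining acute

// Hangul syllable GA: precomposed vs. conjoining jamo.
constexpr char32_t ga_precomposed[] = U"\uAC00";        // U+AC00
constexpr char32_t ga_jamo[]        = U"\u1100\u1161";  // choseong kiyeok + jungseong a

// One user-perceived character each, yet the fixed-width counts differ.
static_assert(code_points(e_precomposed) == 1, "precomposed e-acute");
static_assert(code_points(e_decomposed) == 2, "e + combining acute");
static_assert(code_points(ga_precomposed) == 1, "precomposed GA");
static_assert(code_points(ga_jamo) == 2, "conjoining jamo");
```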
Hong