develooper Front page | perl.perl6.internals | Postings from February 2001

string encoding

Hong Zhang
February 15, 2001 14:22
Hi, All,

I want to give some of my thoughts about string encoding.

Personally, I like the UTF-8 encoding. Its variable-length
characters can be handled by a special (virtual) function
like

class String {
public:
    virtual UV iterate(/*inout*/ int* index);  /* returns the character at *index, advances *index */
};

So in typical string iteration, the code will look like
    for (i = 0; i < size;) {
        UV ch = s->iterate(&i);
        /* do what you want */
    }
instead of
    for (i = 0; i < size; i++) {
        uint32 ch = s->charAt(i);
        /* be my guest */
    }

The new style looks strange at first, but it is not very difficult
to use. It also hides the internal representation.
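
To make the idea concrete, here is a minimal sketch of how such an
iterate() could decode UTF-8. The Utf8String class, the code_points()
helper, and the UV typedef are assumptions made up for illustration,
and the decoder assumes well-formed input with no error handling.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

typedef std::uint32_t UV;  // stand-in for Perl's UV type (assumption)

// Hypothetical UTF-8 backed string; assumes well-formed UTF-8 input.
class Utf8String {
public:
    explicit Utf8String(const std::string& bytes) : bytes_(bytes) {}
    virtual ~Utf8String() {}

    // Length in bytes, not in characters.
    int size() const { return static_cast<int>(bytes_.size()); }

    // Decode the code point that starts at byte offset *index, then
    // advance *index to the first byte of the next code point.
    virtual UV iterate(/*inout*/ int* index) const {
        assert(*index >= 0 && *index < size());
        unsigned char lead = static_cast<unsigned char>(bytes_[*index]);
        int extra;  // number of continuation bytes that follow the lead byte
        UV ch;
        if (lead < 0x80)      { extra = 0; ch = lead; }
        else if (lead < 0xE0) { extra = 1; ch = lead & 0x1F; }
        else if (lead < 0xF0) { extra = 2; ch = lead & 0x0F; }
        else                  { extra = 3; ch = lead & 0x07; }
        ++*index;
        for (int k = 0; k < extra && *index < size(); ++k, ++*index) {
            ch = (ch << 6) | (static_cast<unsigned char>(bytes_[*index]) & 0x3F);
        }
        return ch;
    }

private:
    std::string bytes_;
};

// Collect every code point using the loop shape shown above.
std::vector<UV> code_points(const Utf8String& s) {
    std::vector<UV> out;
    for (int i = 0; i < s.size();) {
        out.push_back(s.iterate(&i));
    }
    return out;
}
```

The caller never sees how many bytes a character occupies; only
iterate() knows, so the same loop would work unchanged over a
different internal representation.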

The UTF-32 suggestion largely ignores internationalization.
Many user-perceived characters are composed of more than one
Unicode code point. Once you consider Unicode normalization,
canonical forms, Hangul conjoining jamo, Indic clusters, combining
characters, virama, collation, and locale, UTF-32 will not help you
much, if at all.
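
A small sketch of the point about combining characters: even with
fixed-width UTF-32, one user character can span several array
elements. The is_combining() and count_user_chars() helpers are
hypothetical, and the combining test is deliberately crude (it only
knows the U+0300..U+036F block); real segmentation needs the full
Unicode rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplification (assumption): treat only U+0300..U+036F, the
// Combining Diacritical Marks block, as combining characters.
bool is_combining(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Count user-visible characters by attaching each combining mark to
// the base character that precedes it.
std::size_t count_user_chars(const std::u32string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i == 0 || !is_combining(s[i])) ++n;
    }
    return n;
}

// "e" followed by U+0301 (combining acute accent) is two UTF-32
// units but a single user character.
void example() {
    std::u32string s = U"e\u0301";
    assert(s.size() == 2);
    assert(count_user_chars(s) == 1);
}
```

So indexing by code point, which is all UTF-32 makes cheap, still
does not index by what the user thinks of as a character.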

