develooper Front page | perl.perl6.internals.unicode | Postings from February 2001

Re: string encoding

From: Hong Zhang
Date: February 16, 2001 17:02
Subject: Re: string encoding
Message ID: 058301c0987e$907e4730$2d031dc0@wora

> > I think you already mixed the codepoint vs. character. What you will
> > get is 10th codepoint, not 10th character.
>
> I think you're confused. Codepoints *are* characters. Combining
> characters are taken care of as per the RFC.

If you define it that way, I can agree with it. But since you still have to
handle combining characters somewhere else, you will not save much overall.
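To make the codepoint/character distinction concrete, here is a minimal
sketch in Python (not Perl, purely for illustration). The helper
user_characters is a hypothetical name, and its grouping rule (attach every
code point with a nonzero combining class to the preceding base) is a
simplifying assumption, not a full grapheme-cluster algorithm.

```python
import unicodedata

def user_characters(text):
    """Group each base code point with the combining marks that follow it.

    Simplified assumption: a code point with a nonzero canonical combining
    class belongs to the preceding cluster. Real grapheme segmentation has
    more rules than this.
    """
    clusters = []
    for cp in text:
        if clusters and unicodedata.combining(cp) != 0:
            clusters[-1] += cp
        else:
            clusters.append(cp)
    return clusters

# "resume" with two combining acute accents (U+0301) instead of precomposed é
text = "re\u0301sume\u0301"
code_points = list(text)
characters = user_characters(text)

print(len(code_points), len(characters))
print(repr(code_points[2]), repr(characters[2]))
```

Indexing by code point lands on the bare accent at position 2, while
indexing by combined character lands on "s". Fixed-width code points give
cheap indexing, but the combining step still has to happen somewhere.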

> I'm talking about UTF16. You're talking about UTF32.
> Try talking about what I'm talking about.

With UTF-16, you still have to handle surrogates, right? So it is still a
variable-length encoding. At this time, surrogates are undefined. If they
become widely used, the nightmare will come back.
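A small Python sketch of why UTF-16 is variable length: a code point
outside the Basic Multilingual Plane is stored as two 16-bit units, both in
the surrogate range 0xD800-0xDFFF. The helper names utf16_units and
is_surrogate are hypothetical, chosen only for this illustration.

```python
def utf16_units(text):
    """Return the 16-bit code units of text in UTF-16 (little-endian, no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def is_surrogate(unit):
    """True if a 16-bit unit falls in the surrogate range 0xD800-0xDFFF."""
    return 0xD800 <= unit <= 0xDFFF

bmp_units = utf16_units("A")                    # one unit for a BMP code point
supplementary_units = utf16_units("\U00010000")  # two units for U+10000

print([hex(u) for u in bmp_units])
print([hex(u) for u in supplementary_units])
```

So the n-th 16-bit unit is the n-th code point only as long as no
surrogate pairs appear in the string.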

> > I said it is not common case
>
> And I am saying that it is.
>
> I have been through this many, many times. I am not going through it
> again.

What I can see is that you argue random access is important and nice
to have. But I don't see that it is the common case. Can you name
some practical text algorithms or usages in Perl that need it? I think
Perl is not a language designed for character-by-character text
processing. As long as regexps are fast enough, most people will be
happy.

Hong



