Re: char16 datatype
From: karl williamson
Date: November 13, 2008 21:12
Subject: Re: char16 datatype
Message ID: 491D08C3.303@khwilliamson.com
Tom Christiansen wrote:
> [snip]
>
> There's a bunch going on with standardization, widechars, utf-8, etc, right
> now. If only UTF-8 had been around earlier ("What, 1992 isn't early
> enough?"), a lot of trouble would have been averted. That Perl settled on
> UTF-8 internally early on was applauded by the Association's current
> standards rep as clearly the right way to go.
>
> It's really sad that the C std committee looks likely to accept
> Microsoft's char16 datatype for wide characters. This locks you into
> UCS-2/UTF-16, which means surrogates to get off the primary plane, and
> a very long/bad recovery if you poke your head in the wrong place in
> the stream. This is going to make problems for people. Java has the
> problem. EXIF has the problem.
>
I have a friend on the ISO C committee. I sent him the above snippet
and asked him to comment. This may not have anything really to do with
Perl 5, but since it got brought up, fyi, here's his response:
Karl,
C has always bent over backwards to be character set agnostic, so that
any reasonable character set would work with it. That is not going to
change. Most people will still use char, which these days will usually
get UTF-8, depending on the locale. When that is not enough, most
people will still use wchar_t, which these days will usually get UTF-32.
For data interchange, some people also needed data types that were
guaranteed to be encoded as UTF-16 or UTF-32. Contrary to the stated
purpose of TR-19769, the community that needed this discovered after the
report was published that they do not need to do large amounts of
internal processing with these types; they just need to be able to read
and write these known encodings. Therefore, although TR-19769 hints at
a large API that will need to be standardized for the new types, this
will not actually happen. Only the bare essentials presented in the TR
will be provided.
It is not anticipated that these new types will be used in most
code, only in certain special cases where no other current solution is
adequate. Most people can ignore this feature, but it is important in a
few data interchange applications.
[signature omitted because I don't know if s/he cared if s/he was
identified or not]
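
For anyone unfamiliar with the surrogate mechanism mentioned in the quoted text, here is a minimal sketch in C of how UTF-16 represents code points outside the primary plane. The function names utf16_encode and utf16_decode are made up for illustration, and the sketch uses plain uint16_t from <stdint.h> rather than the char16 type from the TR, so treat it as an assumption-laden example and not as anything the committee specified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is
 * not encodable (a surrogate value, or above U+10FFFF). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {              /* primary plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                   /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: decode one code point starting at in[0], with
 * n units available. Returns the number of units consumed (1 or 2),
 * or 0 if in[0] is an unpaired surrogate or n is 0. */
static size_t utf16_decode(const uint16_t *in, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;
    uint16_t hi = in[0];
    if (hi < 0xD800 || hi > 0xDFFF) {            /* not a surrogate */
        *cp = hi;
        return 1;
    }
    if (hi <= 0xDBFF && n >= 2 && in[1] >= 0xDC00 && in[1] <= 0xDFFF) {
        *cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10)
                         | (uint32_t)(in[1] - 0xDC00));
        return 2;
    }
    return 0;                                    /* unpaired surrogate */
}
```

With these definitions, U+1D11E encodes as the two units 0xD834 0xDD1E. A reader that starts at the second unit sees a lone low surrogate, and utf16_decode returns 0 there, leaving it to the caller to decide how to resynchronize.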