perl.perl5.porters | Postings from November 2008

Re: char16 datatype

From: karl williamson
Date: November 13, 2008 21:12
Subject: Re: char16 datatype
Message ID: 491D08C3.303@khwilliamson.com

Tom Christiansen wrote:
> [snip]
> 
> There's a bunch going on with standardization, widechars, utf-8, etc, right
> now. If only UTF-8 had been around earlier ("What, 1992 isn't early
> enough?"), a lot of trouble would have been averted.  That Perl settled on
> UTF-8 internally early on was applauded by the Association's current
> standards rep as clearly the right way to go.
> 
> It's really sad that the C std committee looks to be going to accept
> Microsoft's char16 datatype for wide characters.  This locks you
> into UCS-2/UTF-16, which means surrogates to get off the primary plane, and
> a very long/bad recovery if you poke your head in the wrong place in
> the stream.  This is going to make problems for people.  Java has the
> problem.  EXIF has the problem.
> 
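
To make the surrogate mechanism mentioned in the quoted text concrete, here is a minimal C sketch of how a code point outside the primary plane is split into two 16-bit code units. The function names (utf16_encode, utf16_is_high_surrogate, utf16_is_low_surrogate) are hypothetical and come from neither the quoted message nor any standard library. Plain uint16_t stands in for a 16-bit character type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written.
 * Returns 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    /* Reject values beyond U+10FFFF and the surrogate range itself. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    /* Primary plane (BMP): a single code unit is enough. */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: split the 20-bit offset into two halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low (trail) surrogate */
    return 2;
}

/* Classify a single code unit. A reader that lands at an arbitrary
 * offset in a stream has to perform this check before decoding. */
int utf16_is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
int utf16_is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }
```

For example, U+1F600 is encoded as the pair 0xD83D 0xDE00, while U+0041 stays a single unit 0x0041.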

I have a friend on the ISO C committee.  I sent him the above snippet 
and asked him to comment.  This may not have anything really to do with 
Perl 5, but since it got brought up, fyi, here's his response:

Karl,

  C has always bent over backwards to be character set agnostic, so that 
any reasonable character set would work with it.  That is not going to 
change.  Most people will still use char, which these days will usually 
get UTF-8, depending on the locale.  When that is not enough, most 
people will still use wchar_t, which these days will usually get UTF-32.

      For data interchange, some people also needed data types that were 
guaranteed to be encoded as UTF-16 or UTF-32.  Contrary to the stated 
purpose of TR-19769, the community that needed this discovered after the 
report was published that they do not need to do large amounts of 
internal processing with these types; they just need to be able to read 
and write these known encodings.  Therefore, although TR-19769 hints at 
a large API that will need to be standardized for the new types, this 
will not actually happen.  Only the bare essentials presented in the TR 
will be provided.
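
As a rough illustration of the "bare essentials", the sketch below uses the char16_t and char32_t types and the u"..." and U"..." literal prefixes as they were later standardized in C11's <uchar.h>. It assumes a C11 compiler that defines __STDC_UTF_16__ and __STDC_UTF_32__, so that the literals really are UTF-16 and UTF-32. The helper names c16_units and c32_units are hypothetical and not part of the TR or the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: count code units in a NUL-terminated
 * char16_t string (the UTF-16 analogue of strlen). */
size_t c16_units(const char16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: count code units in a NUL-terminated
 * char32_t string. In UTF-32 each unit is one code point. */
size_t c32_units(const char32_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

With these, c16_units(u"\U0001F600") is 2 (a surrogate pair) and c32_units(U"\U0001F600") is 1, which is the kind of known, fixed encoding the data interchange use case relies on.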

      It is not anticipated that these new types will be used in most 
code, only in certain special cases where no other current solution is 
adequate.  Most people can ignore this feature, but it is important in a 
few data interchange applications.

[signature omitted because I don't know if s/he cared if s/he was 
identified or not]
