
Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

From: Juerd Waalboer
Date: March 30, 2007 17:17
Subject: Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)
Message ID: 20070331001649.GJ31277@c4.convolution.nl

Marc Lehmann wrote 2007-03-31  2:12 (+0200):
> Yes, and the exact same is true for unicode (both have a 1-1 mapping
> between 0..255 and octets), trivially, of course, as unicode explicitly is
> a superset of latin1.

Unicode is a character set, not a character encoding.

For 8-bit character sets, the character set and its encoding are
effectively the same thing. Once you get past the 8-bit boundary, the
difference begins to matter.

A Unicode string is a sequence of codepoints, not octets, and codepoints
don't map 1:1 to octets either. To express a Unicode string in octets,
you need to encode it. For this, there are several possibilities,
including UTF-8, UTF-16, ...
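
To make the codepoint/octet distinction concrete, here is a minimal sketch.
It is written in Python rather than Perl purely for illustration, and the
sample string is an arbitrary choice, not something from this thread. It
shows one sequence of codepoints turning into different octet sequences
depending on the encoding picked.

```python
# Illustration only (Python, not Perl): a text string is a sequence of
# codepoints; octets only appear once an encoding is chosen.
text = "h\u00e9\u20ac"  # 'h', U+00E9 (e with acute), U+20AC (euro sign)

# The codepoints are plain integers, independent of any encoding.
codepoints = [ord(ch) for ch in text]

# The same three codepoints, expressed as octets in two encodings.
utf8_octets = text.encode("utf-8")
utf16_octets = text.encode("utf-16-be")  # big-endian, no byte order mark

print(codepoints)
print(list(utf8_octets))
print(list(utf16_octets))
```

Three codepoints become six octets either way here, but the octets
themselves differ, so the octets alone do not identify the string unless
you also know the encoding.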

Unicode is a superset of the latin1 character set, not the latin1
character encoding. We'd need bigger bytes for the latter :)
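
The superset point can be sketched the same way (again in Python, as an
illustration only; the variable names are made up for this example).
Codepoints 0..255 carry the same numbers as latin1 octets, but that
says nothing about how Unicode text is encoded, and a codepoint above 255
has no single-octet latin1 form at all.

```python
# Illustration only (Python, not Perl).
ch = "\u00e9"  # U+00E9, codepoint 233

latin1 = ch.encode("latin-1")  # one octet, same number as the codepoint
utf8 = ch.encode("utf-8")      # two octets for the same codepoint

# Every codepoint 0..255 encodes in latin1 to the octet with the same value.
all_match = all(chr(cp).encode("latin-1") == bytes([cp]) for cp in range(256))

# A codepoint above 255 does not fit in a single 8-bit octet.
try:
    "\u20ac".encode("latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False

print(list(latin1), list(utf8), all_match, euro_fits)
```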
-- 
kind regards,

  juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy <sales@convolution.nl>

I don't trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.
