perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

From: Tels
Date: March 31, 2007 03:40
Subject: Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)
Message ID: 200703311226.53503@bloodgate.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

On Saturday 31 March 2007 00:29:42 Marc Lehmann wrote:
> On Sat, Mar 31, 2007 at 02:16:49AM +0200, Juerd Waalboer <juerd@convolution.nl> wrote:
> > Marc Lehmann wrote on 2007-03-31 2:12 (+0200):
> > > Yes, and the exact same is true for unicode (both have a 1-1 mapping
> > > between 0..255 and octets), trivially, of course, as unicode
> > > explicitly is a superset of latin1.
> >
> > Unicode is a character set, not a character encoding.
>
> As is latin1.
>
> > A unicode string is a sequence of codepoints, not octets.
>
> Nope. You can encode unicode codepoints into UTF-8 and still end up with
> a unicode string. Encoding doesn't change the fact that it is unicode
> that you are storing.
>
> Since it seems hard to grasp, here is an example:
>
>    use Encode;
>
>    my $s = "Hello, World!";
>    $s = Encode::encode_utf8($s);
>
> $s contains the famous greeting before and after the encoding. It is
> still an ASCII string, an ISO-8859-15 string, a unicode string, and a
> text string. Whether it is encoded or not does not change the fact
> that the string contains the message "Hello, World!".
>
> If you drop ASCII, the same is true for "Hallöchen!", which looks
> different in UTF-8 than in an unencoded string, but it is still the
> same message, and it is still using unicode to represent the characters.
>
> The fact that you encode something does not change the something that you
> encode. Making an arbitrary difference only confuses the issue.

Especially since Perl itself doesn't have any way to distinguish "a" 
(UNKNOWN ENCODING) from "a" (ASCII) from "a" (ISO-8859-1) from "a" 
(UTF-8), except for one bit (the utf8 flag) :)
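A minimal Perl sketch of both points, assuming only the core Encode module and Perl's built-in utf8:: functions (the variable names are illustrative). Encoding pure ASCII as UTF-8 leaves the octets unchanged, "Hallöchen!" gains one extra octet for the "ö", and two strings that compare equal can differ only in the internal utf8 flag. Here utf8::upgrade stands in for anything that turns the flag on, such as decode_utf8.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Pure ASCII: the UTF-8 encoding yields exactly the same octets.
my $ascii   = "Hello, World!";
my $encoded = encode_utf8($ascii);
print "ASCII unchanged: ", ($ascii eq $encoded ? "yes" : "no"), "\n";

# Non-ASCII: written with an escape so the source file encoding does not matter.
my $text  = "Hall\x{f6}chen!";
my $bytes = encode_utf8($text);    # the "o umlaut" becomes two octets in UTF-8
printf "characters: %d, UTF-8 octets: %d\n", length($text), length($bytes);

# Same character, different internal representation: only the flag differs.
my $plain    = "a";
my $upgraded = "a";
utf8::upgrade($upgraded);          # store the string internally with the utf8 flag on
print "equal: ",           ($plain eq $upgraded       ? "yes" : "no"), "\n";
print "flag (plain): ",    (utf8::is_utf8($plain)    ? 1 : 0), "\n";
print "flag (upgraded): ", (utf8::is_utf8($upgraded) ? 1 : 0), "\n";
```

Run as is, it reports that the ASCII string is unchanged, 10 characters versus 11 octets for the second string, and that the two "a" strings are equal while their flags are 0 and 1.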

All the best,

Tels

- -- 
 Signed on Sat Mar 31 12:24:31 2007 with key 0x93B84C15.
 Get one of my photo posters: http://bloodgate.com/posters
 PGP key on http://bloodgate.com/tels.asc or per email.

 "Most people, I think, don't even know what a rootkit is, so why should
 they care about it?"

  -- Thomas Hesse, President of Sony BMG's global digital business
     division, 2005.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iQEUAwUBRg5TjXcLPEOTuEwVAQIrGAf417/05df4c3hIzTnFoidS3fAKWPHm9Ots
5BNa8n3PJci4cGQ2Sz7LzRf4BjD6+seW8Zq6fKNMIlCpmwCJYh/M+Ol8BBGefjhU
tJxebJs1O2K+ZEd9cJTP/PP2bnqg9Z1CwiBNn8xT/cT8tbF6rR9kujaHooSkHnPV
snDog7uLrk117tof8ORcybml0bDfhWzh4UfYOyue37RyrqAWnIXNOu24uYUjMiDT
US3vym0LX+LUO4aBS9Ur/tX6FSBX/5mXDn0fPR016ESbzWA6TMMurSIjWYLFTw9R
rRK0KSAb/z93Z6ZhHvyaKOz8Tt9ma44adu6WgTXrK5dcrpih8xbX
=Q94f
-----END PGP SIGNATURE-----



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org