perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

From: Tels
Date: March 30, 2007 14:17
Subject: Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)
Message ID: 200703302317.23277@bloodgate.com

Hi,

On Friday 30 March 2007 21:00:37 Marvin Humphrey wrote:
> On Mar 30, 2007, at 12:53 PM, Juerd Waalboer wrote:
> > Perl does not have strong typing.
>
> If it is so deadly to collide byte-oriented data with character data,
> it should not be so easy to do so accidentally.

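It can happen every time you concatenate two strings: if one of them has the 
UTF-8 flag set, the other is silently upgraded, its bytes being interpreted 
as ISO-8859-1. Maybe we could add a new warning?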

	use warnings 'upgrade';		# proposed category, not part of Perl

	my $str = 'a';
	$str .= "\x{100}";		# would warn: $str is upgraded to UTF-8
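The silent upgrade itself can already be observed with the core utf8:: functions. The following minimal sketch (variable names are illustrative) checks the internal UTF8 flag before and after appending a character above U+00FF. Note the braces in "\x{100}": without them, "\x100" is parsed as "\x10" followed by the digit "0", which would not trigger an upgrade at all.

```perl
use strict;
use warnings;

# A plain ASCII literal: the internal UTF8 flag is off.
my $str = 'a';
my $before = utf8::is_utf8($str) ? 1 : 0;

# A character above U+00FF cannot be stored in a single byte,
# so Perl silently upgrades $str to its internal UTF-8 form.
$str .= "\x{100}";
my $after = utf8::is_utf8($str) ? 1 : 0;

# Prints "before: 0, after: 1, length: 2" (length counts characters).
printf "before: %d, after: %d, length: %d\n", $before, $after, length $str;
```

No warning is emitted for this today; the proposed 'upgrade' category would fire at the `.=` line.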

In an application that I am currently bringing up to speed with regard to 
Unicode, I opted for a "string" struct that essentially contains:

	* the length in bytes
	* the length in characters (not always set, since it can be unknown)
	* the storage buffer (containing the data, plus some optional padding)
	* the encoding

Every operation on two strings is thus clearly defined, because you can 
compare their encodings before doing anything else (for instance, upgrading 
one or both strings before comparing them).
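A minimal Perl sketch of such a record, to illustrate how keeping the encoding next to the buffer makes concatenation well defined. The package name TaggedString, its methods, comparing encoding names as plain strings, and converting to UTF-8 as the common target are all assumptions for illustration, not the structure of the application mentioned above. It uses only the core Encode module's encode and decode functions.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical "string" record holding the fields listed above.
# Encoding names are compared as plain strings (sketch assumption).
package TaggedString;

sub new {
    my ($class, %args) = @_;
    return bless {
        buffer   => $args{buffer},           # raw octets
        bytes    => length $args{buffer},    # length in bytes
        chars    => $args{chars},            # length in characters, may be undef
        encoding => $args{encoding},         # e.g. 'UTF-8' or 'ISO-8859-1'
    }, $class;
}

# Decode the buffer into Perl characters, filling in the character
# length if it was unknown.
sub text {
    my ($self) = @_;
    my $text = Encode::decode($self->{encoding}, $self->{buffer});
    $self->{chars} //= length $text;
    return $text;
}

# Concatenate two strings, checking their encodings first.
sub concat {
    my ($self, $other) = @_;
    if ($self->{encoding} eq $other->{encoding}) {
        # Same encoding: the byte buffers can simply be joined.
        my $chars = (defined $self->{chars} && defined $other->{chars})
            ? $self->{chars} + $other->{chars}
            : undef;
        return TaggedString->new(
            buffer   => $self->{buffer} . $other->{buffer},
            chars    => $chars,
            encoding => $self->{encoding},
        );
    }
    # Different encodings: convert both to UTF-8 before joining.
    my $text = $self->text() . $other->text();
    return TaggedString->new(
        buffer   => Encode::encode('UTF-8', $text),
        chars    => length $text,
        encoding => 'UTF-8',
    );
}

package main;

my $latin  = TaggedString->new(buffer => "caf\xE9", encoding => 'ISO-8859-1');
my $smiley = TaggedString->new(
    buffer   => Encode::encode('UTF-8', "\x{263A}"),
    encoding => 'UTF-8',
);
my $joined = $latin->concat($smiley);

# Prints "UTF-8: 8 bytes, 5 chars"
printf "%s: %d bytes, %d chars\n",
    $joined->{encoding}, $joined->{bytes}, $joined->{chars};
```

Converting to UTF-8 when the encodings differ avoids losing characters, which could happen if the UTF-8 side were squeezed down into ISO-8859-1 instead.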

In Perl, you have only one bit to tell you the encoding (the UTF8 flag), and 
that does not seem to be enough: a string without that bit set can be ASCII, 
ISO-8859-1, something in the current locale's encoding (maybe?), or UTF-8 
data that simply hasn't been flagged as UTF-8 yet. In short, it becomes a mess.
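The ambiguity can be demonstrated with the core utf8:: functions. In this small sketch (variable names are illustrative), two unflagged octets that happen to be the UTF-8 encoding of "é" are counted as two characters until they are explicitly decoded, and a Latin-1 string compares equal to its upgraded copy, so the flag says how the string is stored, not what the bytes mean.

```perl
use strict;
use warnings;

# Two octets that are the UTF-8 encoding of U+00E9, but with the
# UTF8 flag off Perl treats them as two separate characters.
my $raw = "\xC3\xA9";
my $raw_len = length $raw;

# Only an explicit decode tells Perl they form one character.
my $decoded = $raw;
utf8::decode($decoded);
my $decoded_len = length $decoded;

# The flag records the internal storage: a one-byte Latin-1 string
# and its upgraded copy still compare equal.
my $latin    = "\xE9";
my $upgraded = $latin;
utf8::upgrade($upgraded);
my $flags = sprintf '%d/%d',
    (utf8::is_utf8($latin) ? 1 : 0), (utf8::is_utf8($upgraded) ? 1 : 0);
my $equal = ($latin eq $upgraded) ? 1 : 0;

# Prints "raw: 2, decoded: 1, flags: 0/1, equal: 1"
printf "raw: %d, decoded: %d, flags: %s, equal: %d\n",
    $raw_len, $decoded_len, $flags, $equal;
```

Nothing in $raw itself says whether it is UTF-8 awaiting a decode or the two ISO-8859-1 characters "\xC3\xA9"; that knowledge lives only in the programmer's head.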

All the best,

Tels

-- 
 Signed on Fri Mar 30 23:11:40 2007 with key 0x93B84C15.
 View my photo gallery: http://bloodgate.com/photos
 PGP key on http://bloodgate.com/tels.asc or by email.

 "Call me Justin, Justin Case."




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org