On Mon, Feb 19, 2001 at 06:07:14PM -0500, Andrew Pimlott wrote:
> On Mon, Feb 19, 2001 at 04:47:53PM -0600, Jarkko Hietaniemi wrote:
> >
> > As far "what is broken", I do understand the concern of "exposing too
> > much of the internal representation" (which at the moment happens to
> > be UTF-8) to the user, having bytes and character is confusing at
> > best. However, I'm not fully convinced that completely hiding it is
> > wise, either. If from Perl level one cannot reach back to the bytes
> > comprising the UTF-8 representation of the characters, I feel we are
> > trying to pad the cell too softly.
>
> My kingdom for one example.

If you step out of the box, it's easy to come up with examples.

Whenever you need to interface with something that has no understanding
of Unicode, for which everything is data, you want to be able to look
at your strings bytewise. When talking to a serial device, for instance,
or a hard disk, whose capacity will be measured in bytes, not
variable-width characters. Device drivers might not be commonly written
in Perl, but that doesn't mean it should be impossible.

But you don't have to go that low level. uuencode & base64 work with
8-bit bytes. Taking your Unicode string, looking at it as bytes,
uuencoding it, sending it, receiving it, uudecoding it, and looking at
it again as Unicode will work - as long as you can get to the byte
representation.

A lot of existing compression and encryption software just looks at the
data to be compressed or encrypted as a bit or byte stream. There is no
reason to create Unicode-aware versions of those tools before they can
be used on Unicode data. But to create Perl programs that compress or
encrypt data that can be decompressed or decrypted with the existing
tools, your Perl program needs to be able to look at the data as a
sequence of bytes.

When in Rome....


Abigail
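
For illustration, here is a minimal sketch of the uuencode/base64 round
trip described above. It is not taken from the thread: it assumes a Perl
recent enough to ship the core Encode and MIME::Base64 modules, and the
sample string and variable names are made up for the example. It only
shows one way of reaching the bytes, not how the interface under
discussion should look.

```perl
use strict;
use warnings;
use Encode qw(encode decode);                     # assumed available (core in later Perls)
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical sample: six ASCII characters plus U+263A.
my $text = "smile \x{263A}";

# Reach the bytes of the UTF-8 representation of the characters.
my $bytes = encode('UTF-8', $text);

# Hand the bytes to byte-oriented encoders that know nothing of Unicode.
my $b64 = encode_base64($bytes, '');   # '' suppresses the trailing newline
my $uu  = pack('u', $bytes);           # uuencode via pack's "u" template

# The receiving side undoes the transport encoding...
my $bytes_from_b64 = decode_base64($b64);
my $bytes_from_uu  = unpack('u', $uu);

# ...and looks at the bytes as characters again.
my $round_b64 = decode('UTF-8', $bytes_from_b64);
my $round_uu  = decode('UTF-8', $bytes_from_uu);

# The character count and the byte count differ for non-ASCII text.
printf "chars=%d bytes=%d\n", length($text), length($bytes);
print "round trip ok\n" if $round_b64 eq $text && $round_uu eq $text;
```

The point of the sketch is the middle step: neither encoder needs to be
Unicode-aware, provided the program can get from characters to bytes and
back.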