On 2/7/07, Dave Mitchell <davem@iabyn.com> wrote:
> On Wed, Feb 07, 2007 at 10:44:55AM +0100, demerphq wrote:
> > I mean heck, utf8 was a kludge worked out on a napkin to make it
> > possible to store unicode filenames in a unix style filesystem. (utf8
> > has the property that no encoding of a high codepoint contains any
> > special character used by a unix filesystem)
>
> That's a bit of a misrepresentation!
>
> UTF8 has very little to do with storing UNIX filenames, and everything
> to do with working under traditional C string handling, where a zero byte
> is the string terminator, and that degrades gracefully in an environment
> that is not Unicode aware and where the majority of the characters are
> ASCII or Latin-1.

Well, I was referring to this:

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

And it was a placemat, not a napkin. My apologies. :-)

yves

<quote>
The challenges of the programming languages and the UCS standard are
being dealt with by other activities in the industry. However, we are
still faced with the handling of UCS by historical operating systems and
utilities. Prominent among the operating system UCS handling concerns is
the representation of the data within the file system. An underlying
assumption is that there is an absolute requirement to maintain the
existing operating system software investment while at the same time
taking advantage of the use the large number of characters provided by
the UCS.

UCS provides the capability to encode multi-lingual text within a single
coded character set. However, UCS and its UTF variant do not protect
null bytes and/or the ASCII slash ("/") making these character encodings
incompatible with existing Unix implementations. The following proposal
provides a Unix compatible transformation format of UCS such that Unix
systems can support multi-lingual text in a single encoding. This
transformation format encoding is intended to be used as a file code.
This transformation format encoding of UCS is intended as an
intermediate step towards full UCS support. However, since nearly all
Unix implementations face the same obstacles in supporting UCS, this
proposal is intended to provide a common and compatible encoding during
this transition stage.
</quote>

--
perl -Mre=debug -e "/just|another|perl|hacker/"
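
For anyone who wants to see the "protect null bytes and the ASCII slash" point concretely, here is a minimal Python sketch. It is only an illustration, not anything taken from the proposal: the helper names unsafe_bytes and utf8_protects_ascii are made up for this example, and UTF-16-BE is used as a stand-in for the two-byte UCS form the quoted text complains about. It checks modern UTF-8 as implemented by Python's codecs, which is an assumption about what the reader cares about rather than the exact format on the placemat.

```python
def unsafe_bytes(text, encoding):
    """Return which of NUL (0x00) and '/' (0x2F) occur in the encoded bytes."""
    return {b for b in text.encode(encoding) if b in (0x00, 0x2F)}


def utf8_protects_ascii():
    """Check that every non-ASCII code point encodes to bytes >= 0x80 only.

    If that holds, no multi-byte UTF-8 sequence can ever contain a NUL,
    a slash, or any other ASCII byte.
    """
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded on their own
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True


if __name__ == "__main__":
    # U+4E2F has 0x2F as its low byte, U+0100 has 0x00 as its low byte.
    print(unsafe_bytes("\u4e2f", "utf-16-be"))
    print(unsafe_bytes("\u0100", "utf-16-be"))
    print(unsafe_bytes("\u4e2f\u0100", "utf-8"))
    print(utf8_protects_ascii())
```

The two-byte encoding leaks a slash byte for U+4E2F and a zero byte for U+0100, which is exactly what would confuse a path parser or a C string routine, while the UTF-8 forms of the same characters contain neither.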