perl.perl5.porters | Postings from February 2007

Re: Future Perl development

From: demerphq
Date: February 7, 2007 16:25
Subject: Re: Future Perl development
Message ID: 9b18b3110702071624j3bac5f4ib99c826707eba3e@mail.gmail.com
On 2/7/07, Dave Mitchell <davem@iabyn.com> wrote:
> On Wed, Feb 07, 2007 at 10:44:55AM +0100, demerphq wrote:
> > I mean heck, utf8 was a kludge worked out on a napkin to make it
> > possible to store unicode filenames in a unix style filesystem. (utf8
> > has the property that no encoding of a high codepoint contains any
> > special character used by a unix filesystem)
>
> That's a bit of a misrepresentation!
>
> UTF8 has very little to do with storing UNIX filenames, and everything
> to do with working under traditional C string handling, where a zero byte
> is the string terminator, and that degrades gracefully in an environment
> that is not Unicode aware and where the majority of the characters are
> ASCII or Latin-1.

Well, I was referring to this:

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

And it was a placemat, not a napkin. My apologies. :-)

yves

<quote>
The challenges of the programming languages and the UCS standard are
being dealt with by other activities in the industry.  However, we are
still faced with the handling of UCS by historical operating systems
and utilities.  Prominent among the operating system UCS handling
concerns is the representation of the data within the file system.  An
underlying assumption is that there is an absolute requirement to
maintain the existing operating system software investment while at
the same time taking advantage of the use [of] the large number of
characters provided by the UCS.

UCS provides the capability to encode multi-lingual text within a
single coded character set.  However, UCS and its UTF variant do not
protect null bytes and/or the ASCII slash ("/") making these character
encodings incompatible with existing Unix implementations.  The
following proposal provides a Unix compatible transformation format of
UCS such that Unix systems can support multi-lingual text in a single
encoding.  This transformation format encoding is intended to be used
as a file code.  This transformation format encoding of UCS is
intended as an intermediate step towards full UCS support.  However,
since nearly all Unix implementations face the same obstacles in
supporting UCS, this proposal is intended to provide a common and
compatible encoding during this transition stage.
</quote>
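Not part of the original thread: here is a minimal sketch in Python that checks the property both Dave's description and the quoted proposal rely on. It uses only Python's built-in UTF-8 codec (`str.encode("utf-8")`). In UTF-8, every code point above U+007F is encoded solely with bytes in the range 0x80-0xFF. That means a non-ASCII character can never produce a NUL (0x00) or an ASCII slash (0x2F), so byte-oriented C-style code keeps working. The function names and the sample path are hypothetical and are chosen only for illustration.

```python
# Sketch (illustrative only): verify that the UTF-8 encoding of every
# non-ASCII code point consists solely of bytes >= 0x80, so it cannot
# contain NUL (0x00) or ASCII '/' (0x2F). Uses Python's built-in codec.

NUL = 0x00


def non_ascii_bytes_are_high(cp: int) -> bool:
    """Return True if every byte of cp's UTF-8 encoding is >= 0x80."""
    encoded = chr(cp).encode("utf-8")
    return all(b >= 0x80 for b in encoded)


def check_all_code_points() -> int:
    """Check every non-ASCII scalar value; return how many were tested."""
    count = 0
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded in UTF-8
        assert non_ascii_bytes_are_high(cp), hex(cp)
        count += 1
    return count


# Hypothetical path mixing ASCII and non-ASCII components.
path = "/tmp/r\u00e9sum\u00e9/\u65e5\u672c\u8a9e.txt"
raw = path.encode("utf-8")

# A byte-oriented tool with no Unicode knowledge still sees no embedded
# terminator and splits on '/' at exactly the right places.
assert NUL not in raw
components = raw.split(b"/")
assert [c.decode("utf-8") for c in components] == path.split("/")

if __name__ == "__main__":
    print(check_all_code_points())  # number of non-ASCII scalar values checked
```

The exhaustive loop is slow, but it covers every Unicode scalar value, so the check does not depend on picking representative samples.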


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"