perl.perl6.internals | Postings from June 2001

Re: Should we care much about this Unicode-ish criticism?

From: Russ Allbery
Date: June 5, 2001 19:14
Subject: Re: Should we care much about this Unicode-ish criticism?
Message ID: ylwv6qwe1b.fsf@windlord.stanford.edu
Larry Wall <larry@wall.org> writes:
> Russ Allbery writes:

>> Particularly since extending UTF-8 to more than 31 bits requires
>> breaking some of the guarantees that UTF-8 makes, unless I'm missing
>> how you're encoding the first byte so as not to give it a value of
>> 0xFE.

> The UTF-16 BOMs, 0xFEFF and 0xFFFE, both turn out to be illegal UTF-8 in
> any case, so it doesn't much matter, assuming BOMs are used on UTF-16
> that has to be auto-distinguished from UTF-8.  (Doing any kind of
> auto-recognition on 16-bit data without BOMs is problematic in any
> case.)
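Larry's argument rests on a simple byte test. The octets 0xFE and 0xFF never occur in UTF-8, so a buffer that begins with FE FF or FF FE cannot be UTF-8 and can be read as UTF-16 in the corresponding byte order. The Python sketch below illustrates that sniffing step under stated assumptions. The helper names `sniff_encoding` and `decode_sniffed` are hypothetical. The helper falls back to UTF-8 whenever no UTF-16 BOM is present. It also deliberately ignores UTF-32, whose little-endian BOM also starts with FF FE.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess between BOM-marked UTF-16 and UTF-8 from the leading bytes.

    Hypothetical helper: it relies only on the fact that the octets 0xFE
    and 0xFF never occur in UTF-8, so a UTF-16 BOM cannot begin a UTF-8
    stream. UTF-32 BOMs are not handled.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "utf-16-le"
    # No UTF-16 BOM: assume UTF-8. Without a BOM, 16-bit data cannot be
    # told apart reliably, which is the problem raised above.
    return "utf-8"


def decode_sniffed(data: bytes) -> str:
    """Decode `data` with the guessed encoding, dropping any UTF-16 BOM."""
    encoding = sniff_encoding(data)
    if encoding != "utf-8":
        # The explicit-endian codecs keep a BOM as U+FEFF, so strip it here.
        data = data[2:]
    return data.decode(encoding)


if __name__ == "__main__":
    be = codecs.BOM_UTF16_BE + "caf\u00e9".encode("utf-16-be")
    print(sniff_encoding(be), decode_sniffed(be))
    print(sniff_encoding("caf\u00e9".encode("utf-8")))
```

The UTF-8 fallback is only a guess. A stricter variant could attempt the UTF-8 decode and report failure instead of assuming success.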

Yeah, but one of the guarantees of UTF-8 is:

   -  The octet values FE and FF never appear.

I can see that this property may not be that important, but it makes me
feel like things that don't have this property aren't really UTF-8.
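The objection about going past 31 bits can be made concrete. In the original UTF-8 definition (RFC 2279), an n-byte sequence starts with n one-bits followed by a zero. That lead byte carries 7 - n payload bits, and each continuation byte carries 6. Six bytes therefore hold 5*6 + 1 = 31 bits, and the largest lead byte in use is 0xFD. The Python sketch below is a hypothetical encoder for that 31-bit scheme. It is not the modern codec, which stops at U+10FFFF. Its `lead_byte` helper shows that a seventh length would need the lead byte 0xFE, the very octet the guarantee rules out. The names `lead_byte`, `encode_rfc2279` and `MAX_RFC2279` are illustrative only.

```python
MAX_RFC2279 = (1 << 31) - 1  # largest value the 6-byte form can carry


def lead_byte(length: int) -> int:
    """Fixed high bits of the first byte of a `length`-byte sequence.

    2 -> 0xC0 (110xxxxx), ..., 6 -> 0xFC (1111110x); a hypothetical
    7-byte form would need 0xFE (11111110), which UTF-8 never uses.
    """
    if length == 1:
        return 0x00  # 0xxxxxxx: plain ASCII
    return (0xFF << (8 - length)) & 0xFF


def encode_rfc2279(cp: int) -> bytes:
    """Encode a value in 0..2**31-1 with the original up-to-6-byte scheme."""
    if not 0 <= cp <= MAX_RFC2279:
        raise ValueError(f"{cp:#x} does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])
    # An n-byte sequence (n >= 2) carries 5n + 1 payload bits.
    n = 2
    while cp >= 1 << (5 * n + 1):
        n += 1
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))  # 10xxxxxx continuation byte
        cp >>= 6
    # The remaining 7 - n bits go into the low bits of the lead byte.
    return bytes([lead_byte(n) | cp] + tail[::-1])


if __name__ == "__main__":
    for value in (0x41, 0x20AC, 0x10FFFF, MAX_RFC2279):
        print(f"U+{value:X}", encode_rfc2279(value).hex(" "))
    print(f"{lead_byte(7):#04x}")  # lead byte a 7-byte form would need
```

For values up to U+10FFFF, this sketch produces the same bytes as today's UTF-8. Only the 5- and 6-byte forms, with lead bytes 0xF8 to 0xFD, go beyond it.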

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>


