perl.perl6.language | Postings from May 2010

Re: URI replacement pseudocode

From: Mark J. Reed
Date: May 17, 2010 12:29
Subject: Re: URI replacement pseudocode
Message ID: AANLkTim5EDvADNlKO68GRD7t60CH7O4U7x27BZFw482c@mail.gmail.com
On Mon, May 17, 2010 at 3:00 PM, Aaron Sherman <ajs@ajs.com> wrote:
> FFFE and FEFF are used to manage byte-ordering, so they really shouldn't be
> part of a URI (URIs should exist in a context in which byte ordering is
> assured, would be my take).

Neither U+FFFE nor U+FFFF is a valid character, but U+FEFF is
perfectly cromulent: it's the ZERO WIDTH NO-BREAK SPACE, although its
use as such is deprecated (U+2060 WORD JOINER is the modern
replacement). The choice of byte-order mark protocol was
well-considered: if U+FEFF is interpreted as a character instead of a
BOM, it's a pretty harmless one, an invisible zero-width character
that merely prevents a line break at that point.
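A minimal Python sketch of the BOM protocol described above: U+FEFF serialized in each UTF-16 byte order, how a reader can tell the two apart, and what an un-stripped BOM looks like when decoded as text. The helper name detect_utf16_byte_order is hypothetical; codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE and unicodedata are standard library.

```python
import codecs
import unicodedata

def detect_utf16_byte_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading U+FEFF (hypothetical helper).

    Returns "big", "little", or "unknown" if no BOM is present.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "little"
    return "unknown"

be = "\ufeffA".encode("utf-16-be")  # b"\xfe\xff\x00A"
le = "\ufeffA".encode("utf-16-le")  # b"\xff\xfeA\x00"

print(detect_utf16_byte_order(be), detect_utf16_byte_order(le))  # big little

# The generic "utf-16" codec uses the BOM to pick the byte order and strips it.
print(be.decode("utf-16") == le.decode("utf-16") == "A")  # True

# Reading little-endian bytes as big-endian turns the BOM into U+FFFE,
# which is not a character, so the wrong byte order is detectable.
print(hex(ord(le.decode("utf-16-be")[0])))  # 0xfffe

# If a BOM is decoded as text instead of being stripped, it is just an
# invisible format character (general category Cf).
ch = be.decode("utf-16-be")[0]
print(unicodedata.name(ch), unicodedata.category(ch))  # ZERO WIDTH NO-BREAK SPACE Cf
```

The byte-order-specific codecs ("utf-16-be", "utf-16-le") leave the BOM in place, which is why the last example still sees U+FEFF as the first character.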

> The Unicode spec says that FFFF is guaranteed not to be a valid Unicode
> character, but does not explain why. [http://unicode.org/charts/PDF/UFFF0.pdf]

The Unicode specification is a lot more than code charts. See section
15.8, "Noncharacters", for discussion of these code points. U+FFFF (and
U+xFFFF in every plane, for x from 0 through 0x10) are noncharacters:
they are permanently reserved so that they can be used, for instance,
as sentinel values within application memory. U+FFFE, by contrast, is
excluded precisely because it's the byte-swapped form of the BOM.
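The noncharacter rule reduces to simple arithmetic on the code point, and the sentinel use mentioned above can be illustrated the same way. This is a minimal Python sketch; is_noncharacter, split_words and the choice of U+FFFF as an end-of-buffer marker are illustrative assumptions, not part of any Perl 6 or URI API. The check also covers U+FDD0..U+FDEF, the other block of noncharacters the standard defines.

```python
SENTINEL = "\uffff"  # hypothetical internal end-of-buffer marker (a noncharacter)

def is_noncharacter(cp: int) -> bool:
    """True for Unicode noncharacters: U+FDD0..U+FDEF, plus the last two
    code points of each of the 17 planes (U+xFFFE and U+xFFFF, x = 0..0x10)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Masking off the lowest bit maps both xFFFE and xFFFF to xFFFE.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def split_words(text: str) -> list[str]:
    """Split on spaces, appending SENTINEL so the final word is flushed
    inside the loop without a separate after-loop check."""
    # Refuse input that already contains a noncharacter, so the sentinel
    # cannot collide with real data.
    if any(is_noncharacter(ord(ch)) for ch in text):
        raise ValueError("input contains a noncharacter")
    buf = text + SENTINEL
    words, start = [], 0
    for i, ch in enumerate(buf):
        if ch == " " or ch == SENTINEL:
            if i > start:
                words.append(buf[start:i])
            start = i + 1
    return words

print(split_words("perl six  lang"))  # ['perl', 'six', 'lang']
```

Rejecting noncharacters at the input boundary is what makes the sentinel safe: because conforming text is not supposed to carry them, the application is free to reserve one for its own bookkeeping.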

-- 
Mark J. Reed <markjreed@gmail.com>
