perl.perl5.porters | Postings from August 2011

Re: BOMs as noncharacters

Tom Christiansen
August 18, 2011 05:13
Karl Williamson wrote
   on Wed, 17 Aug 2011 22:00:38 MDT: 

> It may be my turn to be mistaken.  I don't see anything like that in the 
> current Standard; perhaps I got the impression that they were frowned 
> upon by off-hand remarks in the Unicode mailing list; or perhaps I 
> dreamt it all up.

They are certainly discouraged in UTF-8 streams, where they not only 
serve no purpose but also interfere with catenating streams together
in a chain:

    cat file1.utf8 file2.utf8 file3.utf8 > all.utf8

*only* works correctly when those files have no out-of-band metadata
BOMs at their fronts, with the possible exception of the first.

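Mistaking metadata BOMs for data changes the length of the whole
string.  If each file has 10 characters (not counting BOMs), then the 
final file *must* have 30 characters (not counting BOMs).  But a BOM
stranded in the middle of the output is no longer a signature: it reads
as U+FEFF ZERO WIDTH NO-BREAK SPACE.  So if all three files carried BOMs,
the two in the middle turn 30 characters into 32.  It's a 
simple matter of arithmetic.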
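To make the arithmetic concrete, here is a small Python sketch.  The sample data and the helper name `char_count_after_cat` are hypothetical.  The helper joins byte strings the way `cat` does, then decodes with Python's `utf-8-sig` codec, which skips a BOM only at the very start of the data.  Every later BOM survives as a U+FEFF character.

```python
import codecs


def char_count_after_cat(chunks):
    """Hypothetical helper: concatenate byte strings as `cat` would, then
    decode.  'utf-8-sig' drops one BOM at the very start only; any BOM
    further in is decoded as U+FEFF ZERO WIDTH NO-BREAK SPACE."""
    text = b"".join(chunks).decode("utf-8-sig")
    return len(text)


# Hypothetical sample data: three files of 10 characters each.
plain = [(str(i) * 10).encode("utf-8") for i in range(3)]
# The same three files, each with a UTF-8 BOM (EF BB BF) at the front.
with_bom = [codecs.BOM_UTF8 + chunk for chunk in plain]

print(char_count_after_cat(plain))     # 30
print(char_count_after_cat(with_bom))  # 32: two stray U+FEFF characters
```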

This is the same glaring flaw that occurs when Microsoft people
create a malformed text file that doesn't end in a newline.

    cat file1.txt file2.txt file3.txt > all.txt

If the three input files hold 10 lines apiece, then the final file *must*
hold 30 lines.  However, if either or both of the first two files have been
negligently shorted of their final newline, this is completely screwed up:
you accidentally create a single line in the output where there had been
two of them in the input, and your output's line count no longer
corresponds to that of your input.
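The same failure can be reproduced in a few lines of Python.  This is a sketch with made-up sample data, and the helpers `logical_lines` and `counts` are hypothetical names.  Lines are counted with `bytes.splitlines()`, so an unterminated last line still counts as a line, just as a reader of the individual file would count it.

```python
def logical_lines(data: bytes) -> int:
    # bytes.splitlines() counts a final line even if it lacks a newline,
    # which is how a person reading the single file would count it.
    return len(data.splitlines())


def counts(files):
    # Lines summed over the inputs vs. lines in their `cat`-style join.
    before = sum(logical_lines(f) for f in files)
    after = logical_lines(b"".join(files))
    return before, after


# Hypothetical sample data: three files of 10 lines each.
good = [b"line\n" * 10 for _ in range(3)]
# The same files, except the first two are missing their final newline.
bad = [f[:-1] for f in good[:2]] + [good[2]]

print(counts(good))  # (30, 30)
print(counts(bad))   # (30, 28): two pairs of lines were fused
```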

This is stupid.  That's why you should always put a newline at the end of
every text file, and why you should never put a BOM at the start of (nor
anywhere in) a UTF-8 file.  Sloppy Microsoft people tend to be guilty of
both sins, often simultaneously, thereby needlessly making all of our
lives more difficult.  Just say no.
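The advice can also be applied mechanically before concatenating.  The Python sketch below uses a hypothetical helper, `normalize`, that drops a leading UTF-8 BOM and appends a missing final newline.  It only illustrates the idea under those assumptions and is not a standard tool.

```python
import codecs


def normalize(data: bytes) -> bytes:
    """Hypothetical helper: strip a leading UTF-8 BOM and make sure
    non-empty data ends with a newline, so plain concatenation is safe."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    if data and not data.endswith(b"\n"):
        data += b"\n"
    return data


def safe_cat(chunks):
    # Concatenate like `cat`, but only after normalizing each piece.
    return b"".join(normalize(c) for c in chunks)


parts = [codecs.BOM_UTF8 + b"one\ntwo", b"three\n", codecs.BOM_UTF8 + b"four"]
print(safe_cat(parts))  # b'one\ntwo\nthree\nfour\n'
```

It deliberately strips a BOM only at the very start of each piece.  Any U+FEFF found later is left alone, because at that position it is ordinary character data.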

