develooper Front page | perl.perl5.porters | Postings from August 2011

Re: BOMs as noncharacters

Eric Brine
August 18, 2011 15:24
On Thu, Aug 18, 2011 at 8:12 AM, Tom Christiansen <> wrote:

> Karl Williamson <> wrote
>   on Wed, 17 Aug 2011 22:00:38 MDT:
> > It may be my turn to be mistaken.  I don't see anything like that in the
> > current Standard; perhaps I got the impression that they were frowned
> > upon by off-hand remarks in the Unicode mailing list; or perhaps I
> > dreamt it all up.
> They are certainly discouraged in UTF-8 streams, where they not only
> serve no purpose but also interfere with catenating streams together
> in a chain:
>    cat file1.utf8 file2.utf8 file3.utf8 > all.utf8
> *only* works correctly when those files have no out-of-band metadata
> BOMs at their fronts, with the possible exception of the first.

I don't see why you think it wouldn't work with BOMs. You'll end up with a
file with a BOM at the front and two ZERO WIDTH NO-BREAK SPACE (U+FEFF) in
the middle.

It might be better if those ZWNBSP weren't there, but that's not the same as
not working.
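
As a rough illustration of the concatenation described above, the following Python sketch builds three in-memory byte strings standing in for file1.utf8 through file3.utf8 (their contents are invented for the example), joins them byte for byte the way cat would, and counts the U+FEFF code points that end up in the result.

```python
# Stand-ins for file1.utf8 .. file3.utf8; the contents are invented for illustration.
BOM = "\ufeff".encode("utf-8")  # the three bytes EF BB BF
parts = [BOM + f"text from file{i}\n".encode("utf-8") for i in (1, 2, 3)]

# cat simply appends the byte streams one after another.
combined = b"".join(parts)

# Decode as plain UTF-8 so that every U+FEFF stays visible as a character.
text = combined.decode("utf-8")
starts_with_bom = text.startswith("\ufeff")
interior_count = text[1:].count("\ufeff")

print(starts_with_bom)  # True: the first file's BOM is still at the front
print(interior_count)   # 2: the other two BOMs are now mid-stream U+FEFF
```

The combined stream still decodes as valid UTF-8; the only difference from BOM-less input is the two extra U+FEFF characters at the former file boundaries.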

> This is the same glaring flaw that occurs when Microsoft people
> create a malformed text file that doesn't end in a newline.

It's perfectly acceptable in Windows. If you fail to convert the file
properly when converting it to a unix text file, that's not Microsoft
people's problem.

>    cat file1.txt file2.txt file3.txt > all.txt
> If the first three files hold 10 lines apiece, then the final file *must*
> hold 30 lines.

I agree. C<cat> is the wrong tool for working on Windows text files.
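
To make the 30-line test quoted above concrete, here is a small Python sketch that imitates a byte-for-byte cat on in-memory strings. The helper make_file is hypothetical, and "\n" is used as the line separator throughout for simplicity (real Windows files would normally use CRLF).

```python
def make_file(n_lines, final_newline):
    """Build text with n_lines lines, optionally terminating the last one."""
    body = "\n".join(f"line {i}" for i in range(1, n_lines + 1))
    return body + ("\n" if final_newline else "")

# Three 10-line files whose last line ends in a newline.
terminated = "".join(make_file(10, True) for _ in range(3))

# Three 10-line files whose last line has no newline.
unterminated = "".join(make_file(10, False) for _ in range(3))

terminated_lines = len(terminated.splitlines())
unterminated_lines = len(unterminated.splitlines())

print(terminated_lines)    # 30
print(unterminated_lines)  # 28: each boundary fuses two lines into one
```

A tool that counts newline characters, as wc -l does, would report 27 for the second case, since only 27 newlines are present.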

- Eric
