develooper Front page | perl.perl5.porters | Postings from August 2011

Re: BOMs as noncharacters

From: Eric Brine
Date: August 18, 2011 15:24
Subject: Re: BOMs as noncharacters
Message ID: CALJW-qHx2Dvd1hS6kBX=jnujDWRdzuiuJTPVDmQ_YHfK1zFmng@mail.gmail.com

On Thu, Aug 18, 2011 at 8:12 AM, Tom Christiansen <tchrist@perl.com> wrote:

> Karl Williamson <public@khwilliamson.com> wrote
>   on Wed, 17 Aug 2011 22:00:38 MDT:
>
> > It may be my turn to be mistaken.  I don't see anything like that in the
> > current Standard; perhaps I got the impression that they were frowned
> > upon by off-hand remarks in the Unicode mailing list; or perhaps I
> > dreamt it all up.
>
> They are certainly discouraged in UTF-8 streams, where they not only
> serve no purpose but also interfere with catenating streams together
> in a chain:
>
>    cat file1.utf8 file2.utf8 file3.utf8 > all.utf8
>
> *only* works correctly when those files have no out-of-band metadata
> BOMs at their fronts, with the possible exception of the first.
>

I don't see why you think it wouldn't work with BOMs. You'll end up with a
file with a BOM at the front and two ZERO WIDTH NO-BREAK SPACE (U+FEFF)
characters in the middle.

It might be better if those ZWNBSP weren't there, but that's not the same as
not working.
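
To make the point above concrete, here is a minimal sketch in Python. The
three byte strings are hypothetical stand-ins for file1.utf8, file2.utf8 and
file3.utf8, each starting with a UTF-8 BOM; joining them plays the role of
cat. It assumes the reader strips only a leading BOM, which is what Python's
"utf-8-sig" codec does.

```python
import codecs

# Hypothetical contents: each "file" begins with a UTF-8 BOM (bytes EF BB BF).
files = [
    codecs.BOM_UTF8 + text.encode("utf-8")
    for text in ("one\n", "two\n", "three\n")
]

# Same effect as: cat file1.utf8 file2.utf8 file3.utf8 > all.utf8
combined = b"".join(files)

# "utf-8-sig" removes a BOM only at the very start of the data;
# any later U+FEFF is decoded as an ordinary character.
decoded = combined.decode("utf-8-sig")

# Count the U+FEFF characters left in the middle of the text.
embedded = decoded.count("\ufeff")
print(embedded)
print(decoded.splitlines())
```

The result decodes without error and still has three lines; the only trace
of the inner BOMs is one U+FEFF at the start of the second line and one at
the start of the third.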


> This is the same glaring flaw that occurs when Microsoft people
> create a malformed text file that doesn't end in a newline.
>

It's perfectly acceptable in Windows. If you fail to convert the file
properly when turning it into a Unix text file, that's not the Microsoft
people's problem.

>    cat file1.txt file2.txt file3.txt > all.txt
>
> If the first three files hold 10 lines apiece, then the final file *must*
> hold 30 lines.


I agree. C<cat> is the wrong tool for working on Windows text files.
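
The line-count issue under discussion can be sketched in Python. The helper
make_file and the generated contents are hypothetical; the sketch only
assumes that concatenation is a plain byte-for-byte join, as with cat, and
that each file holds 10 lines.

```python
def make_file(lines, final_newline):
    """Build file contents, optionally terminating the last line."""
    body = "\n".join(lines)
    return body + "\n" if final_newline else body

# Ten lines per file, as in the example being discussed.
lines = [f"line {i}" for i in range(1, 11)]

# Plain concatenation of three identical files, like cat would do.
terminated = make_file(lines, final_newline=True) * 3
unterminated = make_file(lines, final_newline=False) * 3

# With a final newline, every line survives the join.
count_terminated = len(terminated.splitlines())

# Without one, the last line of each file fuses with the first line
# of the next file, so two lines are lost across the two joins.
count_unterminated = len(unterminated.splitlines())

print(count_terminated, count_unterminated)
```

Under these assumptions the terminated files give 30 lines and the
unterminated ones give 28, so a tool that joins such files needs to add the
missing line terminators itself.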

- Eric
