perl.perl5.porters | Postings from August 2011

Re: BOMs as noncharacters

From: Tom Christiansen
Date: August 18, 2011 17:18
Subject: Re: BOMs as noncharacters
Message ID: 30581.1313713082@chthon

>> Karl Williamson <public@khwilliamson.com> wrote
>>   on Wed, 17 Aug 2011 22:00:38 MDT:
>>
>> > It may be my turn to be mistaken.  I don't see anything like that in the
>> > current Standard; perhaps I got the impression that they were frowned
>> > upon by off-hand remarks in the Unicode mailing list; or perhaps I
>> > dreamt it all up.
>>
>> They are certainly discouraged in UTF-8 streams, where they not only
>> serve no purpose but also interfere with catenating streams together
>> in a chain:
>>
>>    cat file1.utf8 file2.utf8 file3.utf8 > all.utf8
>>
>> *only* works correctly when those files have no out-of-band metadata
>> BOMs at their fronts, with the possible exception of the first.
>>

>I don't see why you think it wouldn't work with BOMs. You'll end up with a
>file with a BOM at the front and two ZERO WIDTH NO-BREAK SPACE (U+FEFF) in
>the middle.

Because you've changed the length of the file.  If I have 3 files
each with 10 characters, I require that the catenation of those
3 files produce a file with 30 characters.  BOMs are metadata
masquerading as data.
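
A minimal sketch of that arithmetic, written in Python purely for
illustration.  The sample strings and the helper names cat() and
with_bom() are hypothetical, and byte strings stand in for the three
files.  It assumes a reader that strips at most one leading BOM, which
is what Python's "utf-8-sig" codec does.

```python
BOM = "\ufeff"  # U+FEFF, ZERO WIDTH NO-BREAK SPACE when not at the start


def cat(*chunks: bytes) -> bytes:
    """Byte-wise concatenation, which is all cat(1) does."""
    return b"".join(chunks)


def with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a BOM in front (hypothetical helper)."""
    return (BOM + text).encode("utf-8")


# Three stand-in "files" of 10 characters each.
parts = ["0123456789", "abcdefghij", "ABCDEFGHIJ"]

plain = cat(*(p.encode("utf-8") for p in parts))
bommed = cat(*(with_bom(p) for p in parts))

# Without BOMs the lengths simply add up: 10 + 10 + 10.
plain_len = len(plain.decode("utf-8"))

# A plain UTF-8 decode keeps all three U+FEFF code points.
raw_len = len(bommed.decode("utf-8"))

# "utf-8-sig" drops only the leading BOM; the other two stay in the text.
stripped = bommed.decode("utf-8-sig")
stripped_len = len(stripped)
stray_positions = [i for i, ch in enumerate(stripped) if ch == BOM]

print(plain_len, raw_len, stripped_len, stray_positions)
```

Under those assumptions the BOM-free catenation has 30 characters,
while the BOM-prefixed one has 33 when decoded as plain UTF-8, or 32
with two stray U+FEFF characters in the middle when the reader strips
the leading one.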

Do not change the length of my strings by adding data that I 
didn't put there.  That is just wrong.

If cat is the wrong tool for working on text files, then you have
a really wrong idea of a text file.

--tom

nntp.perl.org: Perl Programming lists via nntp and http.