On Thu, Aug 18, 2011 at 8:12 AM, Tom Christiansen <tchrist@perl.com> wrote:
> Karl Williamson <public@khwilliamson.com> wrote
> on Wed, 17 Aug 2011 22:00:38 MDT:
>
> > It may be my turn to be mistaken. I don't see anything like that in the
> > current Standard; perhaps I got the impression that they were frowned
> > upon by off-hand remarks in the Unicode mailing list; or perhaps I
> > dreamt it all up.
>
> They are certainly discouraged in UTF-8 streams, where they not only
> serve no purpose but also interfere with catenating streams together
> in a chain:
>
>     cat file1.utf8 file2.utf8 file3.utf8 > all.utf8
>
> *only* works correctly when those files have no out-of-band metadata
> BOMs at their fronts, with the possible exception of the first.
>

I don't see why you think it wouldn't work with BOMs. You'll end up
with a file with a BOM at the front and two ZERO WIDTH NO-BREAK SPACE
(U+FEFF) in the middle. It might be better if those ZWNBSP weren't
there, but that's not the same as not working.

> This is the same glaring flaw that occurs when Microsoft people
> create a malformed text file that doesn't end in a newline.
>

It's perfectly acceptable in Windows. If you fail to convert the file
properly when converting it to a unix text file, that's not Microsoft
people's problem.

>     cat file1.txt file2.txt file3.txt > all.txt
>
> If the first three files hold 10 lines apiece, then the final file *must*
> hold 30 lines.

I agree. C<cat> is the wrong tool for working on Windows text files.

- Eric
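
Both effects discussed above are easy to check by hand. The following is only a minimal sketch in Python, not anything posted in the thread: the `cat` helper and the sample file contents are hypothetical stand-ins, and it assumes that cat(1) does nothing more than byte-wise concatenation.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def cat(*chunks: bytes) -> bytes:
    """Mimic cat(1): plain byte-wise concatenation, no interpretation."""
    return b"".join(chunks)


# Three hypothetical UTF-8 files, each starting with a BOM.
with_boms = [BOM + f"file{i}\n".encode("utf-8") for i in (1, 2, 3)]
text = cat(*with_boms).decode("utf-8-sig")  # utf-8-sig drops only the leading BOM

interior_feff = text.count("\ufeff")   # U+FEFF left at the old file boundaries
line_count = len(text.splitlines())    # U+FEFF is not a line terminator

# Two hypothetical files of two lines each, with no newline after the last line.
unterminated = [b"a\nb", b"c\nd"]
merged_lines = cat(*unterminated).decode("utf-8").splitlines()

print(interior_feff, line_count, merged_lines)  # 2 3 ['a', 'bc', 'd']
```

Under those assumptions the BOM case keeps its line count but leaves two U+FEFF characters in the body, while the unterminated case silently fuses the last line of one file with the first line of the next. Whether either result counts as "working" is the point being argued in the thread.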