On Thu, Aug 18, 2011 at 5:35 PM, Johan Vromans <jvromans@squirrel.nl> wrote:
> We came a long way, from ASCII via 'Extended' ASCII to Unicode. In the
> Unicode world, one can no longer process a text file without knowing
> what the encoding is. (Actually, this was true for Extended ASCII as
> well.) A BOM helps identify some of the possible encodings. However, our
> current IO systems are still equipped for byte operations only. Okay, we
> can specify an encoding using a PerlIO layer, but that's only part of
> the job. What we need is an augmented IO system that can handle BOMs.

The word «some» is exactly why this is not a particularly good idea. Not
because you can't recognize UTF-8 this way, but because you can't
differentiate legacy character sets. The absence of a BOM won't tell you
whether it's latin1 or KOI8-R or anything else. *If you have to make such
assumptions, you're screwed anyway*.

> use open IN => ':encoding(auto)', OUT => ':encoding(UTF-16LE+BOM)';

The former is currently not implementable in any sane way on PerlIO.

Leon
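For context, this is roughly the BOM sniffing that currently has to be done by
hand on top of PerlIO's byte-oriented layers, i.e. the job a hypothetical
':encoding(auto)' layer would take over. A minimal sketch only: the
open_with_bom helper, the 'input.txt' path and the latin1 fallback are
illustrative, UTF-32 BOMs are ignored for brevity, and the fallback is exactly
the unverifiable assumption discussed above.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch: sniff a BOM on a raw handle, then push a matching
    # :encoding() layer. With no BOM we can only fall back to a guess.
    sub open_with_bom {
        my ( $path, $fallback ) = @_;    # $fallback is the caller's assumption
        open my $fh, '<:raw', $path or die "Cannot open $path: $!";

        my $bom = '';
        read $fh, $bom, 3;               # longest BOM we check is 3 bytes

        my ( $encoding, $skip );
        if    ( $bom =~ /\A\xEF\xBB\xBF/ ) { ( $encoding, $skip ) = ( 'UTF-8',    3 ) }
        elsif ( $bom =~ /\A\xFF\xFE/ )     { ( $encoding, $skip ) = ( 'UTF-16LE', 2 ) }
        elsif ( $bom =~ /\A\xFE\xFF/ )     { ( $encoding, $skip ) = ( 'UTF-16BE', 2 ) }
        else                               { ( $encoding, $skip ) = ( $fallback,  0 ) }

        seek $fh, $skip, 0;                    # reposition just past the BOM, if any
        binmode $fh, ":encoding($encoding)";   # push the decoding layer
        return $fh;
    }

    binmode STDOUT, ':encoding(UTF-8)';
    my $fh = open_with_bom( 'input.txt', 'latin1' );
    print while <$fh>;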