On Sat, Feb 25, 2012 at 11:28 PM, Tom Christiansen <tchrist@perl.com> wrote:
> Some folks claim the only "safe" way to use Unicode in Perl is to always make
> explicit calls to encode/decode with a bonus FB_CROAK argument. They claim
> that all nine of these perfectly reasonable and common-to-the-99th-percentile
> operations...
>
> #1. $ perl -C...
> #2. $ export PERL_UNICODE=...
>
> #3. use utf8;
>
> #4. use open qw[ :std :utf8 ];
> #5. use open qw[ :std :encoding(UTF-8) ];
>
> #6. binmode(FH, ":utf8");
> #7. binmode(FH, ":encoding(UTF-8)");
>
> #8. open(FH, "< :utf8", $path);
> #9. open(FH, "< :encoding(UTF-8)", $path);
>
> ...are all of them flawed in their not raising exceptions on UTF-8
> encoding errors of one sort of another, and that somehow not even...
>
> #0. use warnings qw(FATAL utf8);
>
> ...is good enough to fix it.
>
> I do not know whether these claims are true. My own tests suggest this may
> not be the whole story, because this behaves as I think it should:
>
> darwin$ perl -C0 -E 'say for "caf\xE9", "stuff"' |
>     perl -CS -Mwarnings=FATAL,utf8 -pe 'print "$. "'
> utf8 "\xE9" does not map to Unicode, <> line 1.
> Exit 255

My impression is that only high-level readline (e.g. not what the
parser uses) actually checks input for invalid characters. Most other
operations seem to check only for well-formedness, if they check at all.

I may be mistaken, though: I haven't tested this, just read the source.

Leon
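
For comparison, the explicit decode-with-FB_CROAK approach mentioned in the quoted message could be sketched roughly as follows. This is only an illustration, assuming the core Encode module; the variable names ($bytes, $lax, $strict, $err) are made up for the example, and it says nothing about how the PerlIO layers listed above behave.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# "caf\xE9" is Latin-1 for "café"; a lone \xE9 byte is not valid UTF-8.
my $bytes = "caf\xE9";

# Default CHECK (FB_DEFAULT): malformed input is replaced with U+FFFD
# and no exception is raised.
my $lax = decode('UTF-8', $bytes);

# FB_CROAK: malformed input raises an exception instead. decode() may
# modify its source argument when CHECK is set, so pass a copy.
my $copy   = $bytes;
my $strict = eval { decode('UTF-8', $copy, FB_CROAK) };
my $err    = $@;

# Show the code points of the lax result, then the strict outcome.
printf "lax: %vX\n", $lax;
print defined $strict ? "strict: decoded\n" : "strict: rejected: $err";
```

The eval block is what turns the croak into something the caller can inspect; without it the script would simply die at the decode call.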