Nick wrote:

>>> (WRONG in the general case. It feels like an awful lot of end-user
>>> code to deal with encodings is heuristics and bodgery, rather than
>>> actual understanding)

>> Very true, and a source of perpetual annoyance. But it's a separate
>> issue, isn't it?

> Not in my mind. Finding the need to resort to flipping the internal
> flag for UTF-8 is a red flag that the proper conversion layer isn't
> implemented, because the flow of data hasn't been thought about.

It does leave a code-smell, doesn't it? I've always been uncomfy
with it, but I don't know what else to do.

Could you please tell me how I *should* then be writing the unless test
and block at the bottom of this code snippet:

    for my $codepoint ( $first_codepoint .. $last_codepoint ) {

        # gaggy UTF-16 surrogates are invalid UTF-8 code points
        next if $codepoint >= 0xD800 && $codepoint <= 0xDFFF;

        # from utf8.c in perl src; must avoid fatals in 5.10
        next if $codepoint >= 0xFDD0 && $codepoint <= 0xFDEF;

        # both FFFE and FFFF are "not characters" in any plane
        next if 0xFFFE == ($codepoint & 0xFFFE);

        # see "Unicode non-character %s is illegal for interchange" in perldiag(1)
        $_ = do { no warnings "utf8"; chr($codepoint) };

        # fixes "the Unicode bug"
        unless (utf8::is_utf8($_)) {
            $_ = decode("iso-8859-1", $_);
        }

Especially given that this code must run on 5.10 and better, not just
blead, I don't know how else to do it. "Should" I be calling
pack("U", $codepoint) or something?

thanks,

--tom
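For comparison, here is a minimal sketch of two ways the final step could be written without testing the flag first. It is not an answer given in the thread, and neither approach is claimed to settle the design objection Nick raises. The helper names char_upgraded and char_packed are hypothetical. The first uses the core utf8::upgrade function, which should have the same effect as the decode("iso-8859-1", ...) branch on a byte string and does nothing to a string that is already upgraded. The second is the pack("U", $codepoint) form raised in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build a one-character string
# for a code point, then force Perl's internal UTF-8 representation.
sub char_upgraded {
    my ($codepoint) = @_;
    my $char = do { no warnings "utf8"; chr($codepoint) };
    utf8::upgrade($char);    # in place; no-op if already upgraded
    return $char;
}

# Hypothetical helper: the pack("U", ...) form raised in the question.
# A template that starts with "U" produces a character string that is
# already stored as UTF-8 internally.
sub char_packed {
    my ($codepoint) = @_;
    return do { no warnings "utf8"; pack("U", $codepoint) };
}

# Compare the two on an ASCII, a Latin-1, and a wider code point.
for my $codepoint (0x41, 0xE9, 0x263A) {
    my $upgraded = char_upgraded($codepoint);
    my $packed   = char_packed($codepoint);
    printf "U+%04X same=%d upgraded_flag=%d packed_flag=%d\n",
        $codepoint,
        $upgraded eq $packed          ? 1 : 0,
        utf8::is_utf8($upgraded)      ? 1 : 0,
        utf8::is_utf8($packed)        ? 1 : 0;
}
```

Both helpers rely only on core functions (chr, pack, and the always-available utf8:: utilities), so the sketch should not need Encode. Whether it behaves identically on every release from 5.10 onward is an assumption that would need checking against those versions.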