Tels wrote on 2007-03-31 12:23 (+0000):
> #!/usr/bin/perl -w
> use Encode qw/decode/;
> my $random = "\xc3\xc3";    # some random bytes
> my $ascii = "a";            # some 7bit data
>
> # Somebody "helpful" decodes the ascii string:
> # The encoding doesn't actually matter, since it is 7bit anyway.
> # This step happens out of my control (e.g. in third party code)
> $string = decode('ISO-8859-1', $ascii);

$string is a text string now. Remember, decoding goes from byte string to
text string.

Using unpack "C" on a text string makes no sense once you consider that this
"C" doesn't stand for "character" in the sense that the documentation for
chr, ord, length, split, etcetera uses. It stands for "char", which is a C
datatype that holds one byte. As such, unpack "C" is a byte operation and
makes sense on byte strings only. $string is a text string, and you can tell
by looking at the decode() step.

> # now take our random binary data and a 7bit ascii string and do:
> print join (" ", unpack("CCC", "$random$string")), "\n";

Dangerous, and that's why I suggested adding a "wide character in..."
warning earlier in this thread.

> Now explain to me why this prints different things even tho $random is the
> same string in both cases, and $string and $ascii should be the same,
> too. :) Bonus points if you manage to not mention the uhh -- ut - utf --
> uhm -- er The Flag[tm].

I get the bonus points! Hurrah! :)

The only explanation that I used is the separation between text strings and
binary strings. It's also the only thing you need to know. You'll benefit
from knowing more, certainly, but I see red flags in your code.

> So far, I can see the ways to handle this are:
> (..)
> * never mix fire and water er dogs and cats er I mean text and bytes, and
> pray that every piece of code out there adheres to this, too.

Exactly.

> I think the Pray and Hope[tm] strategy doesn't really work, tho.
It doesn't always work, because people can't be trusted to do the right
thing, but code that mixes text and bytes can always be fixed.

-- 
kind regards,
juerd waalboer: perl hacker <juerd@juerd.nl> <http://juerd.nl/sig>
convolution: ict solutions and consultancy <sales@convolution.nl>

I don't trust voting computers. See <http://www.wijvertrouwenstemcomputersniet.nl/>.
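The separation between text strings and byte strings argued for in this message can be sketched as follows. This is a minimal illustration, not code from the thread. It assumes ISO-8859-1 as the target encoding only because the quoted example decodes with it, and it re-encodes the text string with an explicit encode() before anything reaches unpack "C", so the byte operation only ever sees a byte string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Byte string: raw binary data, not text.
my $random = "\xc3\xc3";

# Text string: the result of decode(), as in the quoted example.
my $string = decode('ISO-8859-1', "a");

# Never mix the two: turn the text string back into bytes first.
# ISO-8859-1 is an assumption here, chosen to mirror the decode() above.
my $bytes = $random . encode('ISO-8859-1', $string);

# unpack "C" is a byte operation, so it now works on exactly three bytes.
my @octets = unpack("C*", $bytes);
print join(" ", @octets), "\n";    # prints "195 195 97"
```

Because both operands of the concatenation are byte strings, the result does not depend on how perl stores them internally. Which encoding to pass to encode() (ISO-8859-1, UTF-8, or something else) depends on what the code consuming the bytes expects; that choice is an assumption of this sketch.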