Moin,

On Saturday 31 March 2007 16:09:18 Juerd Waalboer wrote:
> Tels skribis 2007-03-31 12:23 (+0000):
> > #!/usr/bin/perl -w
> > use Encode qw/decode/;
> > my $random = "\xc3\xc3";    # some random bytes
> > my $ascii = "a";            # some 7bit data
> >
> > # Somebody "helpful" decodes the ascii string:
> > # The encoding doesn't actually matter, since it is 7bit anyway.
> > # This step happens out of my control (e.g. in third party code)
> > $string = decode('ISO-8859-1', $ascii);
>
> $string is a text string, now. Remember, decoding is going from byte
> string to text string.

Yes, but my point was that I:

* might not be the one who "decoded" $string, or even the one who
  produced it.
* do not know whether I am passed a "text" string, as there is only the
  flag-you-should-not-know-about to distinguish the two.

> Using unpack "C" on a text string makes no sense if you consider that
> this "C" doesn't stand for "character" in the sense that the
> documentation for chr, ord, length, split, etcetera use. It stands for
> "char", which is a C datatype that contains one byte.
>
> As such, unpack "C" is a byte operation and makes sense on byte strings
> only. $string is a text string, and you can tell by looking at the
> decode() step.
>
> > # now take our random binary data and a 7bit ascii string and do:
> > print join (" ", unpack("CCC", "$random$string")), "\n";
>
> Dangerous, and that's why I suggested adding a "wide character in..."
> warning earlier in this thread.
>
> > Now explain to me why this prints different things even tho $random is
> > the same string in both cases, and $string and $ascii should be the
> > same, too. :) Bonus points if you manage to not mention the uhh -- ut -
> > utf -- uhm -- er The Flag[tm].
>
> I get the bonus points! Hurrah! :)

Not really, as you didn't explain the difference, you merely told me
"there is a difference" (where I personally don't expect there to be one).

> The only explanation that I used is the separation between text strings
> and binary strings. It's also the only thing you need to know. You'll
> benefit from knowing more, certainly, but I see red flags in your code.

Ok, and how am I supposed to know whether, in:

    sub dosomething
      {
      my $a = shift;
      }

$a is a text string or a binary string? :)

> > So far, I can see the ways to handle this are:
> > (..)
> > * never mix fire and water er dogs and cats er I mean text and bytes,
> >   and pray that every piece of code out there adheres to this, too.
>
> Exactly.

This is not a working strategy.

> > I think the Pray and Hope[tm] strategy doesn't really work, tho.
>
> It doesn't always work, because people can't be trusted to do the right
> thing, but it can always be fixed.

Only if you consider your own code. But data is sometimes processed by
other code (Perl itself, some module etc.).

All the best,

Tels

-- 
Signed on Sat Mar 31 18:33:51 2007 with key 0x93B84C15.
Get one of my photo posters: http://bloodgate.com/posters
PGP key on http://bloodgate.com/tels.asc or per email.

 "We're looking at a future where only the very largest companies will be
 able to implement software, and it will technically be illegal for other
 people to do so."

  -- Bruce Perens, 2004-01-23
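
On the "how am I supposed to know" question about $a: one defensive option, sketched here as an illustration and not something proposed in the thread, is to stop guessing and force the byte representation at the point where bytes are needed. The helper name byte_values is hypothetical; utf8::downgrade and utf8::upgrade are the core functions, and utf8::upgrade here merely stands in for the third-party decode() step.

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte values of a string regardless of
# how Perl happens to store it internally.
sub byte_values {
    my ($s) = @_;    # work on a copy so the caller's scalar is untouched

    # With a true second argument, utf8::downgrade returns false instead
    # of dying when the string holds characters above 0xFF.
    utf8::downgrade($s, 1)
        or die "string contains characters above 0xFF, not a byte string\n";

    return unpack "C*", $s;
}

my $random   = "\xc3\xc3";    # some random bytes
my $plain    = "a";           # 7bit data, never touched
my $upgraded = "a";           # same 7bit data ...
utf8::upgrade($upgraded);     # ... but stored upgraded (assumed stand-in
                              # for what third-party code might hand us)

my @a = byte_values("$random$plain");
my @b = byte_values("$random$upgraded");

print "@a\n";    # 195 195 97
print "@b\n";    # 195 195 97
```

Both lines print the same values because the copy is downgraded before unpack sees it. A string that really does contain characters above 0xFF makes the helper die, so the mix-up is reported instead of being silently reinterpreted.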