Tels wrote 2007-03-31 0:19 (+0000):
> Anyway, I wasn't aware that any non-utf8 data in Perl is *always*
> ISO-8859-1, I thought that, when not specified, this depended on some other
> stuff. Guess I need to reread the tutorials. :)

Note that they are unicode strings, and that Perl is theoretically free
to change the internal representation at any time.

> However, this also poses the question: How does Perl know that your data is
> in KOI8-R?

Because you tell it so with "decode". The resulting string is a unicode
string, which may have any encoding internally. (In practice, this is
limited to latin1 and utf8.)

    my $text_string = decode("koi8-r", $byte_string);

or, if you prefer different terminology:

    my $unicode_string = decode("koi8-r", $koi8r_string);

> One of the limitations of the "there can be only two encodings" of Perl
> seems to be that strings are permanently upgraded:
> $iso_8859_1 = '...';
> $utf8 = '...';
> if ($iso_8859_1 eq $utf8) { ... }

$iso_8859_1 is only temporarily upgraded to utf8 for this comparison; the
variable itself is left as it was. (Yes, this copies data, and then throws
it away. Again, optimization does require knowing internals. The easiest
optimization here is to utf8::upgrade $iso_8859_1, after which the
variable name no longer makes sense :))

> Just like 1 + 2.0 will result in 3.0 and not 3 and we all know how
> much confusion this creates :) (heh, I fell for it today, even tho I
> should have known better :)

Doesn't really cause me any headaches, to be honest.

> > The same type of string can be used for binary data, because in the
> > unicode encoding "latin1", all 256 codepoints map to the same byte
> > values.
> This sounds like a circular definition, because in CP1250, also all 256
> codepoints map to the same byte values. Except they are different byte
> values :)

I said "unicode encoding", but should have said "unicode codepoints".
Unicode codepoints 0..255 map to the identical byte values 0..255 in
latin1. That makes it special.

> > > In short, it becomes a mess.
> > Yes, with strong typing, especially with string subtypes for arbitrary
> > encodings, it would be cleaner. But it would also not look like Perl 5.
> Over the years, I have come to the insight that I want to build reliable
> and fast programs. (easy to maintain, reliable, fast, pick two :-)

I do that with Perl. Really, you should check that language out! You'll
LOVE it! :)
-- 
kind regards,

juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
convolution:     ict solutions and consultancy <sales@convolution.nl>

I don't trust voting computers. See <http://www.wijvertrouwenstemcomputersniet.nl/>.
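
Addendum: a minimal sketch of the three technical points in the reply above (decoding KOI8-R bytes, comparing strings with different internal representations, and the latin1 identity mapping). The sample bytes and the names $latin1_internal, $utf8_internal and $all_same are assumptions made up for illustration; only decode() and utf8::upgrade are named in the message itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the KOI8-R bytes for the Russian word "mir"
# (U+043C U+0438 U+0440). Perl cannot guess this; decode() is told.
my $byte_string = "\xCD\xC9\xD2";
my $text_string = decode("koi8-r", $byte_string);

printf "%d bytes decoded to %d characters:", length($byte_string),
    length($text_string);
printf " U+%04X", ord($_) for split //, $text_string;
print "\n";

# Two strings with the same characters but different internal storage.
my $latin1_internal = "caf\xE9";    # one byte per character internally
my $utf8_internal   = "caf\xE9";
utf8::upgrade($utf8_internal);      # same characters, utf8 internally

# eq compares characters, not internal bytes, so these are equal.
print "strings are equal\n" if $latin1_internal eq $utf8_internal;

# The comparison should not have changed how $latin1_internal is stored.
print "latin1 string upgraded by eq? ",
    (utf8::is_utf8($latin1_internal) ? "yes" : "no"), "\n";

# Codepoints 0..255 encode to the identical byte values in latin1.
my $all_same = 1;
for my $cp (0 .. 255) {
    $all_same = 0 if ord(encode("iso-8859-1", chr($cp))) != $cp;
}
print "latin1 maps 0..255 to themselves\n" if $all_same;
```

utf8::is_utf8 is used here only to peek at the internal representation; ordinary code should not need to care about it, which is the point made above about optimization requiring knowledge of internals.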