On Wed, Feb 07, 2007 at 01:56:09AM +0100, Gerard Goossen wrote:
> I would suggest to make the UTF-EBCDIC the representation in Perl7 on
> EBCDIC platforms, regardless of what is in the string.

Why? Only performance? Why is UTF-EBCDIC not frequently used any longer,
and why should Perl buck that trend?

> UTF-EBCDIC is an encoding of UNICODE. But a strange one in the sense
> that bytes do NOT correspond to codepoints for codepoints < 0x7F. But
> the bytes do correspond.
> If you want to have codepoint U+0041 (ASCII 'A') this would be UTF-EBCDIC
> encoded as 0xC1. Using EBCDIC encoding 0xC1 would also be an 'A'.
> So although the codepoints are not the same with UNICODE and
> EBCDIC, using UTF-EBCDIC the bytes are.

What remains is that EBCDIC, or UTF-EBCDIC, is not UNICODE. A second
translation phase is required to convert EBCDIC or UTF-EBCDIC to UNICODE.

> Like you, my initial idea was also to use UTF-8 on EBCDIC
> platforms, but SADAHIRO pointed out that on EBCDIC platforms '\n' in C
> would generate a LF in UTF-EBCDIC, but not a LF in UTF-8.

The '\n' vs '\r' is a large issue on its own that has little to do with
UNICODE. I believe Mac Perl has actually swapped '\n' and '\r'. I don't
believe that the native end-of-line character is a reason to choose
UTF-8 vs UTF-EBCDIC.

> sub identity {
>     my $string = shift;
>     my ($sock1, $sock2);
>     socketpair($sock1, $sock2, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die;
>     $sock1->print($string);
>     $sock1->close;
>     local $/ = undef; # slurp mode
>     return $sock2->getline();
> }
>
> I don't care whether $string is a text-string or byte-string, I just want
> it to return the same string.

Perhaps you should care. In a language such as Java, you are forced to
care, as byte[] and String are different types. Perl blurs this
difference, and lets you believe that you should not need to care.

> The problem with the current Perl 5 is that it uses two encodings, latin1
> and UTF-8. The above identity holds in Perl 5, for both text and byte
> strings, as long as you don't use any unicode characters, leading to
> people avoiding unicode :-(

This is more of the same confusion with Perl's implementation. It scares
me too.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Neighbourhood Coder
Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/
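
To make the bytes-versus-text point concrete, here is a minimal sketch of the quoted identity() routine in Python, chosen only because it, like Java, keeps byte strings and text strings as separate types. Everything in it is an illustrative assumption rather than anything from the thread: the cp037 codec is plain EBCDIC standing in for UTF-EBCDIC (Python's standard library has no UTF-EBCDIC codec, and for 'A' the message above says the two agree on byte 0xC1), and the helper only suits small payloads that fit in the socket buffer.

```python
import socket


def identity(payload: bytes) -> bytes:
    """Send bytes through a local socket pair and read them back unchanged.

    Hypothetical counterpart of the quoted Perl sub. Only intended for
    small payloads, since the write completes before any read starts.
    """
    sock1, sock2 = socket.socketpair()
    try:
        sock1.sendall(payload)   # raises TypeError if given a str
        sock1.close()            # closing the writer signals end-of-file
        chunks = []
        while True:
            chunk = sock2.recv(4096)
            if not chunk:        # empty read means the writer has closed
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sock1.close()
        sock2.close()


text = "A"
ebcdic = text.encode("cp037")    # EBCDIC bytes; 'A' becomes 0xC1
utf8 = text.encode("utf-8")      # UTF-8 bytes; 'A' stays 0x41

# The socket only ever carries bytes, so the round trip is an identity
# on bytes; recovering the text needs the matching decode step.
round_trip = identity(ebcdic)
recovered = round_trip.decode("cp037")
```

Passing text directly, as in identity("A"), fails with a TypeError, so the caller has to pick an encoding before anything reaches the socket. That explicit encode and decode pair is the "second translation phase" described above.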