On Feb 6, 2007, at 7:02 PM, mark@mark.mielke.cc wrote:

> On Wed, Feb 07, 2007 at 01:56:09AM +0100, Gerard Goossen wrote:
>> I would suggest to make the UTF-EBCDIC the representation in Perl7 on
>> EBCDIC platforms, regardless of what is in the string.
>
> Why? Only performance? Why is UTF-EBCDIC not frequently used any longer,
> and why should Perl buck that trend?

Two of my XS libraries speak UTF-8. In theory, I believe I am supposed
to treat these libraries as "the outside world", wrapping every XS call
in Encode::encode and Encode::decode, because Perl's internal encoding
is officially opaque. And indeed, these libraries should fail on EBCDIC
systems, just as they would fail if Perl's internal encoding were
switched to bit-complemented UTF-8, as I recall hearing discussed a
while back.

However, all that encode/decode overhead would kill the performance of
these libraries, rendering them far less useful.

It would be nice if Perl's internal encoding were always, officially,
UTF-8 -- then there wouldn't be a conflict. But I imagine that might be
very hard to pull off on EBCDIC systems, so maybe it's better this way
-- I get to choose not to support EBCDIC systems (along with systems
that don't use IEEE 754 floats, and systems where chars are bigger than
a byte).

>> I don't care whether $string is a text-string or byte-string, I just
>> want it to return the same string.
>
> Perhaps you should care. In a language such as Java, you are forced to
> care, as byte[] and String are different types. Perl blurs this
> difference, and lets you believe that you should not need to care.

I agree, Mark. Silent upgrading of bytes to Unicode strings cost me a
bunch of debugging time when I learned the hard way that you need to
care. I was writing a serializer that concatenated Unicode strings
together with packed integers to make sort keys.
It never occurred to me that such a concat operation would corrupt the
packed integer, and it took me a long time to hunt down why my sort op
was failing. I've since been taught to avoid mixing different kinds of
scalars, but I would have been better off if Perl itself had taught me
that, by throwing an error when I tried that operation.

I wish that Perl had at least two different kinds of scalars, each with
its own vtable for dispatching such behaviors...

    struct SV {
        SV_VTABLE *_;
        U32 refcnt;
    };
    struct SVIV {
        SVIV_VTABLE *_;
        U32 refcnt;
        IV iv;
    };
    struct SVPVbyte {
        SVPVbyte_VTABLE *_;
        U32 refcnt;
        char *pv;
        STRLEN cur;
        STRLEN len;
    };
    struct SVPVutf8 {
        SVPVutf8_VTABLE *_;
        U32 refcnt;
        char *pv;
        STRLEN cur;
        STRLEN len;
    };

... in a perfect world, that is.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
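
P.S. To make the vtable wish above a bit more concrete, here is a rough, standalone C sketch. It is not Perl's API and not a proposal for the actual SV layout: the names Str, StrVTable, BYTE_VT, UTF8_VT, str_new, str_free, concat_same_kind and upgrade_latin1 are all made up for illustration. It assumes two string kinds whose concat refuses to mix with the other kind, and it includes a small helper that mimics the silent upgrade (bytes read as Latin-1 and re-encoded as UTF-8) to show why a packed integer changes length.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Str Str;

/* Hypothetical per-kind dispatch table; only concat is modeled. */
typedef struct StrVTable {
    const char *kind;
    Str *(*concat)(const Str *self, const Str *other);
} StrVTable;

struct Str {
    const StrVTable *vt;  /* which kind of string this is */
    char  *pv;            /* buffer, never NULL */
    size_t cur;           /* bytes in use */
};

/* Allocate a string of the given kind holding a copy of n bytes. */
static Str *str_new(const StrVTable *vt, const char *src, size_t n) {
    Str *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->pv = malloc(n ? n : 1);
    if (!s->pv) { free(s); return NULL; }
    if (n) memcpy(s->pv, src, n);
    s->vt = vt;
    s->cur = n;
    return s;
}

static void str_free(Str *s) {
    if (s) { free(s->pv); free(s); }
}

/* Concat that returns NULL for mixed kinds instead of upgrading silently. */
static Str *concat_same_kind(const Str *self, const Str *other) {
    if (self->vt != other->vt) return NULL;
    Str *r = malloc(sizeof *r);
    if (!r) return NULL;
    r->cur = self->cur + other->cur;
    r->pv = malloc(r->cur ? r->cur : 1);
    if (!r->pv) { free(r); return NULL; }
    memcpy(r->pv, self->pv, self->cur);
    memcpy(r->pv + self->cur, other->pv, other->cur);
    r->vt = self->vt;
    return r;
}

static const StrVTable BYTE_VT = { "byte", concat_same_kind };
static const StrVTable UTF8_VT = { "utf8", concat_same_kind };

/* Mimics the silent upgrade: every Latin-1 byte >= 0x80 becomes two
 * UTF-8 bytes. out must have room for 2 * n bytes. Returns bytes written. */
static size_t upgrade_latin1(const unsigned char *in, size_t n,
                             unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}
```

With these definitions, calling b->vt->concat(b, u) on a BYTE_VT string b and a UTF8_VT string u yields NULL rather than a quietly re-encoded buffer, and running upgrade_latin1 over the four bytes 00 00 00 FF produces five bytes, which is the kind of change that would break a sort key built from packed integers.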