Re: Future Perl development

From: Marvin Humphrey
Date: February 6, 2007 20:38
Subject: Re: Future Perl development
Message ID: 3E8C4495-F29B-49EF-9335-B2F23C03D1DC@rectangular.com

On Feb 6, 2007, at 7:02 PM, mark@mark.mielke.cc wrote:

> On Wed, Feb 07, 2007 at 01:56:09AM +0100, Gerard Goossen wrote:
>> I would suggest to make the UTF-EBCDIC the representation in Perl7 on
>> EBCDIC platforms, regardless of what is in the string.
>
> Why? Only performance? Why is UTF-EBCDIC not frequently used any  
> longer,
> and why should Perl buck that trend?

Two of my XS libraries speak UTF-8.  In theory, I believe I am  
supposed to treat these libraries as "the outside world", wrapping  
every XS call in Encode::encode and Encode::decode, because Perl's  
internal encoding is officially opaque.  And indeed, these libraries  
should fail on EBCDIC systems.  Just as they would fail if Perl's  
internal encoding were switched to bit-complemented UTF-8 as I recall  
hearing discussed a while back.

However, all that encode/decode overhead would kill the performance  
of these libraries, rendering them far less useful.  It would be nice  
if Perl's internal encoding were always, officially UTF-8 -- then  
there wouldn't be a conflict.  But I imagine that might be very hard  
to pull off on EBCDIC systems, so maybe it's better this way -- I get  
to choose not to support EBCDIC systems (along with systems that  
don't use IEEE 754 floats, and systems where chars are bigger than a  
byte).
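
A library that makes this choice can state it at compile time. The following is only a sketch, assuming a C11 compiler; the function name platform_assumptions_hold is hypothetical. The float check compares the binary64 parameters from <float.h>, which is a heuristic rather than proof of full IEEE 754 conformance.

```c
#include <float.h>
#include <limits.h>

/* Refuse to build where chars are wider than 8 bits. */
_Static_assert(CHAR_BIT == 8, "chars must be exactly 8 bits");

/* Refuse to build on EBCDIC: 'A' is 0xC1 there, not 0x41. */
_Static_assert('A' == 0x41 && 'a' == 0x61,
               "ASCII-compatible execution character set required");

/* Heuristic: double has the radix, mantissa width and exponent range
 * of IEEE 754 binary64. */
_Static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
               "IEEE 754 binary64 doubles required");

/* Hypothetical helper: if this translation unit compiled at all,
 * every assumption above held. */
int platform_assumptions_hold(void) {
    return 1;
}
```

On an unsupported platform the build stops with the matching message instead of failing mysteriously at run time.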

>> I don't care whether $string is a text-string or byte-string, I  
>> just want
>> it to returns the same string.
>
> Perhaps you should care. In a language such as Java, you are forced to
> care, as byte[] and String are different types. Perl blurs this  
> difference,
> and lets you believe that you should not need to care.

I agree, Mark.  Silent upgrading of bytes to Unicode strings cost me  
a bunch of debugging time when I learned the hard way that you need  
to care.  I was writing a serializer that concatenated Unicode  
strings together with packed integers to make sort keys.  It never  
occurred to me that such a concat operation would corrupt the packed  
integer, and it took me a long time to hunt down why my sort op was  
failing.

I've since been taught to avoid mixing different kinds of scalars,  
but I would have been better off if Perl itself had taught me that,  
by throwing an error when I tried that operation.
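
To illustrate the kind of corruption I mean, here is a rough C sketch of what I understand happens on an ASCII platform: when a byte string is concatenated with a string flagged as UTF-8, each byte is treated as a Latin-1 code point and re-encoded, so any byte at or above 0x80 becomes two bytes. The function names pack_u32_be and upgrade_latin1_to_utf8 are made up for this example and are not Perl API calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a 32-bit unsigned integer as 4 big-endian bytes,
 * similar in spirit to Perl's pack("N", $value). */
void pack_u32_be(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)(value);
}

/* Re-encode bytes as UTF-8, treating each input byte as a Latin-1
 * code point. Returns the number of bytes written; `out` must have
 * room for at least 2 * len bytes. */
size_t upgrade_latin1_to_utf8(const unsigned char *in, size_t len,
                              unsigned char *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                                  /* ASCII: unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F)); /* continuation */
        }
    }
    return n;
}
```

Packing 200 gives the bytes 00 00 00 C8; after the upgrade they are 00 00 00 C3 88, five bytes instead of four, so a bytewise comparison of sort keys no longer orders them as integers. Values whose packed bytes all fall below 0x80 survive untouched, which is part of why the bug is hard to spot.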

I wish that Perl had at least two different kinds of scalars, each  
with its own vtable for dispatching such behaviors...

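/* Common header: every scalar starts with a vtable pointer and a refcount. */
struct SV {
     SV_VTABLE *_;
     U32     refcnt;
};

/* Integer scalar. */
struct SVIV {
     SVIV_VTABLE *_;
     U32     refcnt;
     IV      iv;
};

/* String scalar holding raw bytes. */
struct SVPVbyte {
     SVPVbyte_VTABLE *_;
     U32     refcnt;
     char   *pv;
     STRLEN  cur;
     STRLEN  len;
};

/* String scalar holding UTF-8 encoded text. */
struct SVPVutf8 {
     SVPVutf8_VTABLE *_;
     U32     refcnt;
     char   *pv;
     STRLEN  cur;
     STRLEN  len;
};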

... in a perfect world, that is.
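
As a rough sketch of how per-type vtables could catch the mistake described earlier, here is a small standalone C example. Everything in it (Scalar, ScalarVTable, BYTE_VTABLE, UTF8_VTABLE, scalar_new, scalar_concat) is hypothetical and much simpler than the structs above; it is not how Perl works today. Concatenation is dispatched through the vtable and refuses to mix kinds by returning NULL.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Scalar Scalar;

/* One vtable per scalar kind; behaviors are dispatched through it. */
typedef struct ScalarVTable {
    const char *kind;
    /* Returns a new scalar, or NULL if the operands are incompatible. */
    Scalar *(*concat)(const Scalar *self, const Scalar *other);
} ScalarVTable;

struct Scalar {
    const ScalarVTable *vt;
    char   *pv;   /* buffer */
    size_t  cur;  /* bytes in use */
};

/* Allocate a scalar of the given kind holding a copy of buf[0..len). */
Scalar *scalar_new(const ScalarVTable *vt, const char *buf, size_t len) {
    Scalar *sv = malloc(sizeof *sv);
    if (!sv) return NULL;
    sv->pv = malloc(len ? len : 1);
    if (!sv->pv) { free(sv); return NULL; }
    memcpy(sv->pv, buf, len);
    sv->vt  = vt;
    sv->cur = len;
    return sv;
}

void scalar_free(Scalar *sv) {
    if (sv) { free(sv->pv); free(sv); }
}

/* Shared concat: succeed only when both operands have the same vtable. */
static Scalar *concat_same_kind(const Scalar *self, const Scalar *other) {
    if (self->vt != other->vt) return NULL;  /* mixing kinds is an error */
    Scalar *result = scalar_new(self->vt, self->pv, self->cur);
    if (!result) return NULL;
    char *grown = realloc(result->pv, self->cur + other->cur + 1);
    if (!grown) { scalar_free(result); return NULL; }
    memcpy(grown + self->cur, other->pv, other->cur);
    result->pv  = grown;
    result->cur = self->cur + other->cur;
    return result;
}

const ScalarVTable BYTE_VTABLE = { "bytes", concat_same_kind };
const ScalarVTable UTF8_VTABLE = { "utf8",  concat_same_kind };

/* Public entry point: dispatch through the left operand's vtable. */
Scalar *scalar_concat(const Scalar *a, const Scalar *b) {
    return a->vt->concat(a, b);
}
```

With this arrangement, concatenating a UTF-8 scalar with a packed-integer byte scalar fails immediately instead of silently rewriting the bytes, and the caller has to pick an explicit conversion.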

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




