
Re: Perl-Unicode fundamentals (was Re: IV preservation (was Re: [PATCH 5.7.0] compiling on OS/2))

From:
nick
Date:
February 20, 2001 13:56
Subject:
Re: Perl-Unicode fundamentals (was Re: IV preservation (was Re: [PATCH 5.7.0] compiling on OS/2))
Message ID:
E14VKib-00059R-00@roam1
Ilya Zakharevich <ilya@math.ohio-state.edu> writes:
>> They are - the dispute (in so far as there is any dispute) is 
>> over the internal representation a perl language construct
>> chooses.
>
>No way.  Jarkko says that "transparency" (the fact that only the
>"logical representation" matters outside of XS) is no more a goal for
>Perl.  

I don't think that is what Jarkko meant at all. 
Transparency may no longer be a goal - because we already have it!
(well almost... a bug or two remains).

>I say that Perl which does not achieve transparency is useless
>Perl (in presence of high chars - but since you never may be sure,
>almost always).
>
>The choice of internal representation may matter performance-wise, but
>this may be addressed with pragmas.  [BTW, if I correctly understood
>what Jarkko was insinuating, we have no choice now: if a string
>contains only logical chars <256, it is *forced* to bytes...  I hope
>I'm wrong...]

You are wrong. chop($str.chr(256)) leaves the result in UTF-8 form - 
I used it earlier to show the unpack('C',$str) != ord($str) "bug".
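A minimal sketch of that point, spelled out with a temporary variable. It assumes a modern perl (5.8 or later) on an ASCII platform, where utf8::is_utf8() and bytes::length() are available; the variable names are illustrative only, and 5.7.0 itself may behave differently in detail.

```perl
use strict;
use warnings;
require bytes;   # only for bytes::length(); does not turn the pragma on

my $str = "\xE9";            # one logical character, code point < 256
my $tmp = $str . chr(256);   # appending chr(256) forces the UTF-8 form
chop($tmp);                  # drop chr(256) again

# Logically $tmp and $str are the same one-character string ...
my $same_logically = ($tmp eq $str) && length($tmp) == 1 && ord($tmp) == 0xE9;

# ... but $tmp is still held internally as UTF-8, not forced back to bytes.
my $still_utf8 = utf8::is_utf8($tmp);
my $octets     = bytes::length($tmp);

print "same logically: ", ($same_logically ? "yes" : "no"), "\n";
print "still UTF-8:    ", ($still_utf8     ? "yes" : "no"), "\n";
print "octets held:    $octets\n";
```

The string contains only a logical char < 256, yet nothing forces it back to the byte form.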

What _has_ happened is that the various number-to-char mechanisms
chr, pack, "\xHH", "\x{XX}" have been made more consistent.
A consistent starting point does not seem to me to be a disaster.
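As a rough illustration of what "consistent" means here, the sketch below builds the same code point through each mechanism and compares the results logically. It assumes a modern perl on an ASCII platform; the code point 0xE9 and the hash name are arbitrary choices for the example.

```perl
use strict;
use warnings;

my $code = 0xE9;   # arbitrary code point below 256, chosen for illustration

my %made = (
    'chr'    => chr($code),
    'pack U' => pack("U", $code),
    '\xHH'   => "\xE9",
    '\x{XX}' => "\x{E9}",
);

# Whatever the internal form, each result should be the same logical string.
for my $how (sort keys %made) {
    my $s = $made{$how};
    printf "%-7s length=%d ord=0x%X same_as_chr=%s\n",
        $how, length($s), ord($s), ($s eq chr($code) ? "yes" : "no");
}
```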

There is a lot of "fear, uncertainty and dread" being spread. The perl model
is very largely as you (Ilya) proposed it. 

The big question mark is what we (well "they" actually) do on EBCDIC 
platforms where it has been demonstrated that ord('A') == 0xC1 is 
a requirement (if only because it is used as a test for "this is an EBCDIC 
platform").  Simon and Peter have made much progress in this area
but they have not fully explained it yet.
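The platform test referred to above is typically written along these lines. This is only a sketch; the function name is_ebcdic is made up for the example.

```perl
use strict;
use warnings;

# 'A' is 0xC1 (193) in EBCDIC code pages and 0x41 (65) in ASCII.
sub is_ebcdic {
    return ord('A') == 0xC1 ? 1 : 0;
}

print is_ebcdic() ? "EBCDIC platform\n" : "not an EBCDIC platform\n";
```

Code like this breaks if ord() starts returning Unicode code points on such machines, which is why the requirement exists.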

Personally I don't really care what they do there - so long as they can 
explain it to me sufficiently that I can make Encode and IO do the right 
thing for when such machines talk to the rest of the world.

>
>> Apart from pack's U & C this does not leak into the 
>> "logical" world.
>
>And there is no reason for them to leak too.  

I agree 100% - patch to follow when I get a chance.

>The operations they do
>have perfect sense in the logical world too.  Currently I see no need
>whatsoever to have not-transparent operations *at all*.  

Neither do I - you "converted" me on the 'phone way-back-when.

>But if judged
>*needed*, they should be accessible from a module, to have the core
>"clean".

Everything is supposed to be "transparent", we have the module, 
the masochists have their 'use bytes', let us just fix the bugs and docs
and release it. 
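For reference, a small sketch of the difference between the transparent (logical) view and what 'use bytes' exposes, assuming a modern perl; the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = chr(0x100);   # one logical character, above 255

my $chars  = length($s);                     # logical view: characters
my $octets = do { use bytes; length($s) };   # internal octets, pragma scoped to this block

print "characters: $chars, octets: $octets\n";
```

Outside the block with 'use bytes', only the logical length is visible.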

-- 
Nick Ing-Simmons

