perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

From: Nick Ing-Simmons
Date: February 20, 2001 03:44
Subject: Re: The State of The Unicode
Message ID: 200102201143.LAA23471@mikado.tiuk.ti.com

Simon Cozens <simon@simon-cozens.org> writes:
>On Mon, Feb 19, 2001 at 07:41:26PM -0600, Jarkko Hietaniemi wrote:
>> On Mon, Feb 19, 2001 at 08:33:37PM -0500, Andrew Pimlott wrote:
>> > On Mon, Feb 19, 2001 at 07:14:07PM -0600, Jarkko Hietaniemi wrote:
>> > > Protocols: if all I know is that my output is 500 Unicode characters
>> > > long, how am I to print out Content-Length?
>> > 
>> > As I said to abigail, I would love a concrete explanation of what
>> > you have in mind.  In particular, what is your mechanism for
>> > ensuring that perl is representing $output as utf8?
>> 
>> Ahhh.  True, got me there.  I can't ensure that.
>
>*But* if you use bytes::length, it doesn't matter - you'll get the right
>answer whether or not $output is UTF8-encoded. 

No - you get the wrong answer. Consider a string which happens to be UTF-8
encoded at the time you do bytes::length - but which gets auto-downgraded
when you do the print, so you need

{ use bytes; print ... } 

as well.

And that still fails if you are printing to a handle which has
(say) a base64 layer pushed.

bytes is (near) useless.
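
The mismatch described above can be sketched roughly as follows. This is only
an illustration, not code from the thread: it assumes a later perl (5.8 or
newer) with the Encode module and in-memory filehandles, and the variable
names are hypothetical. It compares what bytes::length reports for an
upgraded string with what a plain handle actually receives, and then shows
one possible alternative: encode explicitly and count the octets you print.

```perl
use strict;
use warnings;
use bytes ();            # load bytes::length without enabling byte semantics
use Encode qw(encode);   # assumption: Encode is available (perl 5.8+)

# Four characters, all below 256, forced into the internal UTF-8 form.
my $output = "caf\x{e9}";
utf8::upgrade($output);

# What bytes::length sees: the size of the internal representation.
my $internal_bytes = bytes::length($output);

# What actually reaches a plain handle with no encoding layer:
# perl downgrades the string on print when it can.
my $wire = '';
open(my $fh, '>', \$wire) or die "cannot open in-memory handle: $!";
print {$fh} $output;
close($fh) or die "close failed: $!";
my $printed_bytes = length($wire);

# Possible alternative: encode once, then count and print the same octets.
my $octets         = encode('UTF-8', $output);
my $content_length = length($octets);

printf "internal=%d printed=%d encoded=%d\n",
    $internal_bytes, $printed_bytes, $content_length;
```

The first two numbers differ, which is the wrong Content-Length in question.
The encoded count only matches the wire if the handle has no further layers
pushed; with a transforming layer the count would have to be taken after that
layer, which this sketch does not attempt.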

But that does NOT mean that anything in bleadperl is broken.

Just that cutting a hole in one's abdomen and peering at one's guts
is likely to hurt and not do you much good.

>Core's length, on the other
>hand, has to use character semantics because of the principle of least
>surprise, amongst other things. Hence the two have to exist, hence use bytes,
>QED. Thanks, Andrew. :)
-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.



