perl.perl5.porters, February 2001

Re: The State of The Unicode

Nick Ing-Simmons
February 20, 2001 05:47
Simon Cozens writes:
>On Tue, Feb 20, 2001 at 11:43:30AM +0000, Nick Ing-Simmons wrote:
>> No - you get the wrong answer. Consider a string which happens to be UTF-8
>> encoded at the time you do bytes::length - but which gets auto-downgraded 
>> when you do the print
>Yeah, well, that wasn't my idea either... :)
>> , so you need
>> { use bytes; print ... } 
>> as well.
>Or you set the output layer to UTF8, surely? 

No, 'cos that will not match bytes::length if the string was _NOT_ encoded.
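
To make the mismatch concrete, here is a small C sketch (not perl's own code) that measures one string, "café" (é is U+00E9), in the two internal forms perl may hold it in: native 8-bit (Latin-1) or UTF-8. The helper names `latin1_to_utf8_len` and `utf8_to_latin1_len` are hypothetical, and the sketch assumes every character fits in Latin-1, i.e. that a downgrade is possible at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes a Latin-1 (native 8-bit) buffer occupies
 * once upgraded to UTF-8. Bytes 0x00-0x7F stay one byte; 0x80-0xFF
 * become two. */
size_t latin1_to_utf8_len(const unsigned char *s, size_t n)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += (s[i] < 0x80) ? 1 : 2;
    return len;
}

/* Hypothetical helper: bytes a UTF-8 buffer occupies once downgraded to
 * Latin-1, i.e. its character count. Assumes well-formed UTF-8 whose code
 * points are all <= 0xFF; continuation bytes (10xxxxxx) are not counted. */
size_t utf8_to_latin1_len(const unsigned char *s, size_t n)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            /* Only ASCII or lead bytes 0xC2/0xC3 (U+0080..U+00FF) fit. */
            assert(s[i] < 0x80 || s[i] == 0xC2 || s[i] == 0xC3);
            len++;
        }
    }
    return len;
}
```

If "café" happens to be stored as UTF-8 (5 bytes), bytes::length says 5 while a downgrading print emits 4. If it is stored as Latin-1 (4 bytes) and printed through a UTF-8 output layer, bytes::length says 4 while 5 bytes go out. Either way the count does not describe what was written.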

>> bytes is (near) useless.
>So we replace it with your utf8_length, which fills exactly the same
>gap. Huh. Not convinced.

What I didn't tell you is that utf8_length() is to be defined thus:

MODULE = Encode		PACKAGE = Encode	PREFIX = Nicks_

STRLEN
Nicks_utf8_length(sv)
	SV *	sv
    PREINIT:
	STRLEN len;
    CODE:
	(void) SvPV_force(sv, len);  /* it's a string, damn it! */
	sv_utf8_upgrade(sv);         /* I want it encoded, */
	SvUTF8_on(sv);               /* even if there are no high bits yet */
	RETVAL = SvCUR(sv);          /* now how long is it ... */
    OUTPUT:
	RETVAL

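The XS above upgrades the scalar in place and then reads SvCUR. As a rough C analogue, not perl's actual data structures: the hypothetical `struct str` below stands in for an SV with its SvUTF8 flag, and `utf8_length` returns the byte count the string has once upgraded to UTF-8. Unlike the XS, this sketch only computes the length and leaves its argument untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a perl string scalar: raw bytes plus a flag
 * saying whether they are UTF-8 (like SvUTF8) or native 8-bit. */
struct str {
    const unsigned char *buf;
    size_t cur;   /* byte count, like SvCUR */
    bool utf8;    /* like the SvUTF8 flag */
};

/* Sketch of the utf8_length() idea: always answer for the UTF-8 form, so
 * the result does not depend on which internal form the string is in. */
size_t utf8_length(const struct str *s)
{
    assert(s != NULL && (s->buf != NULL || s->cur == 0));
    if (s->utf8)
        return s->cur;                       /* already encoded */
    size_t len = 0;
    for (size_t i = 0; i < s->cur; i++)
        len += (s->buf[i] < 0x80) ? 1 : 2;   /* bytes >= 0x80 need two */
    return len;
}
```

Whichever form "café" is held in, this returns 5, which is also the number of bytes a UTF-8 output layer writes for it. That consistency is what bytes::length cannot offer.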

>> But that does NOT mean that anything in bleadperl is broken.
>You're right!
>> Just that cutting a hole in one's abdomen and peering at one's guts
>> is likely to hurt and not do you much good.
>Well, hmm, Don't Do That Then. :)

I never do, nor do I encourage other people to do so by leaving 
scalpels lying about.

Nick Ing-Simmons
Via, but not speaking for: Texas Instruments Ltd.