develooper Front page | perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

From: Nick Ing-Simmons
Date: February 20, 2001 05:47
Subject: Re: The State of The Unicode
Message ID: 200102201347.NAA23781@mikado.tiuk.ti.com

Simon Cozens <simon@netthink.co.uk> writes:
>On Tue, Feb 20, 2001 at 11:43:30AM +0000, Nick Ing-Simmons wrote:
>> No - you get the wrong answer. Consider a string which happens to be UTF-8
>> encoded at the time you do bytes::length - but which gets auto-downgraded
>> when you do the print
>
>Yeah, well, that wasn't my idea either... :)
>
>> , so you need
>> 
>> { use bytes; print ... } 
>> 
>> as well.
>
>Or you set the output layer to UTF8, surely? 

No, 'cos that will not match bytes::length if the string was _NOT_ encoded.
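
To make the mismatch concrete, here is a minimal sketch. It assumes a perl of 5.8 or later, where utf8::upgrade and utf8::downgrade exist (they postdate this thread), and uses them only to force the two internal representations by hand. The same four-character string reports a different bytes::length depending on how it happens to be stored:

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without switching on byte semantics

# Four characters; the last is U+00E9, which fits in one byte when the
# string is not UTF-8 encoded internally.
my $s = "caf\x{e9}";

utf8::upgrade($s);                 # force the UTF-8 internal representation
my $encoded = bytes::length($s);   # e-acute now occupies two bytes

utf8::downgrade($s);               # back to one byte per character
my $plain = bytes::length($s);

printf "chars=%d upgraded=%d downgraded=%d\n", length($s), $encoded, $plain;
# prints: chars=4 upgraded=5 downgraded=4
```

So a byte count taken at one moment says nothing reliable about what a later print will emit unless the representation is pinned down at both points.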

>> bytes is (near) useless.
>
>So we replace it with your utf8_length, which fills exactly the same
>gap. Huh. Not convinced.

What I didn't tell you is utf8_length() is to be defined thus:

MODULE = Encode  PACKAGE = Encode  PREFIX = Nicks_

IV
Nicks_utf8_length(sv)
SV *	sv
CODE:
{
 STRLEN len;
 (void) SvPV_force(sv,len);  /* it's a string, damn it! */
 sv_utf8_upgrade(sv);        /* I want it encoded */
 SvUTF8_on(sv);              /* even if there are no high bits yet, */
 RETVAL = SvCUR(sv);         /* now how long is it ... */
}
OUTPUT:
 RETVAL

;-)
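
For readers who do not speak XS, a rough pure-Perl approximation follows. It is a sketch under assumptions, not the definition proposed here: it relies on utf8::upgrade from perl 5.8 or later, and unlike the XS above it works on a copy, so the caller's scalar is not forced into the encoded form.

```perl
use strict;
use warnings;
use bytes ();   # for bytes::length only; byte semantics stay off

# Hypothetical pure-Perl stand-in for the XS utf8_length().
sub utf8_length {
    my $copy = "$_[0]";           # stringify a copy; the argument is left alone
    utf8::upgrade($copy);         # make sure the copy is UTF-8 encoded internally
    return bytes::length($copy);  # count the bytes of the encoded form
}

print utf8_length("abc"), "\n";        # 3: plain ASCII is one byte per character
print utf8_length("caf\x{e9}"), "\n";  # 5: U+00E9 takes two bytes in UTF-8
print utf8_length("\x{263a}"), "\n";   # 3: U+263A takes three bytes in UTF-8
```

The answer no longer depends on which representation the string had when the function was called, which is the gap bytes::length leaves open.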


>
>> But that does NOT mean that anything in bleadperl is broken.
> 
>You're right!
>
>> Just that cutting a hole in one's abdomen and peering at one's guts
>> is likely to hurt and not do you much good.
> 
>Well, hmm, Don't Do That Then. :)

I never do, nor do I encourage other people to do so by leaving 
scalpels lying about.

-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.

