develooper Front page | perl.perl5.porters | Postings from February 2001

RE: The State of The Unicode

Nick Ing-Simmons
February 20, 2001 07:14
Paul Marquess writes:
>From: Nick Ing-Simmons
>> What I didn't tell you is utf8_length() is to be defined thus:
>> MODULE = Encode, PACKAGE = Encode, PREFIX = Nicks_
>> IV
>> Nicks_utf8_length(sv)
>> SV *	sv
>> CODE:
>> {
>>  STRLEN len;
>>  (void) SvPV_force(sv,len);  // it's a string damn it!
>>  sv_utf8_upgrade(sv);        // I want it encoded
>>  SvUTF8_on(sv);              // even if there are no high-bits yet,
>>  RETVAL = SvCUR(sv);         // now how long is it ...
>> }
>> OUTPUT:
>>  RETVAL
>Will there be any impact on legacy .xs code that doesn't know anything about
>the new UTF8 SVs? I'm thinking here of modules like Compress::Zlib.

XS code is of necessity more fragile than perl code in this regard.

>Hmmm, does this fall under the "if an XS sub expects bytes, it's the
>responsibility of the person using the code to give them bytes"?

SvPV etc. are still there and give you the raw data; the problem is
that the data can in general be in either form (bytes or UTF-8).
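
To make "either form" concrete, here is a small standalone C sketch. It does not use perl's internals, and the names (demo_utf8_chars, demo_latin1, demo_utf8) are made up for illustration. It stores the same two-character string once as one byte per character and once as UTF-8. A byte count such as SvCUR() differs between the two buffers, while the character count stays the same.

```c
#include <stddef.h>

/* Hypothetical helper: count characters in a well-formed UTF-8 buffer
 * by counting every byte that is not a continuation byte (10xxxxxx). */
static size_t demo_utf8_chars(const unsigned char *s, size_t nbytes)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < nbytes; i++)
        if ((s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* The string "e-acute, exclamation mark" in the two possible forms. */
static const unsigned char demo_latin1[] = { 0xE9, 0x21 };        /* 2 bytes */
static const unsigned char demo_utf8[]   = { 0xC3, 0xA9, 0x21 };  /* 3 bytes */
```

XS code that reads the raw buffer without looking at the SvUTF8 flag cannot tell which of these two layouts it has been handed.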

But (to be safe) the XS code needs to sv_utf8_downgrade() if the SvUTF8
flag is set - otherwise it is as fragile as 'use bytes'.
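
As a rough illustration of what that downgrade step amounts to, here is a standalone C model. It is not perl's implementation: demo_downgrade is a hypothetical name, and the caller is assumed to already know the buffer is UTF-8 (in real XS that is what the SvUTF8 flag tells you). It rewrites the buffer as one byte per character and refuses if any character is above 0xFF, since such a character has no single-byte form.

```c
#include <stddef.h>

/* Hypothetical stand-in for the downgrade step: rewrite a UTF-8 buffer
 * in place as one byte per character.  Returns 0 on success, or -1 if
 * the input is malformed or holds a character above 0xFF; on failure
 * the buffer and *len are left untouched. */
static int demo_downgrade(unsigned char *buf, size_t *len)
{
    size_t i, out = 0;

    /* Pass 1: make sure every character fits in a single byte.
     * Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF. */
    for (i = 0; i < *len; i++) {
        if (buf[i] < 0x80)
            continue;                       /* plain ASCII */
        if ((buf[i] != 0xC2 && buf[i] != 0xC3) ||
            i + 1 >= *len || (buf[i + 1] & 0xC0) != 0x80)
            return -1;                      /* too wide or malformed */
        i++;                                /* skip continuation byte */
    }

    /* Pass 2: collapse each two-byte sequence into one byte. */
    for (i = 0; i < *len; i++) {
        if (buf[i] < 0x80) {
            buf[out++] = buf[i];
        } else {
            buf[out++] = (unsigned char)(((buf[i] & 0x03) << 6) |
                                         (buf[i + 1] & 0x3F));
            i++;
        }
    }
    *len = out;
    return 0;
}
```

The failure case is the one legacy XS code has to decide about: a string containing, say, U+20AC cannot be handed to a byte-oriented library as single bytes at all.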

We could make the XS-visible versions of SvPV() etc. do that automatically
I guess, or we could make the typemaps do that.

Nick Ing-Simmons
Via, but not speaking for: Texas Instruments Ltd.