Paul Marquess <Paul_Marquess@yahoo.co.uk> writes:
>From: Nick Ing-Simmons [mailto:nik@tiuk.ti.com]
>
>...
>> What I didn't tell you is utf8_length() is to be defined thus:
>>
>> MODULE = Encode, PACKAGE = Encode, PREFIX = Nicks_
>>
>> IV
>> Nicks_utf8_length(sv)
>>     SV * sv
>>   CODE:
>>     {
>>      STRLEN len;
>>      (void) SvPV_force(sv,len); // it's a string damn it!
>>      sv_utf8_upgrade(sv);       // I want it encoded
>>      SvUTF8_on(sv);             // even if there are no high-bits yet,
>>      RETVAL = SvCUR(sv);        // now how long is it ...
>>     }
>>   OUTPUT:
>>     RETVAL
>
>Will there be any impact on legacy .xs code that doesn't know anything about
>the new UTF8 SV's? I'm thinking here of modules like Compress::Zlib.

XS code is of necessity more fragile than perl code in this regard.

>
>Hmmm, does this fall under the "if an XS sub expects bytes, it's the
>responsibility of the person using the code to give them bytes".

SvPV etc. are still there and give you the raw data; the problem is that
the data can, in general, be in either form (bytes or UTF-8).

But (to be safe) the XS code needs to call sv_utf8_downgrade() if the SvUTF8
flag is set - otherwise it is as fragile as 'use bytes'.

We could make the XS-visible versions of SvPV() etc. do that automatically,
I guess, or we could have the typemaps do that.

-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.
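
Illustration (not part of the original message): the "downgrade" step mentioned above can be pictured at the byte level with a small standalone C sketch. It does not use the Perl API, and the function name downgrade_utf8 is hypothetical. It assumes that downgrading means turning UTF-8 whose characters all fit in 0..255 back into one byte per character, and refusing anything wider; it does not reproduce how perl implements sv_utf8_downgrade().

```c
#include <stddef.h>

/*
 * Hypothetical helper, NOT a Perl API function.
 * Copies the UTF-8 text in src (len bytes) to dst as one byte per
 * character. dst must have room for at least len bytes.
 * Returns the number of bytes written, or -1 if the input is
 * malformed or contains a character above U+00FF.
 */
long downgrade_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0;
    size_t out = 0;

    while (i < len) {
        unsigned char c = src[i];

        if (c < 0x80) {
            /* ASCII is the same in both forms. */
            dst[out++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (src[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence covering U+0080..U+00FF. */
            dst[out++] = (unsigned char)(((c & 0x03) << 6) | (src[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Wider character or malformed sequence: cannot be bytes. */
            return -1;
        }
    }
    return (long)out;
}
```

For example, the five UTF-8 bytes 'c' 'a' 'f' 0xC3 0xA9 come back as four bytes ending in 0xE9, while 0xC4 0x80 (U+0100) is rejected. A byte-oriented XS routine would need some such check before treating the buffer as raw bytes.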