Simon Cozens <simon@netthink.co.uk> writes:
>On Tue, Feb 20, 2001 at 11:43:30AM +0000, Nick Ing-Simmons wrote:
>> No - you get the wrong answer. Consider a string which happens to be UTF-8
>> encoded at time you do bytes::length - but which gets auto-downgraded
>> when you do the print
>
>Yeah, well, that wasn't my idea either... :)
>
>> , so you need
>>
>>   { use bytes; print ... }
>>
>> as well.
>
>Or you set the output layer to UTF8, surely?

No, 'cos that will not match bytes::length if string was _NOT_ encoded.

>> bytes is (near) useless.
>
>So we replace it with your utf8_length, which fills exactly the same
>gap. Huh. Not convinced.

What I didn't tell you is utf8_length() is to be defined thus:

MODULE = Encode, PACKAGE = Encode, PREFIX = Nicks_

IV
Nicks_utf8_length(sv)
SV *	sv
CODE:
 {
  STRLEN len;
  (void) SvPV_force(sv,len); // it's a string damn it!
  sv_utf8_upgrade(sv);       // I want it encoded
  SvUTF8_on(sv);             // even if there are no high-bits yet,
  RETVAL = SvCUR(sv);        // now how long is it ...
 }
OUTPUT:
 RETVAL

;-)

>
>> But that does NOT mean that anything in bleadperl is broken.
>
>You're right!
>
>> Just that cutting a hole in ones abdomen and peering at ones guts
>> is likely to hurt and not do you much good.
>
>Well, hmm, Don't Do That Then. :)

I never do, nor do I encourage other people to do so by leaving scalpels
lying about.

-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.
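
For anyone following along without a perl build to hand, here is a rough sketch in plain C of why the two lengths disagree. It assumes the string starts out as non-upgraded Latin-1 bytes, and it only counts what the buffer would hold after an upgrade to UTF-8; it does not touch an SV. The name latin1_utf8_length is made up for illustration and is not part of the perl API or of Encode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of perl's API: the number of bytes a
 * Latin-1 (non-upgraded) buffer occupies once every character is
 * encoded as UTF-8. Bytes 0x00-0x7F stay one byte each; bytes
 * 0x80-0xFF each become a two-byte sequence. */
size_t latin1_utf8_length(const unsigned char *s, size_t n)
{
    size_t len = n;   /* start from the raw byte count */
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (s[i] & 0x80)
            len++;    /* high-bit byte costs one extra byte in UTF-8 */
    }
    return len;
}
```

So a four-byte Latin-1 "caf\xe9" counts as five once encoded, while a pure-ASCII string counts the same either way, which is why a byte count taken before the upgrade need not match what ends up on the wire.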