develooper Front page | perl.perl5.porters | Postings from January 2005

Re: [perl #33734] unpack fails on utf-8 strings

Nick Ing-Simmons
January 10, 2005 06:34
Nicholas Clark writes:
>On Sun, Jan 09, 2005 at 10:19:43PM -0000, Marc Lehmann wrote:
>> As the internal encoding (whether latin1 or utf8) does NOT change the
>> string on the perl level, unpack must work consistently.
>I agree. Well, I thought I did. Then..

I know I agree ;-)
But I was overruled.

>> (I found this bug because for some reason perl upgraded my string to
>> utf-8 internally, causing very funny effects when I ran various unpacks
>> to decode the protocol. As perl can do that in various unexpected ways,
>> I chose severity "high" because there is no easy workaround on the perl
>> level: feel free to correct this :)
>> The solution is to downgrade the string to latin1 before converting it
>> within unpack, or failing if the string cannot be converted.
>However, I'm confused. There is this code in pp_pack.c:
>#ifdef PACKED_IS_OCTETS
>    /* Packed side is assumed to be octets - so force downgrade if it
>       has been UTF-8 encoded by accident
>     */
>    register char *s = SvPVbyte(right, rlen);
>#else
>    register char *s = SvPV(right, rlen);
>#endif
>and the default is the #else clause. If I recompile with -DPACKED_IS_OCTETS

I am reasonably sure that was my code, and the #ifdef backs it out
for obscure 5.6 compatibility and/or Camel-III promised reasons.
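
For anyone unsure what "force downgrade" means at the byte level, here is a
minimal sketch in plain C. It is not the code from pp_pack.c and does not use
the perl API: downgrade_to_octets is a hypothetical helper, and it assumes the
buffer holds standard (non-EBCDIC) UTF-8. Each character is turned back into a
single octet, and the call fails if any character is above 0xFF, which is the
"downgrade or fail" behaviour Marc asks for in the quoted text above.

```c
#include <stddef.h>

/* Hypothetical helper (not a perl API function): convert a UTF-8 buffer
 * into one octet per character.
 * dst must have room for at least len bytes.
 * Returns the number of octets written, or -1 if the input is malformed
 * or contains a character above 0xFF that cannot be downgraded. */
long downgrade_to_octets(const unsigned char *src, size_t len,
                         unsigned char *dst)
{
    size_t i = 0;
    long out = 0;

    while (i < len) {
        unsigned char c = src[i];

        if (c < 0x80) {
            /* ASCII: already a single octet, copy it through */
            dst[out++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (src[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF */
            dst[out++] = (unsigned char)(((c & 0x03) << 6)
                                         | (src[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Wide character or malformed sequence: refuse to guess */
            return -1;
        }
    }
    return out;
}
```

With this sketch the bytes 0x01 0xC3 0xA9 0x41 come back as the three octets
0x01 0xE9 0x41, while 0xC4 0x80 (U+0100) is rejected.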

>  Failed 40 test scripts out of 903, 95.57% okay.
>which doesn't look great. It looks like some cases in unpack expect to find
>utf8 data in the source string. Great. :-(
>I wonder if it's viable to make the integer conversion operators (and the
>floating point operators) downgrade just enough characters to be useful?

The snag is that with 5.6, using pack/unpack and messing with the SvUTF8 flag
by obscure means was the only way to do 'encoding'.
