perl.perl5.porters | Postings from January 2005

Re: [perl #33734] unpack fails on utf-8 strings

From: Nick Ing-Simmons
Date: January 10, 2005 06:34
Subject: Re: [perl #33734] unpack fails on utf-8 strings
Message ID: 20050110143414.6982.2@llama.elixent.com

Nicholas Clark <nick@ccl4.org> writes:
>On Sun, Jan 09, 2005 at 10:19:43PM -0000, Marc Lehmann wrote:
>
>> As the internal encoding (whether latin1 or utf8) does NOT change the
>> string on the perl level, unpack must work consistently.
>
>I agree. Well, I thought I did. Then..

I know I agree ;-)
But I was overruled.

>
>> (I found this bug because for some reason perl upgraded my string to
>> utf-8 internally, causing very funny effects when I ran various unpacks
>> to decode the protocol. As perl can do that in various unexpected ways,
>> I chose severity "high" because there is no easy workaround on the perl
>> level: feel free to correct this :)
>> 
>> The solution is to downgrade the string to latin1 before converting it
>> within unpack, or failing if the string cannot be converted.
>
>However, I'm confused. There is this code in pp_pack.c:
>
>#ifdef PACKED_IS_OCTETS
>    /* Packed side is assumed to be octets - so force downgrade if it
>       has been UTF-8 encoded by accident
>     */
>    register char *s = SvPVbyte(right, rlen);
>#else
>    register char *s = SvPV(right, rlen);
>#endif
>
>and the default is the #else clause. If I recompile with -DPACKED_IS_OCTETS

I am reasonably sure that was my code, and the #ifdef backs it out
for obscure 5.6 compatibility and/or Camel-III promised reasons.

>
>  Failed 40 test scripts out of 903, 95.57% okay.
>
>which doesn't look great. It looks like some cases in unpack expect to find
>utf8 data in the source string. Great. :-(
>I wonder if it's viable to make the integer conversion operators (and the
>floating point operators) downgrade just enough characters to be useful?

The snag is that with 5.6, using pack/unpack and messing with the SvUTF8 flag
by obscure means was the only way to do 'encoding'.
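
For readers following along, the fix Marc proposes in the quoted text (downgrade the string to latin1 before unpacking, or fail if it cannot be converted) can be sketched in standalone C. This is only an illustration under stated assumptions: downgrade_to_latin1 and unpack_n are hypothetical helpers written for this example, not code from pp_pack.c, and they do not use SvPVbyte or any other Perl API. unpack_n reads a 16-bit big-endian value, roughly what the "n" template yields.

```c
#include <stddef.h>

/*
 * Hypothetical helper, NOT taken from pp_pack.c: convert a UTF-8 buffer
 * whose code points all fit in 0..0xFF back to one octet per character.
 * Returns the number of octets written, or -1 if the input holds a
 * character above 0xFF or a byte sequence this sketch does not accept.
 * dst must have room for at least len octets.
 */
static long downgrade_to_latin1(const unsigned char *src, size_t len,
                                unsigned char *dst)
{
    size_t i = 0;
    long out = 0;

    while (i < len) {
        unsigned char c = src[i];

        if (c < 0x80) {
            /* ASCII is stored identically in both encodings. */
            dst[out++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (src[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequences led by 0xC2/0xC3 encode U+0080..U+00FF. */
            dst[out++] = (unsigned char)(((c & 0x03) << 6)
                                         | (src[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Wide character or malformed input: cannot be downgraded. */
            return -1;
        }
    }
    return out;
}

/* Hypothetical helper: read a 16-bit big-endian unsigned value. */
static unsigned unpack_n(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}
```

With the two octets 0xE9 0x01 stored in upgraded form as 0xC3 0xA9 0x01, unpack_n on the raw buffer reads 0xC3A9, while downgrading first recovers 0xE901. A buffer such as 0xC4 0x80 (U+0100) makes downgrade_to_latin1 return -1, which corresponds to the "fail if the string cannot be converted" branch.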
