
Re: How on earth did we manage to break pack() so badly?

From: demerphq
Date: May 1, 2013 14:57
Subject: Re: How on earth did we manage to break pack() so badly?
Message ID: CANgJU+UUT-GGiPt9iNLOxMP23cgSuMfupa_9giPJHTW7fXr=ig@mail.gmail.com

On 1 May 2013 16:46, Nicholas Clark <nick@ccl4.org> wrote:
> On Wed, May 01, 2013 at 04:32:07PM +0200, demerphq wrote:
>> It used to be nice and safe to do this:
>>
>> print unpack("H*", $_),"\n"; # let's see what the raw string looks like.
>>
>>
>> This is no longer an effective debugging technique. It will NOT tell
>> you what your string looks like. It takes a "daddy knows best"
>> attitude and tries to do the right thing depending on whether the data
>> is utf8 or the data is not. Which means that this:
>>
>> perl -le'print unpack "H*", "\x{DF}\x{100}"'
>>
>> Produces completely different results depending on which Perl you are
>> on. On older perls it produces a relatively useful:
>>
>> c39fc480
>
> Add U0:
>
> $ ./perl -le'print unpack "U0H*", "\x{DF}\x{100}"'
> c39fc480
>
> $ perl5.8.9 -le'print unpack "U0H*", "\x{DF}\x{100}"'
> c39fc480

Ah thanks. That fixes the hex part. Trying it on the "v/a" part
produces a corrupted utf8-on string:

$ perl -MDevel::Peek -wle'my $a= "a" x 129; utf8::upgrade($a); print(
my $msg= pack("U0v/a*", $a)); Dump($msg);' | hexdump -C
Wide character in print at -e line 1.
SV = PV(0x5f75150) at 0x5f8b3f0
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x5f9b940
"\201\0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"\0Malformed
UTF-8 character (unexpected continuation byte 0x81, with no preceding
start byte) in subroutine entry at -e line 1.
 [UTF8 "\x{0}\x{0}aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]
  CUR = 131
  LEN = 136
00000000  81 00 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |..aaaaaaaaaaaaaa|
00000010  61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |aaaaaaaaaaaaaaaa|
*
00000080  61 61 61 0a                                       |aaa.|
00000084
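
As a point of comparison (a sketch, not something proposed in this thread):
encoding the payload to octets explicitly before calling pack keeps the
UTF8 flag out of the picture, so the template only ever sees a byte string.
The helper name pack_len_prefixed is made up for illustration, and using
utf8::encode assumes the payload is meant to travel as UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build a message with a 16-bit
# little-endian length prefix from the UTF-8 encoding of a string.
sub pack_len_prefixed {
    my ($text) = @_;
    my $bytes = $text;        # copy, so the caller's string is left untouched
    utf8::encode($bytes);     # characters -> UTF-8 octets, UTF8 flag off
    return pack("v/a*", $bytes);
}

my $str = "a" x 129;
utf8::upgrade($str);          # same setup as the one-liner above

my $msg = pack_len_prefixed($str);
printf "utf8 flag: %s\n", utf8::is_utf8($msg) ? "on" : "off";
printf "length:    %d\n", length $msg;
printf "prefix:    %s\n", unpack("H4", $msg);
```

This should report the flag as off, a length of 131, and a prefix of 8100,
which is the length prefix expected for 129 one-byte characters.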

Trying it with "U0v/U0a*" silences the warning, but produces incorrect
(and arguably broken) output, with a length prefix of 00 00 instead of
81 00:

$ perl -MDevel::Peek -wle'my $a= "a" x 129; utf8::upgrade($a); print(
my $msg= pack("U0v/U0a*", $a)); Dump($msg);' | hexdump -C
SV = PV(0xd45f1d0) at 0xd475410
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0xd485960
"\0\0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"\0
[UTF8 "\x{0}\x{0}aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]
  CUR = 131
  LEN = 136
00000000  00 00 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |..aaaaaaaaaaaaaa|
00000010  61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |aaaaaaaaaaaaaaaa|
*
00000080  61 61 61 0a                                       |aaa.|
00000084


I don't understand why the string is still utf8-on.
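
One way to check both the flag and the stored octets without depending on
how unpack treats UTF8-flagged input is sketched below. The helper raw_hex
is a hypothetical name, not part of Perl or of anything posted in this
thread. It assumes that UTF-8 encoding a flagged copy reproduces the octets
of the internal buffer, which holds for well-formed strings only; for a
malformed buffer like the one dumped above, Devel::Peek remains the tool to
trust.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): hex of the octets Perl stores
# for a well-formed string, whether or not its UTF8 flag is set.
sub raw_hex {
    my ($s) = @_;
    my $copy = $s;                                # leave the original alone
    utf8::encode($copy) if utf8::is_utf8($copy);  # expose the UTF-8 octets
    return unpack("H*", $copy);                   # $copy is a plain byte string here
}

my $wide     = "\x{DF}\x{100}";   # contains a code point above 255, so UTF8-flagged
my $narrow   = "\x{DF}";          # a single byte, not flagged
my $upgraded = "\x{DF}";
utf8::upgrade($upgraded);         # same character, now stored as UTF-8

printf "%-8s flag=%d hex=%s\n", "wide",     utf8::is_utf8($wide)     ? 1 : 0, raw_hex($wide);
printf "%-8s flag=%d hex=%s\n", "narrow",   utf8::is_utf8($narrow)   ? 1 : 0, raw_hex($narrow);
printf "%-8s flag=%d hex=%s\n", "upgraded", utf8::is_utf8($upgraded) ? 1 : 0, raw_hex($upgraded);
```

For the first string this gives c39fc480, the same value the older perls
printed for a bare "H*"; the narrow and upgraded copies of \x{DF} give df
and c39f respectively.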

I also don't understand why this new behavior wasn't added via a
regression-proof "opt-in" mechanism, instead of the current "opt-out"
behavior (assuming said behavior wasn't buggy, which it is).

> (apparently today I am supposed to be observing the public holiday. Whether
> I want to or not)

Well, thanks for replying! Enjoy your holiday!

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
