perl.perl5.porters: postings from May 2013
Re: How on earth did we manage to break pack() so badly?
From:
Dave Mitchell
Date:
May 1, 2013 15:23
Subject:
Re: How on earth did we manage to break pack() so badly?
Message ID:
20130501152250.GE2216@iabyn.com
On Wed, May 01, 2013 at 04:32:07PM +0200, demerphq wrote:
> It used to be nice and safe to do this:
>
> print unpack("H*", $_),"\n"; # lets see what the string looks like in the raw.
>
>
> This is no longer an effective debugging technique. It will NOT tell
> you what your string looks like. It takes a "daddy knows best"
> attitude and tries to do the right thing depending on whether the data
> is utf8 or the data is not. Which means that this:
>
> perl -le'print unpack "H*", "\x{DF}\x{100}"'
>
> Produces completely different results depending on which Perl you are
> on. On older perls it produces a relatively useful:
>
> c39fc480
But that's just leaking the internal implementation details.
>
> which as we all know is the hex output of the raw UTF8 form of the
> string. On newer perls it produces the completely useless:
>
> df00
It's not particularly useful, but it is consistent. It's reading two
characters, and displaying their values modulo 256 (since H is supposed
to issue exactly two hex digits per character).
If you want the old behaviour, but in a safe way:

    utf8::encode(my $s = "\x{DF}\x{100}");
    print unpack "H*", $s;

Really, the unpack interface was never designed to handle chars > 255.
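To make the two hex views concrete, here is a minimal sketch that computes both by hand. The "modulo 256" result is built explicitly with ord() rather than by relying on how any particular perl's unpack wraps wide characters; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $chars = "\x{DF}\x{100}";    # two characters, the second one > 255

# "Two hex digits per character, modulo 256", spelled out by hand.
my $wrapped = join '', map { sprintf "%02x", ord($_) % 256 } split //, $chars;

# The explicit route: encode a copy to UTF-8 octets, then dump those octets.
my $octets = $chars;
utf8::encode($octets);          # in place: characters become UTF-8 octets
my $encoded = unpack "H*", $octets;

print "$wrapped\n";             # df00
print "$encoded\n";             # c39fc480
```

Encoding a copy keeps the original character string intact, so the hex dump is a deliberate view of one chosen encoding rather than a peek at internal storage.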
> I remember some of the discussion relating to pack doing the wrong
> thing when strings are accidentally upgraded, but I had the impression
> that we were only going to change a few minor aspects, but it seems we
> have changed so much that now pack is a) heavily broken in terms of
> regression failures, b) relatively useless for various purposes where
> it is heavily used.
>
> Consider another example:
>
> pack "v/a", $string;
>
> This should produce a string with a short int length, followed by the
> appropriate number of bytes. However in modern perls, if the string is
> utf8 enabled it does not:
>
> $ perl -MDevel::Peek -wle'my $a= "a" x 129; utf8::upgrade($a); print(
> my $msg= pack("v/a", $a)); Dump($msg);' | hexdump -C
> SV = PV(0x778e150) at 0x77a4398
> REFCNT = 1
> FLAGS = (PADMY,POK,pPOK,UTF8)
> PV = 0x77b4840
> "\302\201\0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"\0
> [UTF8 "\x{81}\x{0}aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]
> CUR = 132
> LEN = 136
> 00000000 81 00 61 61 61 61 61 61 61 61 61 61 61 61 61 61 |..aaaaaaaaaaaaaa|
> 00000010 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 |aaaaaaaaaaaaaaaa|
> *
> 00000080 61 61 61 0a |aaa.|
> 00000084
>
> There are two important things to note here, first, the "v" part of
> the string has been silently upgraded, completely breaking it as a
> shortint. Any external code designed to inter-operate with a program
> using this structure will be broken.
That looks like a bug.
> The second point is that debugging this stuff is hard, as Perl "hides"
> some of the problem by being "clever" about filehandle discipline:
> when we print the code point 81 which is internally represented in
> utf8 as "\302\201", perl's output layers downgrade it, without warning,
> back to the correct 81.
If you want perl to output utf8, tell it that STDOUT supports this, e.g.
with perl -CO.
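As a rough illustration of how the layer on a filehandle decides which bytes get written, the sketch below prints the same upgraded one-character string through a :raw layer and through a :utf8 layer. It writes to in-memory filehandles purely so the bytes can be inspected; the handle and buffer names are made up for the example. On STDOUT, binmode(STDOUT, ":utf8") has the same effect at run time as -CO has on the command line.

```perl
use strict;
use warnings;

my $str = "\x{81}";
utf8::upgrade($str);    # internal storage is now "\302\201"; still one character

# Illustration only: in-memory buffers stand in for STDOUT.
open my $raw_fh, '>:raw', \my $raw_out or die "open: $!";
print {$raw_fh} $str;
close $raw_fh;

open my $utf8_fh, '>:utf8', \my $utf8_out or die "open: $!";
print {$utf8_fh} $str;
close $utf8_fh;

printf "raw layer:  %s\n", unpack "H*", $raw_out;     # 81
printf "utf8 layer: %s\n", unpack "H*", $utf8_out;    # c281
```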
> Anyway, the bottom line is that there appears to be NO way to get pack
> to operate on the binary representation of a string.
Yes there is, just make sure you're feeding it a bunch of characters with
ords < 256, by using utf8::encode/decode where appropriate.
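A minimal sketch of that approach for a length-prefixed field, assuming the wire format wants a little-endian 16-bit byte count followed by UTF-8 octets (the sample text and variable names are invented for illustration):

```perl
use strict;
use warnings;

my $text = "caf\x{E9}\x{100}";    # 5 characters, one of them > 255

# Encode first, so pack only ever sees characters with ords < 256.
my $bytes = $text;
utf8::encode($bytes);             # now 7 octets

my $packed = pack "v/a*", $bytes; # 2-byte length prefix, then the octets
print unpack("H*", $packed), "\n";    # 0700636166c3a9c480

# Reading it back: unpack the octets, then decode them to characters.
my ($payload) = unpack "v/a", $packed;
utf8::decode($payload);
print $payload eq $text ? "round trip ok\n" : "round trip failed\n";
```

The length prefix counts octets of the encoded form, which is what external code reading the structure would expect.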
> I cannot express how unhappy I am to find out about these changes. The
> lack of analytic depth behind these changes is staggering (the
> implication on things like v/a should have been immediately obvious).
> I cannot believe that we let the "there is no such thing as binary
> data" mob paint us into such a ridiculous position.
I think the issue can be summed up as:
* un/pack were designed in a world where ord($chr) was always < 256,
and there was always a 1:1 mapping between chars and their byte storage;
* utf8 and unicode broke this assumption;
* the semantics of a lot of template actions are/were poorly defined for
chars > 255, and a lot of their behaviours were broken, or broke
encapsulation;
* some of these behaviours have now been fixed, and others still need
fixing;
* some of those fixes clash with your mental model of how pack
should work.
> So lets assume I want the old behavior of pack. How can I get it? My
> current understanding is that there is no way to get it at all
See my two-line example above.
> Which seems to be a pretty poor solution to me. Considering the "there
> is no such thing as binary data" mob is always banging on about
> "representation shouldn't matter, strings are strings" it seems pretty
> crappy to require us to inspect the utf8 flag on pretty much any pack
> operation that operates on strings.
As I have shown, you don't need to inspect the flag. In perl now, a
string is just a list of ordinal numbers, where sometimes those numbers
are > 255. If you try to do packing and unpacking on such non-byte numbers,
you're going to be in a world of pain. Either avoid such strings, or use
utf8::decode/encode or pack "U" as appropriate.
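The "list of ordinal numbers" view can be checked without touching the utf8 flag at all. In this sketch, is_byte_string is a hypothetical helper name, not an existing function; it just tests whether every ordinal fits in a byte.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character has an ordinal below 256.
sub is_byte_string {
    my ($s) = @_;
    return $s !~ /[^\x00-\xFF]/;
}

my $plain    = "\xDF\x81";
my $upgraded = $plain;
utf8::upgrade($upgraded);    # changes internal storage, not the characters

my @ords_plain    = map { ord } split //, $plain;
my @ords_upgraded = map { ord } split //, $upgraded;

print "@ords_plain\n";       # 223 129
print "@ords_upgraded\n";    # 223 129

print "upgraded string is safe to pack\n" if is_byte_string($upgraded);
print "wide string needs encoding first\n" unless is_byte_string("\x{100}");
```

A check like this answers the question that matters for pack (are all ordinals bytes?) regardless of how perl happens to store the string.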
> Seems like in attempting to fix
> one set of perceived problems we just shifted the problem elsewhere,
> and IMO made it worse.
I think I disagree with you, but I could potentially be convinced with
further examples.
> Anyway, I want pack to be able to pack an arbitrary string without
> a) ending up with a utf8 on packed string, b) without it corrupting
> binary data structures like "v/a*", c) where the output is not
> correct. How do I get it? Do I start adding new patterns to pack?
I don't see any such need. Modulo bug fixing (such as v/a), I think perl
does everything you need.
> Do I
> start reverting the patches responsible for this insane behavior for
> 5.20?
No ;-)
--
A major Starfleet emergency breaks out near the Enterprise, but
fortunately some other ships in the area are able to deal with it to
everyone's satisfaction.
-- Things That Never Happen in "Star Trek" #13