Re: How on earth did we manage to break pack() so badly?

David Golden
May 1, 2013 15:06
On Wed, May 1, 2013 at 10:32 AM, demerphq <> wrote:
> perl -le'print unpack "H*", "\x{DF}\x{100}"'
> Produces completely different results depending on which Perl you are
> on. On older perls it produces a relatively useful:
> c39fc480
> which as we all know is the hex output of the raw UTF8 form of the
> string. On newer perls it produces the completely useless:
> df00
> Which is not correct regardless of how you look at it. The older
> behavior was at least correct in some regard.

I see some merit in not having pack treat a string as octets if it's
internally stored in UTF-8.  We have been trying to draw a line and
say "internal representation is not something users need to know".

OTOH, as you point out, "df00" is not useful, either.
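
For reference, a minimal sketch of where "df00" comes from, assuming
(as in the quoted message) that newer perls unpack character by
character: each character contributes only its low 8 bits to "H*", so
U+0100 collapses to 00.  The variable names here are illustrative only,
and the code deliberately avoids calling unpack so that it behaves the
same on any perl.

```perl
use strict;
use warnings;

my $str = "\x{DF}\x{100}";

# Code points as perl sees them: two characters, the second above 255.
my @code_points = map { sprintf("%04x", ord($_)) } split //, $str;

# What is left when each character is truncated to a single octet.
my $low_octets = join "", map { sprintf("%02x", ord($_) & 0xFF) } split //, $str;

print "code points: @code_points\n";   # code points: 00df 0100
print "low octets:  $low_octets\n";    # low octets:  df00
```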

My initial instinct is that packing/unpacking a string with characters > 255
should have a "wide character in pack/unpack" warning, like we do
for print, unless the template has an explicit rule for handling wide
characters (like "U").  Then I'm fine with "df00" being the result.
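
A rough sketch of that idea in pure Perl, assuming a hypothetical
wrapper called unpack_checked (not an existing function) and a
deliberately crude "does the template mention U" test.  It only
illustrates the proposed behaviour; it is not how the core would
implement it, and the built-in unpack may emit its own warnings on top.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of perl: warn before unpacking a string
# that holds characters above 255, unless the template mentions "U".
sub unpack_checked {
    my ($template, $string) = @_;

    # Crude check: any "U" in the template counts as an explicit rule
    # for wide characters.
    if ($template !~ /U/ && $string =~ /[^\x{00}-\x{FF}]/) {
        warn "Wide character in unpack\n";
    }

    return unpack $template, $string;
}
```

Calling unpack_checked("H*", "\x{DF}\x{100}") would emit the warning
before unpacking, while unpack_checked("H*", "\x{DF}") would not.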

I don't like the "U0" answer.  That's another one of those arcane "you
have to know what's going on internally to understand why this works"
things.

I think unpack needs a new modifier that means that a character string
should be unpacked as UTF-8 octets instead of as characters, so that
one could unpack it as hex or anything else.
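
Until something like that exists, the effect can be approximated by
encoding explicitly before unpacking.  A minimal sketch, assuming a
hypothetical helper named unpack_utf8_octets; it stands in for the
proposed modifier and uses only Encode::encode, which ships with perl:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper standing in for the proposed modifier: unpack the
# UTF-8 octets of a character string rather than its characters.
sub unpack_utf8_octets {
    my ($template, $string) = @_;

    # encode() returns a byte string with one character per UTF-8 octet,
    # so the template never sees a character above 255.
    my $octets = encode("UTF-8", $string);

    return unpack $template, $octets;
}

my ($hex) = unpack_utf8_octets("H*", "\x{DF}\x{100}");
print "$hex\n";   # c39fc480
```

The same helper works with any template, for example "C*" to get the
octet values as numbers.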

David Golden <>
Twitter/IRC: @xdg
