
Re: How on earth did we manage to break pack() so badly?

From:
David Golden
Date:
May 1, 2013 15:06
Subject:
Re: How on earth did we manage to break pack() so badly?
Message ID:
CAOeq1c_-0Q5Bej4VYbeBo3TCGr78dt-SdJkEKjUGFvS6y-CWWQ@mail.gmail.com
On Wed, May 1, 2013 at 10:32 AM, demerphq <demerphq@gmail.com> wrote:
> perl -le 'print unpack "H*", "\x{DF}\x{100}"'
>
> Produces completely different results depending on which Perl you are
> on. On older perls it produces a relatively useful:
>
> c39fc480
>
> which as we all know is the hex output of the raw UTF-8 form of the
> string. On newer perls it produces the completely useless:
>
> df00
>
> Which is not correct regardless of how you look at it. The older
> behavior was at least correct in some regard.

I see some merit in not having pack treat a string as octets if it's
internally stored in UTF-8.  We have been trying to draw a line and
say "internal representation is not something users need to know
about".

OTOH, as you point out, "df00" is not useful, either.
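
For reference, "df00" is what you get if each code point is reduced
modulo 256 before the hex conversion, which is how perldiag describes
the "wrapped in unpack" case. A minimal sketch of that arithmetic;
low_byte_hex is a name made up for this illustration, not anything in
perl itself:

```perl
use strict;
use warnings;

# Hypothetical helper: hex of the low 8 bits of every character,
# mimicking the "value modulo 256" wrapping described in perldiag.
sub low_byte_hex {
    my ($str) = @_;
    return join '', map { sprintf '%02x', ord($_) & 0xFF } split //, $str;
}

print low_byte_hex("\x{DF}\x{100}"), "\n";   # prints df00
```

0xDF survives as "df", while 0x100 loses its high bit and becomes "00",
so the character U+0100 is indistinguishable from NUL in the output.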

My initial instinct is that packing/unpacking a string with characters
above 255 should have a "wide character in pack/unpack" warning, like we do
for print, unless the template has an explicit rule for handling wide
characters (like "U").  Then I'm fine with "df00" being the result.
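
Roughly the check I have in mind, sketched in user space. This is only
an illustration: checked_unpack is a hypothetical wrapper, the warning
text is invented, and the test for "U" in the template is deliberately
crude.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not part of perl: warn when the input holds a
# character above 255 and the template has no "U" item to handle it.
sub checked_unpack {
    my ($template, $str) = @_;
    if ($str =~ /[^\x00-\xFF]/ && $template !~ /U/) {
        warn "Wide character in unpack\n";
    }
    return unpack $template, $str;
}

# No character above 255 here, so no warning is raised.
my ($narrow_hex) = checked_unpack('H*', "\x{DF}");
print "$narrow_hex\n";   # prints df
```

Calling it with "\x{DF}\x{100}" and the same template would trigger the
warning before the result comes back.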

I don't like the "U0" answer.  That's another one of those arcane "you
have to know what's going on internally to understand why this works"
tricks.

I think unpack needs a new modifier that means that a character string
should be unpacked as UTF-8 octets instead of as characters, so that
one could unpack it as hex or anything else.
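
Until something like that exists, the explicit route is to encode a
copy of the string to UTF-8 octets first and unpack that. A minimal
sketch, where unpack_utf8_octets is a hypothetical helper name and not
a proposed spelling for the modifier:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed modifier: unpack the UTF-8
# octets of a character string rather than its characters.
sub unpack_utf8_octets {
    my ($template, $str) = @_;
    utf8::encode($str);   # in place, but only on our copy of the string
    return unpack $template, $str;
}

my ($octet_hex) = unpack_utf8_octets('H*', "\x{DF}\x{100}");
print "$octet_hex\n";   # prints c39fc480
```

The caller says outright that UTF-8 octets are wanted, so nothing
depends on how the string happens to be stored internally.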

--
David Golden <xdg@xdg.me>
Take back your inbox! → http://www.bunchmail.com/
Twitter/IRC: @xdg
