On Wed, May 1, 2013 at 10:32 AM, demerphq <demerphq@gmail.com> wrote:
> perl -le'print unpack "H*", "\x{DF}\x{100}"'
>
> Produces completely different results depending on which Perl you are
> on. On older perls it produces a relatively useful:
>
> c39fc480
>
> which as we all know is the hex output of the raw UTF8 form of the
> string. On newer perls it produces the completely useless:
>
> df00
>
> Which is not correct regardless of how you look at it. The older
> behavior was at least correct in some regard.

I see some merit in not having pack treat a string as octets if it's
internally stored in UTF-8. We have been trying to draw a line and say
"internal representation is not something users need to know about".

OTOH, as you point out, "df00" is not useful, either.

My initial instinct is that packing/unpacking a string with characters > 255
should have a "wide character in pack/unpack" warning, like we do for
print, unless the template has an explicit rule for handling wide
characters (like "U"). Then I'm fine with "df00" being the result.

I don't like the "U0" answer. That's another one of those arcane "you
have to know what's going on internally to understand why this works"
tricks.

I think unpack needs a new modifier that means a character string
should be unpacked as UTF-8 octets instead of as characters, so that
one could unpack it as hex or anything else.

-- 
David Golden <xdg@xdg.me>
Take back your inbox! → http://www.bunchmail.com/
Twitter/IRC: @xdg
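
For illustration only: a minimal sketch of how the "c39fc480" output can be obtained without depending on how the string happens to be stored internally. It is not the unpack modifier proposed above, which does not exist here. It assumes the caller encodes the string to UTF-8 explicitly with the core utf8::encode function before unpacking; the variable names $chars and $octets are hypothetical.

```perl
use strict;
use warnings;

# The character string from the quoted example: U+00DF followed by U+0100.
my $chars = "\x{DF}\x{100}";

# Work on a copy, because utf8::encode modifies its argument in place.
# After the call, $octets holds the UTF-8 encoding as a plain byte string.
my $octets = $chars;
utf8::encode($octets);

# Every element of $octets is now below 256, so "H*" dumps the bytes as hex.
my $hex = unpack "H*", $octets;
print "$hex\n";    # c39fc480
```

Because the encoding step is explicit, the result should not change with the internal representation of $chars, at the cost of one extra copy.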