On 1 May 2013 17:05, David Golden <xdg@xdg.me> wrote:
> On Wed, May 1, 2013 at 10:32 AM, demerphq <demerphq@gmail.com> wrote:
>> perl -le'print unpack "H*", "\x{DF}\x{100}"'
>>
>> Produces completely different results depending on which Perl you are
>> on. On older perls it produces a relatively useful:
>>
>> c39fc480
>>
>> which as we all know is the hex output of the raw UTF8 form of the
>> string. On newer perls it produces the completely useless:
>>
>> df00
>>
>> Which is not correct regardless of how you look at it. The older
>> behavior was at least correct in some regard.
>
> I see some merit in not having pack treat a string as octets if it's
> internally stored in UTF-8. We have been trying to draw a line and
> say "internal representation is not something users need to know
> about".

Yet that has never really been true, and the people peddling the line
are responsible for much of the mess we are in now.

There has *always* been data that is *not* character oriented which
has been stored in strings, and where you really do have to know about
the internal representation. IMO pretending otherwise has created far
more problems than it solved.

> OTOH, as you point out, "df00" is not useful, either.
>
> My initial instinct is that packing/unpacking a string with characters
> should have a "wide character in pack/unpack" warning, like we do
> for print, unless the template has an explicit rule for handling wide
> characters (like "U"). Then I'm fine with "df00" being the result.
>
> I don't like the "U0" answer. That's another one of those arcane "you
> have to know what's going on internally to understand why this works"
> tricks.
>
> I think unpack needs a new modifier that means that a character string
> should be unpacked as UTF-8 octets instead of as characters, so that
> one could unpack it as hex or anything else.

Isn't this basically just the same thing as "U0"?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
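
For reference, here is a minimal sketch of the three behaviours under discussion, assuming perl 5.10 or later (where unpack works character by character by default). The variable names are only illustrative, and the utf8::encode route is shown as one explicit alternative, not as anything proposed in the thread.

```perl
use strict;
use warnings;

# Two characters: U+00DF and U+0100. The second does not fit in one octet.
my $str = "\x{DF}\x{100}";

# Character mode (the default on 5.10+): each character is reduced to its
# low 8 bits, so U+0100 wraps around to 00.
my $wrapped = do {
    no warnings 'unpack';    # silence "Character in 'H' format wrapped"
    unpack "H*", $str;
};

# "U0" switches unpack to UTF-8 mode: the string is processed octet by
# octet in its UTF-8 encoded form.
my $via_u0 = unpack "U0H*", $str;

# Explicit alternative: encode a copy to UTF-8 octets, then unpack that.
my $octets = $str;
utf8::encode($octets);
my $via_encode = unpack "H*", $octets;

print "H*          : $wrapped\n";
print "U0H*        : $via_u0\n";
print "encode + H* : $via_encode\n";
```

On such a perl the first line should show df00 and the other two c39fc480, which is the sense in which "U0" already behaves like an "unpack as UTF-8 octets" modifier.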