On 1 May 2013 16:32, demerphq <demerphq@gmail.com> wrote:
> Consider another example:
>
> pack "v/a", $string;
>
> This should produce a string with a short int length, followed by the
> appropriate number of bytes. However in modern perls, if the string is
> utf8 enabled it does not:
[...]

Some more analysis of this bug: (with a perl 5.14.1)

~§ perl -wE 'say unpack "U0H*", pack "v/a","foo"'
0300666f6f
~§ perl -wE 'say unpack "U0H*", pack "v/a","foo\x{100}"'
0400666f6fc480

The leading 0400 in the 2nd example is obviously wrong, since the packed
int is followed by five bytes, not four. (packing with "C0v/a" yields
the same result, character mode being the default)

Moreover, that packed string has the UTF8 flag on, which makes no sense
to me:

~§ perl -MDevel::Peek -wE 'Dump pack "v/a","foo\x{100}"'
SV = PV(0x7fa084004270) at 0x7fa084029de8
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK,UTF8)
  PV = 0x7fa083c050a0 "\4\0foo\304\200"\0 [UTF8 "\x{4}\x{0}foo\x{100}"]
  CUR = 7
  LEN = 16

Let's see what pack does when told to operate in byte mode:

~§ perl -wE 'say unpack "U0H*", pack "U0v/a","foo\x{100}"'
Character(s) in 'a' format wrapped in pack at -e line 1.
0400666f6f00

Here, the packed int 4 is correctly followed by 4 bytes, and the last
character has been truncated, as documented when byte mode is used --
under U0 pack expects byte input and discards what does not fit.

However the packed string *still* has the UTF8 flag on. This is very
wrong since it's possible to generate invalid UTF8 with it:

~§ perl -MDevel::Peek -wE 'Dump pack "U0v/a","foo\x{1f0}"'
Character(s) in 'a' format wrapped in pack at -e line 1.
SV = PV(0x7ff4a1804270) at 0x7ff4a1829de8
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK,UTF8)
  PV = 0x7ff4a14050a0 "\4\0foo\360"\0Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xf0) in subroutine entry at -e line 1.
 [UTF8 "\x{4}\x{0}foo\x{0}"]
  CUR = 6
  LEN = 16

What should be done in my opinion:

- the output of pack should never have the utf8 flag on, it's just not
  the purpose of pack.
- C0<length>/<format> should be fixed so the packed length correctly
  reflects the length of the following data.