Re: pack and ASCII

From: Torsten Förtsch
Date: January 12, 2012 09:41
Subject: Re: pack and ASCII
Message ID: 1921025.GoT2555j1F@opi.home

On Thursday, 12 January 2012 13:28:57 demerphq wrote:
> I consider the utf8 flag being ON here to be both a regression, and a
> bug. IMO the output of pack "A" should NEVER have the utf8 flag set.

I have been using Perl for 14 years now, and I was really surprised to read 
here that the output of C<pack> might have the UTF8 flag set. I have always 
seen it as a tool to construct structures to pass to C level, and C<unpack> 
as the tool to parse them. Then I asked a colleague what he thinks, and he 
was just as surprised.

I am especially surprised that not only "A" but also "a" can return utf8 
strings (at least with 5.12.3):

$ perl -Mstrict -Mutf8 -MEncode -MDevel::Peek -e 'Dump pack +("a","A")[$_&1]."12ii", +(sub{$_[0]},\&encode_utf8)[$_>>1]->("Förtsch"), 10, 20 for(0..3);'
SV = PV(0x604208) at 0x64ee50
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK,UTF8)
  PV = 0x6213e0 "F\303\266rtsch\0\0\0\0\0\n\0\0\0\24\0\0\0"\0 [UTF8 "F\x{f6}rtsch\x{0}\x{0}\x{0}\x{0}\x{0}\n\x{0}\x{0}\x{0}\x{14}\x{0}\x{0}\x{0}"]
  CUR = 21
  LEN = 56
SV = PV(0x604208) at 0x64ee50
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK,UTF8)
  PV = 0x6213e0 "F\303\266rtsch     \n\0\0\0\24\0\0\0"\0 [UTF8 "F\x{f6}rtsch     \n\x{0}\x{0}\x{0}\x{14}\x{0}\x{0}\x{0}"]
  CUR = 21
  LEN = 56
SV = PV(0x604208) at 0x64ee50
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK)
  PV = 0x6213e0 "F\303\266rtsch\0\0\0\0\n\0\0\0\24\0\0\0"\0
  CUR = 20
  LEN = 56
SV = PV(0x604208) at 0x64ee50
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK)
  PV = 0x6213e0 "F\303\266rtsch    \n\0\0\0\24\0\0\0"\0
  CUR = 20
  LEN = 56

To me as a mere user this feels like a bug.

Further, even length specifications and alignment get confused by utf8 
strings. With the pack format C<N/a*x!4i> I'd expect the last number to 
*always* be aligned on a 4-byte boundary. But surprise:

$ perl -Mstrict -Mutf8 -MEncode -MDevel::Peek -e 'Dump pack "N/a*x!4i", "Förtsch", 10'
SV = PV(0x604088) at 0x626940
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK,UTF8)
  PV = 0x618460 "\0\0\0\7F\303\266rtsch\0\n\0\0\0"\0 [UTF8 "\x{0}\x{0}\x{0}\aF\x{f6}rtsch\x{0}\n\x{0}\x{0}\x{0}"]
  CUR = 17
  LEN = 24

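The integer starts at byte offset 13 (4 bytes of length, 8 bytes of string, 
1 byte of padding), which is not a multiple of 4. Also the first number, the 
length of the following string, should be 8 and not 7. Both values are 
evidently counted in characters rather than in bytes.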

Of course, one can argue that you get out what you put in: if you want an 
octet string out, make sure you put only octet strings in. But code like 
the above worked as expected at least up to 5.8.8, a version that was widely 
deployed for years.

This is 5.8.8:
$ perl -Mstrict -Mutf8 -MEncode -MDevel::Peek -e 'Dump pack "N/A*x!4i", "Förtsch", 10'
SV = PV(0x814fb00) at 0x814ed30
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK)
  PV = 0x8164c40 "\0\0\0\10F\303\266rtsch\n\0\0\0"\0
  CUR = 16
  LEN = 20
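
For comparison, here is a minimal sketch of the "put in only octets" 
approach on a newer perl: the string is encoded with encode_utf8 before it 
reaches pack. The variable names are only illustrative, and the expected 
values in the comments assume a platform where "i" is 4 bytes wide.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8);

my $name   = "Förtsch";             # character string, 7 characters
my $octets = encode_utf8($name);    # octet string, 8 bytes, UTF8 flag off

# Same template as above, but fed with octets instead of characters.
my $packed = pack "N/a*x!4i", $octets, 10;

my $flag   = utf8::is_utf8($packed) ? "on" : "off";
my $prefix = unpack "N", $packed;   # length prefix written by N/
my $total  = length $packed;        # 4 (prefix) + 8 (string) + 4 (int)

print "UTF8 flag:     $flag\n";     # off
print "length prefix: $prefix\n";   # 8
print "total bytes:   $total\n";    # 16, so the int sits at offset 12
```

With octets as input the prefix counts bytes, no padding byte is needed, and 
the layout matches the 16 bytes that 5.8.8 produced above.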

I can see useful applications of the current way pack works. But please make 
it either warn or, better, die when it is passed a character string. Make that 
warning go away with a "use feature" or something similar. Or use a new pack 
format to pack character (as opposed to octet) strings.
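
Until something like that exists in core, the "die" variant can be 
approximated in user code. The following is only a sketch: pack_octets is a 
hypothetical helper, not an existing API, and it simply treats any argument 
with the UTF8 flag set as a character string. That check is cruder than a 
real implementation would need to be, since the flag is an internal detail 
and can also be set on pure-ASCII strings.

```perl
use strict;
use warnings;
use utf8;
use Carp qw(croak);
use Encode qw(encode_utf8);

# Hypothetical wrapper (an assumption, not part of Perl): refuse any
# argument that carries the UTF8 flag before handing it on to pack.
sub pack_octets {
    my ($template, @args) = @_;
    for my $i (0 .. $#args) {
        croak "pack_octets: argument $i is a character string"
            if defined $args[$i] && utf8::is_utf8($args[$i]);
    }
    return pack $template, @args;
}

# Octets are accepted ...
my $ok = pack_octets("N/a*x!4i", encode_utf8("Förtsch"), 10);

# ... while the unencoded character string is rejected.
my $err = eval { pack_octets("N/a*x!4i", "Förtsch", 10); 1 } ? "" : $@;

print "packed ", length($ok), " bytes\n";
print "rejected: $err" if $err;
```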

Torsten Förtsch

-- 
Need professional modperl support? Hire me! (http://foertsch.name)

Like fantasy? http://kabatinte.net

