Re: pack and ASCII
From: Torsten Förtsch
Date: January 12, 2012 09:41
Subject: Re: pack and ASCII
Message ID: 1921025.GoT2555j1F@opi.home
On Thursday, 12 January 2012 13:28:57 demerphq wrote:
> I consider the utf8 flag being ON here to be both a regression, and a
> bug. IMO the output of pack "A" should NEVER have the utf8 flag set.
I have been using Perl for 14 years now, and I was really surprised to read
here that the output of C<pack> might have the UTF8 flag set. I have always
seen it as a tool to construct structures to pass to the C level, or to parse
them. Then I asked a colleague what he thinks, and he was just as surprised.
I am especially surprised that not only "A" but also "a" can return utf8
strings (at least with 5.12.3):
$ perl -Mstrict -Mutf8 -MEncode -MDevel::Peek -e 'Dump pack +("a","A")[$_&1]."12ii", +(sub{$_[0]},\&encode_utf8)[$_>>1]->("Förtsch"), 10, 20 for(0..3);'
SV = PV(0x604208) at 0x64ee50
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK,UTF8)
PV = 0x6213e0 "F\303\266rtsch\0\0\0\0\0\n\0\0\0\24\0\0\0"\0 [UTF8 "F\x{f6}rtsch\x{0}\x{0}\x{0}\x{0}\x{0}\n\x{0}\x{0}\x{0}\x{14}\x{0}\x{0}\x{0}"]
CUR = 21
LEN = 56
SV = PV(0x604208) at 0x64ee50
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK,UTF8)
PV = 0x6213e0 "F\303\266rtsch     \n\0\0\0\24\0\0\0"\0 [UTF8 "F\x{f6}rtsch     \n\x{0}\x{0}\x{0}\x{14}\x{0}\x{0}\x{0}"]
CUR = 21
LEN = 56
SV = PV(0x604208) at 0x64ee50
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x6213e0 "F\303\266rtsch\0\0\0\0\n\0\0\0\24\0\0\0"\0
CUR = 20
LEN = 56
SV = PV(0x604208) at 0x64ee50
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x6213e0 "F\303\266rtsch    \n\0\0\0\24\0\0\0"\0
CUR = 20
LEN = 56
To me as a mere user this feels like a bug.
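For comparison, here is a minimal sketch of the two encode_utf8 cases above as a standalone script: the string is turned into octets before it reaches pack, so the result carries no UTF8 flag. The variable names are only illustrative, and the name is written with a \x{f6} escape (an assumption made here so the script does not depend on the encoding of the source file); it is the same 7-character string.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# "Förtsch" as a 7-character string, written with an escape for the o-umlaut.
my $name   = "F\x{f6}rtsch";
my $octets = encode_utf8($name);    # 8 octets, UTF8 flag off

# "a12" NUL-pads the octet string to 12 bytes; "ii" appends two native ints.
my $packed = pack "a12ii", $octets, 10, 20;

printf "UTF8 flag: %s, length: %d bytes\n",
    (utf8::is_utf8($packed) ? "on" : "off"), length $packed;
```

With octets as input, the field width of 12 counts bytes, which is what a C structure on the receiving end would expect.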
Further, even length specifications and alignment get confused by utf8
strings. With the pack format C<N/a*x!4i> I'd expect the last number to
*always* be aligned on a 4-byte boundary. But surprise:
$ perl -Mstrict -Mutf8 -MEncode -MDevel::Peek -e 'Dump pack "N/a*x!4i", "Förtsch", 10'
SV = PV(0x604088) at 0x626940
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK,UTF8)
PV = 0x618460 "\0\0\0\7F\303\266rtsch\0\n\0\0\0"\0 [UTF8 "\x{0}\x{0}\x{0}\aF\x{f6}rtsch\x{0}\n\x{0}\x{0}\x{0}"]
CUR = 17
LEN = 24
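Also the first number, the length of the following string, should be 8 and not
7, because "Förtsch" takes up 8 bytes in the buffer. And the final integer
starts at byte offset 13, which is not a multiple of 4.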
Of course, one can argue that you get out what you put in: if you want an
octet string out, make sure you put only octets in. But the code above worked
as expected at least up to 5.8.8, a version that was in widespread use for
years.
This is 5.8.8:
$ perl -Mstrict -Mutf8 -MEncode -MDevel::Peek -e 'Dump pack "N/A*x!4i",
"Förtsch", 10'
SV = PV(0x814fb00) at 0x814ed30
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x8164c40 "\0\0\0\10F\303\266rtsch\n\0\0\0"\0
CUR = 16
LEN = 20
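The 5.8.8-style layout can still be obtained on newer perls by encoding the string before packing it. The following is only a sketch under that assumption; the variable names are illustrative, and the name is again written with a \x{f6} escape so the script does not depend on the source file encoding.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $octets = encode_utf8("F\x{f6}rtsch");    # 8 octets, UTF8 flag off
my $packed = pack "N/a*x!4i", $octets, 10;

# Read back the length prefix and work out where the trailing int starts.
my ($len)  = unpack "N", $packed;
my $int_at = length($packed) - length(pack("i", 0));

print "length prefix: $len\n";
print "int starts at byte offset: $int_at\n";
```

Because pack only ever sees octets here, the N/ count and the x!4 alignment are both computed in bytes.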
I can see useful applications of the way pack currently works. But please make
it either warn or, better, die when it is passed a character string. Let that
warning be switched off by a "use feature" or something similar. Or introduce
a new pack format for packing character (as opposed to octet) strings.
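Until something like that exists in core, the "die when passed a character string" behaviour can be approximated in user code. The sketch below uses a hypothetical wrapper, pack_octets, which is not part of Perl. It assumes that "character string" can be approximated by "carries the UTF8 flag", which is only a heuristic: the flag describes the internal representation, and a flagged string may well contain nothing but ASCII.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not a core function): refuse any argument that
# carries the UTF8 flag, then hand everything to the real pack.
sub pack_octets {
    my ($template, @args) = @_;
    for my $i (0 .. $#args) {
        croak "pack_octets: argument $i is a character string"
            if defined $args[$i] && utf8::is_utf8($args[$i]);
    }
    return pack $template, @args;
}

my $chars = "F\x{f6}rtsch";
utf8::upgrade($chars);    # force the UTF8 flag on, as for a -Mutf8 literal

my $ok = eval { pack_octets("N/a*x!4i", $chars, 10); 1 };
print $ok ? "packed\n" : "refused: $@";
```

Passing the same string through utf8::encode (or encode_utf8) first makes the call succeed, since the argument is then a plain octet string.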
Torsten Förtsch
--
Need professional modperl support? Hire me! (http://foertsch.name)
Like fantasy? http://kabatinte.net