perl.perl5.porters | Postings from January 2012
Re: pack and ASCII
From:
jpl
Date:
January 10, 2012 03:57
Subject:
Re: pack and ASCII
Message ID:
4F0C27A5.9060901@research.att.com
On 01/09/12 20:17, Eric Brine wrote:
> On Mon, Jan 9, 2012 at 3:30 PM, Leon Timmermans <fawaka@gmail.com> wrote:
>
> On Mon, Jan 9, 2012 at 8:55 PM, Eric Brine <ikegami@adaelis.com> wrote:
> > C<< pack 'A*' >> correctly packs all strings of bytes (whether UTF8=0 or
> > UTF8=1). I don't see why it's a bug that it usefully works for characters
> > that aren't bytes too. Are you using C<< pack 'A*' >> to validate your data?
> > You can use one of the following to do that:
>
> It also means that «pack "A1", $foo» can't be relied upon to be only
> one byte
>
>
> You can count on «pack "A1", $foo» never returning more than one byte.
>
> You can't count on the character returned by «pack "A1", $foo» to be a
> byte, though.
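
The distinction drawn above can be checked with a short sketch. It assumes Perl 5.10 or later, where pack works on characters by default; the variable names are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

# U+0163 (LATIN SMALL LETTER T WITH CEDILLA) is a character, not a byte.
my $char   = "\x{163}";
my $packed = pack "A1", $char;

# One element comes back, but its ordinal does not fit in an octet.
my $fits_in_byte = ord($packed) <= 0xFF ? 1 : 0;

printf "length: %d\n",  length $packed;
printf "ord:    0x%X\n", ord $packed;
printf "byte?   %s\n",  $fits_in_byte ? "yes" : "no";
```

So the count is honoured, but what comes back is a character string rather than an octet string.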
>
> which means a whole extra layer of validation is necessary.
>
>
> I presume you're expecting «pack "A*", "ţ"» to die/warn with "Wide
> character"?
>
> Then I agree, if you have buggy code, that change would remove the
> need for an extra layer of validation to detect that bug. (The bug, of
> course, is most likely that you forgot to encode your text.)
>
> It comes down to the following question: Is it more useful for «pack
> "A*", $char» to work or for «pack "A*", $non_bytes» to throw/report an
> error?
>
> My vote is for the former.
>
Mine, too.
> * The latter has very circumstantial uses.
> * The latter isn't backwards compatible.
> * The latter removes a useful feature.
> * The latter is redundant with existing errors (meaning you'll get the
> wide character warning and/or easily noticeable garbage later anyway).
>
> Quite frankly, I think we absolutely need pack formats that have
> strong guarantees on the number of octets
>
>
> C<< pack "A1" >> will never return more than one octet.
>
Or less than one octet. I'd expect C<< pack "A$n" >> to return exactly
$n octets, padding with blanks or truncating as necessary, with the
understanding that truncation may result in something that cannot be
turned back into something reasonable with C<< unpack "A$n" >>. Which
brings us back to the original question: do

    A  A text (ASCII) string, will be space padded.
    Z  A null-terminated (ASCIZ) string, will be null padded.

need to be modified in the pack documentation? -- jpl
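
A minimal sketch of the padding and truncation behaviour I have in mind, for octet strings. The use of Encode and the variable names are my own illustration, not anything proposed in the thread; the last case shows how truncating encoded text can leave an incomplete UTF-8 sequence.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $n = 3;

# Octet strings shorter than the count are padded.
my $padded_a = pack "A$n", "a";   # space padded
my $padded_z = pack "Z$n", "a";   # null padded

# Octet strings longer than the count are truncated.
my $truncated = pack "A$n", "abcdef";

# unpack "A" strips the trailing padding again.
my ($round_trip) = unpack "A$n", $padded_a;

# Text encoded to UTF-8 first: U+0163 becomes the two octets C5 A3.
my $octets = encode('UTF-8', "\x{163}");
my $cut    = pack "A1", $octets;  # keeps only the first octet

printf "A%d padded:    [%s]\n", $n, $padded_a;
printf "Z%d padded:    %s\n",   $n, unpack "H*", $padded_z;
printf "A%d truncated: [%s]\n", $n, $truncated;
printf "round trip:   [%s]\n",  $round_trip;
printf "A1 of C5 A3:  %s\n",    unpack "H*", $cut;
```

The final value is a lone lead octet, which is not valid UTF-8 on its own, so decoding it cannot recover the original character.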
> - Eric
>