develooper Front page | perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

From: jpl
Date: January 10, 2012 03:57
Subject: Re: pack and ASCII
Message ID: 4F0C27A5.9060901@research.att.com
On 01/09/12 20:17, Eric Brine wrote:
> On Mon, Jan 9, 2012 at 3:30 PM, Leon Timmermans <fawaka@gmail.com> wrote:
>
>     On Mon, Jan 9, 2012 at 8:55 PM, Eric Brine <ikegami@adaelis.com> wrote:
>     > C<< pack 'A*' >> correctly packs all strings of bytes (whether
>     > UTF8=0 or UTF8=1). I don't see why it's a bug that it usefully
>     > works for characters that aren't bytes too. Are you using
>     > C<< pack 'A*' >> to validate your data?
>     > You can use one of the following to do that:
>
>     It also means that «pack "A1", $foo» can't be relied upon to be only
>     one byte
>
>
> You can count on «pack "A1", $foo» never returning more than one byte.
>
> You can't count on the character returned by «pack "A1", $foo» to be a 
> byte, though.
>
>     which means a whole extra layer of validation is necessary.
>
>
> I presume you're expecting «pack "A*", "ţ"» to die/warn with "Wide 
> character"?
>
> Then I agree, if you have buggy code, that change would remove the 
> need for an extra layer of validation to detect that bug. (The bug, of 
> course, is most likely that you forgot to encode your text.)
>
> It comes down to the following question: Is it more useful for «pack 
> "A*", $char» to work or for «pack "A*", $non_bytes» to throw/report an 
> error?
>
> My vote is for the former.
>
Mine, too.
> * The latter has very circumstantial uses.
> * The latter isn't backwards compatible.
> * The latter removes a useful feature.
> * The latter is redundant with existing errors (meaning you'll get the 
> wide character warning and/or easily noticeable garbage later anyway).
>
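For readers following along, the behaviour under discussion can be sketched roughly as below. This is only an illustration, assuming perl 5.10 or later; the variable names are made up for the example, and U+0163 is the character mentioned earlier in the thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+0163 does not fit in a single byte.
my $char = "\x{163}";

# pack "A*" passes the character through unchanged instead of dying.
my $packed  = pack "A*", $char;
my $n_chars = length $packed;       # counted in characters
my $codept  = ord $packed;          # still above 0xFF, so not a byte

# Encoding first yields octets, which pack "A*" also passes through.
my $octets   = encode("UTF-8", $char);
my $n_octets = length pack("A*", $octets);

printf "characters: %d, code point: U+%04X, octets after encoding: %d\n",
    $n_chars, $codept, $n_octets;
```

The point of the sketch is that the count in a pack "A" template counts characters, so the octet length is only known once the text has been encoded explicitly.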
>     Quite frankly, I think we absolutely need a for pack formats that have
>     strong guarantees on number of octets
>
>
> C<< pack "A1" >> will never return more than one octet.
>
Or less than one octet.  I'd expect C<< pack "A$n" >> to return exactly 
$n octets, padding with blanks or truncating as necessary, with the 
understanding that truncation may result in something that cannot be 
turned back into something reasonable with C<< unpack "A$n" >>.  Which 
brings us back to the original question, do

      A  A text (ASCII) string, will be space padded.
      Z  A null-terminated (ASCIZ) string, will be null padded.


need to be modified in the pack documentation?  -- jpl
> - Eric
>
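The padding and truncation rules quoted from the pack documentation can be checked with a short script. This is a minimal sketch using plain ASCII input, so characters and octets coincide; the variable names are only for illustration.

```perl
use strict;
use warnings;

# A: space-padded, or truncated, to the requested width.
my $a_padded    = pack "A5", "abc";     # "abc" plus two spaces
my $a_truncated = pack "A2", "abc";     # "ab"

# Z: null-padded; with an explicit count the last position is always a null.
my $z_padded    = pack "Z5", "abc";     # "abc" plus two nulls
my $z_truncated = pack "Z2", "abc";     # "a" plus one null

# unpack "A" strips trailing spaces and nulls; unpack "Z" stops at the first null.
my $a_round = unpack "A5", $a_padded;
my $z_round = unpack "Z5", $z_padded;

for my $pair (["A5", $a_padded], ["A2", $a_truncated],
              ["Z5", $z_padded], ["Z2", $z_truncated]) {
    my ($template, $value) = @$pair;
    printf "%s => %d octets\n", $template, length $value;
}
```

With ASCII input, each result is exactly as long as the count in the template, which matches the expectation stated above for C<< pack "A$n" >>.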



