develooper Front page | perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

Eric Brine
January 9, 2012 17:17
On Mon, Jan 9, 2012 at 3:30 PM, Leon Timmermans <> wrote:

> On Mon, Jan 9, 2012 at 8:55 PM, Eric Brine <> wrote:
> > C<< pack 'A*' >> correctly packs all strings of bytes (whether UTF8=0 or
> > UTF8=1). I don't see why it's a bug that it usefully works for characters
> > that aren't bytes too. Are you using C<< pack 'A*' >> to validate your
> > data?
> > You can use one of the following to do that:
> It also means that «pack "A1", $foo» can't be relied upon to be only
> one byte

You can count on «pack "A1", $foo» never returning more than one byte.

You can't count on the character returned by «pack "A1", $foo» to be a
byte, though.
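
To illustrate the distinction, here is a minimal sketch. It assumes Perl
5.10 or later, where pack works on characters rather than on the underlying
encoding; the variable names are made up for this example.

```perl
use strict;
use warnings;

# Three characters; the first one (U+0163, "ţ") does not fit in a byte.
my $wide = "\x{163}bc";

# "A1" keeps at most one character of its argument.
my $packed = pack "A1", $wide;

printf "length: %d\n",    length $packed;  # number of characters returned
printf "ord:    U+%04X\n", ord $packed;    # code point of that character
```

The result is a single character, but its code point is above 0xFF, so it
is not something you could write out as one octet without encoding it.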

> which means a whole extra layer of validation is necessary.

I presume you're expecting «pack "A*", "ţ"» to die/warn with "Wide

Then I agree, if you have buggy code, that change would remove the need
for an extra layer of validation to detect that bug. (The bug, of course,
is most likely that you forgot to encode your text.)
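
As a sketch of what "encoding your text" could look like before packing:
the example below uses the core Encode module, and the choice of UTF-8 is
an assumption made purely for illustration (any encoding that yields bytes
would do).

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "\x{163}";                 # decoded text: one character, "ţ"

# Turn the text into a string of bytes first (UTF-8 assumed here).
my $bytes = encode('UTF-8', $text);

# Every character handed to pack is now in the range 0x00..0xFF.
my $packed = pack "A*", $bytes;

printf "%d octets: %s\n",
    length $packed,
    join ' ', map { sprintf '%02X', ord } split //, $packed;
```

Once the text has been encoded, the packed string consists only of octets,
and no wide character can reach pack in the first place.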

It comes down to the following question: Is it more useful for «pack "A*",
$char» to work or for «pack "A*", $non_bytes» to throw/report an error?

My vote is for the former.

* The latter has very circumstantial uses.
* The latter isn't backwards compatible.
* The latter removes a useful feature.
* The latter is redundant with existing errors (meaning you'll get the wide
character warning and/or easily noticeable garbage later anyway).
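
For code that does want the stricter behaviour, it can be added on top of
pack without changing pack itself. The following is only a sketch:
pack_bytes is a hypothetical helper (it is not part of Perl), and the error
text is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical wrapper: refuse any argument containing a character
# above 0xFF, then defer to the built-in pack.
sub pack_bytes {
    my ($template, @args) = @_;
    for my $arg (@args) {
        die "Wide character in argument to pack_bytes\n"
            if defined $arg && $arg =~ /[^\x00-\xFF]/;
    }
    return pack $template, @args;
}

# A string of bytes passes through unchanged.
my $ok = pack_bytes("A*", "abc");

# A string with a non-byte character is rejected.
my $err;
eval { pack_bytes("A*", "\x{163}"); 1 } or $err = $@;
print "rejected: $err" if defined $err;
```

This keeps the check opt-in: callers who need the guarantee pay for it,
and existing uses of «pack "A*", $char» keep working.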

> Quite frankly, I think we absolutely need a for pack formats that have
> strong guarantees on number of octets

C<< pack "A1" >> will never return more than one octet.

- Eric
