perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

From: Leon Timmermans
Date: January 10, 2012 04:44
Subject: Re: pack and ASCII
Message ID: CAHhgV8jsLR03ztmhggq7EE-hZSuOeTeH+BWRvMTmZM-+jn_XAg@mail.gmail.com
On Tue, Jan 10, 2012 at 2:17 AM, Eric Brine <ikegami@adaelis.com> wrote:
> You can count on «pack "A1", $foo» never to return more than one byte.
>
> You can't count on the character returned by «pack "A1", $foo» to be a byte,
> though.

Only one of those two can be true, and it isn't the former:

perl -E 'no bytes; use utf8; my $foo = pack "A1", "ţ"; say bytes::length($foo)'
2
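To make the character/byte distinction concrete, here is a minimal sketch comparing the character length and the internal byte length of the result, once with the raw text and once after encoding it to UTF-8 octets first. It assumes a Perl where `utf8::encode` is available (core since 5.8); note that `bytes::length` reports the size of Perl's internal representation, which is an implementation detail rather than part of the string's value.

```perl
use strict;
use warnings;
use utf8;        # the literal "ţ" below is the single character U+0163
use bytes ();    # load bytes.pm for bytes::length without enabling the pragma

my $char = "ţ";

# Packing the character string directly: one *character* comes back,
# but it is wider than a byte, so its internal UTF-8 form takes two bytes.
my $packed = pack "A1", $char;
printf "unencoded: length=%d bytes::length=%d\n",
    length($packed), bytes::length($packed);

# Encoding first turns the text into octets (0xC5 0xA3); "A1" then takes
# only the first octet, which is a truncated UTF-8 sequence on its own.
my $octets = $char;
utf8::encode($octets);
my $packed_bytes = pack "A1", $octets;
printf "encoded:   length=%d bytes::length=%d first=0x%02X\n",
    length($packed_bytes), bytes::length($packed_bytes), ord($packed_bytes);
```

On a modern perl this prints `unencoded: length=1 bytes::length=2` followed by `encoded:   length=1 bytes::length=1 first=0xC5`, i.e. "A1" yields one byte only when it is handed bytes in the first place.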

> I presume you're expecting «pack "A*", "ţ"» to die/warn with "Wide
> character"?
>
> Then I agree: if you have buggy code, that change would remove the need
> for an extra layer of validation to detect that bug. (The bug, of course, is
> most likely that you forgot to encode your text.)
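The "extra layer of validation" mentioned above can be sketched as a small wrapper that refuses any character above 0xFF before calling `pack`. The helper name `pack_A_octets` is hypothetical and not part of core Perl; it is one possible shape for such a check, under the assumption that a character wider than a byte almost always means the text was never encoded.

```perl
use strict;
use warnings;

# Hypothetical helper (not a core Perl function): pack a fixed-width "A"
# field, but die if the input contains a character that cannot be an octet.
sub pack_A_octets {
    my ($width, $str) = @_;
    die "Wide character in pack_A_octets\n" if $str =~ /[^\x00-\xFF]/;
    return pack "A$width", $str;
}

# Plain octets pass through unchanged.
my $ok = pack_A_octets(3, "abc");

# U+0163 is wider than a byte, so the helper rejects it.
my $err;
eval { pack_A_octets(1, "\x{163}"); 1 } or $err = $@;
print defined $err ? "rejected: $err" : "accepted\n";
```

The check only catches characters above 0xFF; a decoded Latin-1 string whose characters all fall in 0x80 to 0xFF still passes, which is exactly the ambiguity the thread is arguing about.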
>
> It comes down to the following question: Is it more useful for «pack "A*",
> $char» to work or for «pack "A*", $non_bytes» to throw/report an error?

No, the question is whether «pack "A", $byte» or «pack "A", $character»
should DWIM. The only sane way out of this mess would be to split this
up into two different formats; the question, though, is which one gets
the letter 'A'. I think it should be the former.

My opinion on that should be obvious by now.

> C<< pack "A1" >> will never return more than one octet.

If only that were the case.

Leon



nntp.perl.org: Perl Programming lists via nntp and http.