perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

Eric Brine
January 13, 2012 23:47
On Fri, Jan 13, 2012 at 5:46 PM, Leon Timmermans <> wrote:

> On Fri, Jan 13, 2012 at 11:17 PM, Eric Brine <> wrote:
> > No, encoding the output of C<pack> would produce garbage, e.g.
> > C<< pack("N", 0x80112233) >> would return 5 bytes instead of 4.
> I have no idea what you're talking about here. Can you please stop
> interpreting anything anyone else says in the most unreasonable way
> possible?

I said C<pack> should always downgrade its output. I was most definitely
including C<< pack "N" >> and C<< pack "N/A*" >> in my statement.

So yes, you did ask me if I meant that the output of C<< pack "N" >> should
be encoded.

And no, that would be stupid.
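
A minimal sketch of why, assuming "encoded" means UTF-8-encoded (the variable names are just for illustration):

```perl
use strict;
use warnings;

# pack "N" yields exactly four bytes: 80 11 22 33.
my $packed = pack("N", 0x80112233);

# Assumption: the encoding in question is UTF-8.
# utf8::encode works in place, so encode a copy.
my $encoded = $packed;
utf8::encode($encoded);    # byte 80 becomes the two bytes C2 80

printf "packed:  %d bytes\n", length($packed);
printf "encoded: %d bytes\n", length($encoded);
```

The encoded copy is one byte longer, so anything expecting a 32-bit big-endian integer would read the wrong value.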

> > Maybe you meant encoding the input of C<< pack "A" >>, but that would
> > break C<< pack("A20", $byte) >> and change C<< pack("A20", $text) >> in an
> > incompatible way.
> Well, obviously I didn't mean to encode bytes (why would anyone try
> that?)

Because Perl can't tell the difference between bytes and text. It has no
way to know whether E9 is a byte, "é" or something else.

Either C<< pack "A" >> encodes or it doesn't. If it does, it causes C<<
pack "A", $byte >> to return garbage and C<< pack "A", $text >> to be
backwards incompatible.

> and I don't see how encode breaks $text more than downgrading
> would. And it's not like anyone still wants latin-1.

Downgrading doesn't change the string at all -- it doesn't encode it, using
latin-1 or otherwise -- so it is backwards compatible. Its only advantage is
that it helps modules that suffer from The Unicode Bug.

Encoding (by which I presume you mean encoding using UTF-8) changes the
string, so it's not backwards compatible.
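
To make the distinction concrete, here is a small sketch using the core utf8:: functions; it assumes UTF-8 is the encoding being proposed:

```perl
use strict;
use warnings;

my $text = "\xE9";        # one character, U+00E9
utf8::upgrade($text);     # stored internally as UTF-8, UTF8 flag on

my $downgraded = $text;
utf8::downgrade($downgraded);    # changes only the internal storage

my $encoded = $text;
utf8::encode($encoded);          # replaces the character with bytes C3 A9

print "downgraded: ", ($downgraded eq $text ? "same string" : "different string"), "\n";
print "encoded:    ", ($encoded    eq $text ? "same string" : "different string"), "\n";
```

The downgraded copy still compares equal to the original; the encoded copy is a different, two-character string.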

> > hum? C<< pack "a" >> has always been very useful. It's used to build C
> > structs and fixed-width records, for starters.
> Obviously, I mean the behavior of «pack "a", $char» ; I think I've been
> quite explicit about liking its byte semantics. Is there any use case
> for character semantics in that case? I can't think of it, at all.

What do you mean by byte and character semantics? Different behaviour based
on the UTF8 flag? There's no difference. If there were, that would be a bug.
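
A quick way to check this, sketched with the same character stored both ways (the "A5" template is an arbitrary choice for illustration):

```perl
use strict;
use warnings;

# The same one-character string, stored two different ways internally.
my $down = "\xE9";
my $up   = "\xE9";
utf8::upgrade($up);    # UTF8 flag on, same character

# Pack each into a five-character, space-padded field.
my $from_down = pack("A5", $down);
my $from_up   = pack("A5", $up);

print "lengths: ", length($from_down), " and ", length($from_up), "\n";
print(($from_down eq $from_up) ? "results are equal\n" : "results differ\n");
```

If the two results ever compared unequal, that would be the kind of flag-dependent behaviour I'd call a bug.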

- Eric
