perl.perl5.porters
Re: pack and ASCII
From: Eric Brine
January 13, 2012 23:47
Message ID: CALJW-qFXqwEsUjOYo+w6ML=VAtJQ7V_9JvJs+LoBR-uSH+aYzA@mail.gmail.com
On Fri, Jan 13, 2012 at 5:46 PM, Leon Timmermans <email@example.com> wrote:
> On Fri, Jan 13, 2012 at 11:17 PM, Eric Brine <firstname.lastname@example.org> wrote:
> > No, encoding the output of C<pack> would produce garbage, e.g. C<<
> > 0x80112233) >> would return 5 bytes instead of 4.
> I have no idea what you're talking about here. Can you please stop
> interpreting anything anyone else says in the most unreasonable way
I said C<pack> should always downgrade its output. I was most definitely
including C<< pack "N/A*" >> and C<< pack "N/A*" >> in my statement.
So yes, you did ask me if I meant that the output of C<< pack "N" >> should
And no, that would be stupid.
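
A minimal sketch of the byte-count problem, assuming the truncated example quoted above was a C<< pack "N" >> call on 0x80112233, and using utf8::encode as a stand-in for "encoding the output":

```perl
use strict;
use warnings;

# Four raw bytes: 80 11 22 33.
my $packed = pack("N", 0x80112233);

# Hypothetical "encode the output" step: UTF-8 encode a copy.
my $encoded = $packed;
utf8::encode($encoded);    # 0x80 becomes the two bytes C2 80

printf "packed:  %d bytes (%s)\n", length($packed),  unpack("H*", $packed);
printf "encoded: %d bytes (%s)\n", length($encoded), unpack("H*", $encoded);
```

The encoded copy is no longer a valid 32-bit big-endian integer, which is why encoding the output of C<< pack "N" >> is not an option.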
> > Maybe you meant encoding the input of C<< pack "A" >>, but that would
> > break C<< pack("A20", $byte) >> and change C<< pack("A20", $text) >> in an
> > incompatible way.
> Well, obviously I didn't mean to encode bytes (why would anyone try
Because Perl can't tell the difference between bytes and text. It has no
way to know whether E9 is a byte, "é" or something else.
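
To illustrate (a sketch only; the variable names are mine): a string meant as a raw byte and a string meant as a character are the same value to Perl, and upgrading the internal storage doesn't change that.

```perl
use strict;
use warnings;

my $byte = "\xE9";       # intended as one raw byte
my $text = chr(0xE9);    # intended as the character U+00E9

# Same string as far as Perl is concerned.
my $same = ($byte eq $text) ? 1 : 0;

# Changing the internal storage format does not turn it into "text".
my $upgraded = $text;
utf8::upgrade($upgraded);
my $still_same = ($byte eq $upgraded) ? 1 : 0;

print "byte eq text:     $same\n";
print "byte eq upgraded: $still_same\n";
```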
Either C<< pack "A" >> encodes or it doesn't. If it does, it causes C<<
pack "A", $byte >> to return garbage and C<< pack "A", $text >> to be
> and I don't see how encode breaks $text more than downgrading
> would. And it's not like anyone still wants latin-1.
Downgrading doesn't change the string at all -- it doesn't encode it, using
latin-1 or otherwise -- so it is backwards compatible. The advantage is that
it helps modules that suffer from The Unicode Bug.
Encoding (by which I presume you mean encoding using UTF-8) changes the
string, so it's not backwards compatible.
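
A small sketch of the difference, using the core utf8::downgrade and utf8::encode functions on copies of the same one-character string:

```perl
use strict;
use warnings;

my $s = "\xE9";
utf8::upgrade($s);               # stored internally as UTF-8, same string

my $downgraded = $s;
utf8::downgrade($downgraded);    # changes the storage, not the string

my $encoded = $s;
utf8::encode($encoded);          # changes the string: now the bytes C3 A9

printf "downgraded: length %d, %s\n", length($downgraded),
    ($downgraded eq $s ? "equal to original" : "differs from original");
printf "encoded:    length %d, %s\n", length($encoded),
    ($encoded eq $s ? "equal to original" : "differs from original");
```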
> > hum? C<< pack "a" >> has always been very useful. It's used to build C
> > structs and fixed-width records, for starters.
> Obviously, I mean the behavior of «pack "a", $char» ; I think I've been
> quite explicit about liking its byte semantics. Is there any use case
> for character semantics in that case? I can't think of it, at all.
What do you mean by byte and character semantics? Different behaviour based
on UTF8 flag? There is no difference. If there was, that would be a bug.
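
A quick check one could run (a sketch, assuming a perl where pack operates on characters, which as far as I know means 5.10 or later): pack the same string once downgraded and once upgraded and compare the results.

```perl
use strict;
use warnings;

my $down = "\xE9";
my $up   = $down;
utf8::upgrade($up);    # same string, UTF8 flag on

# "a2" pads the one-character input with a NUL to width 2.
my $from_down = pack("a2", $down);
my $from_up   = pack("a2", $up);

print "results are ", ($from_down eq $from_up ? "identical" : "different"), "\n";
```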