develooper Front page | perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

From: Eric Brine
Date: January 11, 2012 17:45
Subject: Re: pack and ASCII
Message ID: CALJW-qE+ZiaMPR1Uz3+6HM8GT32gpGKZpCm4tGPPrm3G1MJGNQ@mail.gmail.com

On Wed, Jan 11, 2012 at 7:18 PM, David Mertens <dcmertens.perl@gmail.com> wrote:

> Eric -
>
> (Let's stay away from the term "character" as it has potentially different
> sizes in UTF8 and ASCII.)
>

I'd be happy to use "symbol" instead of "character" if it helped avoid
confusion, except you went and defined "symbol" to mean "grapheme", which
is something else entirely.

A string is a sequence of characters. When I use "character", I refer to
this definition exclusively. In other words, a character is that which is
returned by substr($s, $i, 1).
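
As a minimal illustration of this definition (a sketch only; the sample
string and variable names are hypothetical), each element below is one
character in exactly this sense, whatever its code point:

```perl
use strict;
use warnings;

# Hypothetical sample: three characters, the middle one above 0xFF.
my $s = "a\x{263A}b";

# Collect each character exactly as substr($s, $i, 1) returns it.
my @chars = map { substr($s, $_, 1) } 0 .. length($s) - 1;

# Print the code point of each character.
printf "U+%04X\n", ord($_) for @chars;
```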

> My usage of pack and unpack focuses on manipulating binary data with C and
> XS code. I would conjecture that your perspective stems from being able
> to output (with pack and print) and input (with <> and unpack) text columns
> with a fixed number of symbols.
>

I have used it for both. Currently, C<< pack "A20" >> handles both
perfectly because C<< pack "A20" >> does not currently assign any meaning
to the characters. They can be ASCII (as documented), encoded text, other
binary data*, "symbols" (Unicode code points), 32-bit temperature
readings*, etc.

* -- C<< pack 'a20' >> would make more sense for these, though.
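
A small sketch of that point, assuming Perl 5.10 or later, where pack
works character by character. The inputs and variable names are made up
for illustration. The same template is applied to text containing a code
point above 0xFF and to arbitrary bytes:

```perl
use strict;
use warnings;

# Hypothetical inputs: text with a code point above 0xFF, and arbitrary bytes.
my $text  = "caf\x{E9} \x{263A}";   # 6 characters
my $bytes = "\x01\x02\xFF";         # 3 characters, all below 0x100

my $text_field  = pack("A10", $text);
my $bytes_field = pack("A10", $bytes);

# Both fields are padded with spaces to the same number of characters.
print length($text_field), " ", length($bytes_field), "\n";

# unpack "A10" strips the padding and returns the original characters.
my ($text_back)  = unpack("A10", $text_field);
my ($bytes_back) = unpack("A10", $bytes_field);
print "round trip ok\n" if $text_back eq $text && $bytes_back eq $bytes;
```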

> On Wed, Jan 11, 2012 at 2:13 AM, Eric Brine <ikegami@adaelis.com> wrote:
>
>> Calling SvPVbyte a second time doesn't help, and it breaks A, a, Z, W,
>> and U pack patterns.
>>
>> Well, I suppose it would help when interacting with buggy modules
>> (modules mistakenly using SvPV without checking SvUTF8 instead of using
>> SvPVbyte), but we shouldn't break useful functions to cater to broken
>> modules.
>>
>
> Aside from being offensive
>

Huh? What part? My suggestion that it helps code that suffers from The
Unicode Bug, or my opinion that we shouldn't break C<pack>?


>  I believe this is poorly thought-out. What if you mix data types?
>

Actually, not only did I consider that, I mentioned two days ago (Mon, 9
Jan 2012 20:17:37 -0500) that this is the reason why pack cannot always
downgrade its result. "W" and "U" are unambiguously designed to return
Unicode code points.
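
To make the mixed case concrete, here is a sketch assuming Perl 5.10 or
later. The template "A2 W" and the variable names are hypothetical. A
result holding a character above 0xFF cannot be downgraded:

```perl
use strict;
use warnings;

# "W" and "U" each produce one character with the given code point.
my $w = pack("W", 0x263A);
my $u = pack("U", 0x263A);

# Hypothetical mixed template: a 2-character field followed by a code point.
my $mixed = pack("A2 W", "ab", 0x263A);   # 3 characters

# With a true second argument, utf8::downgrade returns false instead of
# dying when the string holds a character above 0xFF.
my $copy       = $mixed;
my $downgraded = utf8::downgrade($copy, 1);
print "cannot downgrade\n" unless $downgraded;
```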

> You are suggesting that this scalar should be explicitly marked as having
> UTF8:
>
> $scalar1 = pack("A4", $string);
>

No, not at all. I don't care at all how Perl chooses to store a string in
an SV. ***If one has code that depends on how Perl stores a string in an
SV, it suffers from The Unicode Bug.***

I'm perfectly fine with it being UTF8=0 when possible. (I specifically said
this twice before.)
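
A sketch of why the storage format should not matter, assuming Perl 5.10
or later (the variable names are hypothetical). The two inputs differ
only in their internal representation, and code that treats them as
strings cannot tell them or their packed results apart:

```perl
use strict;
use warnings;

# Two copies of the same one-character string (U+00E9).
my $native   = "\xE9";
my $upgraded = "\xE9";
utf8::upgrade($upgraded);   # changes the internal storage only

# The internal flag differs between the two copies...
print "storage differs\n"
    if !utf8::is_utf8($native) && utf8::is_utf8($upgraded);

# ...but the strings, and what pack builds from them, compare equal.
my $packed_native   = pack("A4", $native);
my $packed_upgraded = pack("A4", $upgraded);
print "same string\n" if $native eq $upgraded;
print "same packed\n" if $packed_native eq $packed_upgraded;
```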

> but what about this one:
>
> $scalar2 = pack("A4 f", $string, exp(1));
>

I still don't care at all how Perl chooses to store a string in an SV.
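
As a sketch of the mixed template, assuming Perl 5.10 or later (the
input string is a hypothetical value chosen to contain a code point
above 0xFF), the result round-trips through unpack without the caller
ever inspecting how it is stored:

```perl
use strict;
use warnings;

my $string  = "\x{263A}bcd";   # hypothetical 4-character input
my $scalar2 = pack("A4 f", $string, exp(1));

# Read both fields back with the same template.
my ($str, $num) = unpack("A4 f", $scalar2);

print "string ok\n" if $str eq $string;
printf "%.5f\n", $num;         # exp(1) at single-float precision
```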

> I would say that the latter should be a byte string
>

Why do you care? Could you give a practical example where it matters?

> as should all scalars that result from a pack operation, so that pack DWIM.


If A, a, Z, W and U stop working, they don't DWIM.


> Are you arguing that $scalar1 is a special case that should be marked as
> UTF8 so it DWYM?
>

No. Quite the opposite. I've been arguing there shouldn't be a special
case. It should continue to work with arbitrary strings. There are far
more, and better, reasons against treating some of them specially
(including that it would break existing code).

- Eric
