perl.perl5.porters | Postings from January 2012
Re: pack and ASCII
From: David Mertens
Date: January 12, 2012 04:09
Subject: Re: pack and ASCII
Message ID: CA+4ieYXjBPBVkTrsMfsOfmSs=yA0S0g6cEDMXuNgTWeH=rxUOg@mail.gmail.com
On Wed, Jan 11, 2012 at 7:45 PM, Eric Brine <ikegami@adaelis.com> wrote:
> On Wed, Jan 11, 2012 at 7:18 PM, David Mertens <dcmertens.perl@gmail.com> wrote:
>
>> Eric -
>>
>> (Let's stay away from the term "character" as it has potentially different
>> sizes in UTF8 and ASCII.)
>>
>
> I'd be happy to use "symbol" instead of "character" if it helped avoid
> confusion, except you went and defined "symbol" to mean "grapheme", which
> is something else entirely.
>
> A string is a sequence of characters. When I use "character", I refer to
> this definition exclusively. In other words, a character is that which is
> returned by substr($s, $i, 1).
>
>> My usage of pack and unpack focuses on manipulating binary data with C and
>> XS code. I would conjecture that your perspective stems from being able to
>> output (with pack and print) and input (with <> and unpack) text columns
>> with a fixed number of symbols.
>>
>
> I have used it for both. Currently, C<< pack "A20" >> handles both
> perfectly because C<< pack "A20" >> does not currently assign any meaning
> to the characters. They can be ASCII (as documented), encoded text, other
> binary data*, "symbols" (Unicode code points), 32-bit temperature
> readings*, etc.
>
> * -- C<< pack 'a20' >> would make more sense for these, though.
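To make the point about "A20" concrete, here is a minimal Perl sketch, assuming perl 5.10 or later (where pack works on characters rather than on the internal encoding); the sample strings are invented for illustration. It shows "A" passing a code point above 255 through unchanged, and why "a" suits binary data better: on unpack, "A" strips trailing spaces and NULs.

```perl
use strict;
use warnings;

# "A" space-pads and does not interpret the characters it packs, so a
# code point above 255 survives the round trip unchanged.
my $text = pack("A4", "\x{263A}ab");    # "\x{263A}ab " (4 characters)
my $back = unpack("A4", $text);         # unpack "A" strips the trailing space

# Binary data: on unpack, "A" strips trailing spaces and NULs alike,
# which can eat genuine data bytes; "a" returns the field untouched.
my $bytes = "x\0 \0";
my $via_A = unpack("A4", pack("A4", $bytes));    # "x"
my $via_a = unpack("a4", pack("a4", $bytes));    # "x\0 \0"

printf "%d %d %d\n", length($text), length($via_A), length($via_a);
```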
>
>> On Wed, Jan 11, 2012 at 2:13 AM, Eric Brine <ikegami@adaelis.com> wrote:
>>
>>> Calling SvPVbyte a second time doesn't help, and it breaks A, a, Z, W,
>>> and U pack patterns.
>>>
>>> Well, I suppose it would help when interacting with buggy modules
>>> (modules mistakenly using SvPV without checking SvUTF8 instead of using
>>> SvPVbyte), but we shouldn't break useful functions to cater to broken
>>> modules.
>>>
>>
>> Aside from being offensive
>>
>
> huh? What part? My suggestion that it helps code that suffers from The
> Unicode Bug, or my opinion that we shouldn't break C<pack>?
>
I never check SvUTF8 in my XS code, so you think that code suffers from the
Unicode Bug, never mind that what I pack is floating-point arrays for
numerical manipulation.

>> I believe this is poorly thought-out. What if you mix data types?
>>
>
> Actually, not only did I consider that, I mentioned two days ago (Mon, 9
> Jan 2012 20:17:37 -0500) that this is the reason why having pack always
> downgrade the result isn't possible. "W" and "U" are unambiguously designed
> to return Unicode code points.
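A small sketch of why "W" and "U" rule out always downgrading, assuming perl 5.10 or later; U+263A is just an arbitrary code point above 255. Both formats produce a character that has no single-byte form, so utf8::downgrade, called with its FAIL_OK argument, reports failure instead of dying.

```perl
use strict;
use warnings;

# "W" and "U" take numeric code points and return characters.
my $w = pack("W", 0x263A);    # one character, U+263A
my $u = pack("U", 0x263A);    # the same character via "U"

# Such a result has no byte-string form. With FAIL_OK set,
# utf8::downgrade returns false instead of dying.
my $copy = $w;
my $ok   = utf8::downgrade($copy, 1);

printf "%d %d %s\n", length($w), ord($u), $ok ? "downgraded" : "not downgradable";
```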
>
>> You are suggesting that this scalar should be explicitly marked as having
>> UTF8:
>>
>> $scalar1 = pack("A4", $string);
>>
>
> No, not at all. I don't care at all how Perl chooses to store a string in
> an SV. ***If one has code that depends on how Perl stores a string in an
> SV, it suffers from The Unicode Bug.***
>
> I'm perfectly fine with it being UTF8=0 when possible. (I specifically
> said this twice before.)
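The claim about internal storage can be checked from Perl itself. In this sketch (the string "caf\x{e9}" is only an example), two scalars hold the same characters but one has been passed through utf8::upgrade; eq and pack treat them alike, and only utf8::is_utf8 tells them apart.

```perl
use strict;
use warnings;

# Same four characters, two internal representations.
my $native = "caf\x{e9}";
utf8::downgrade($native);      # stored one byte per character (UTF8 flag off)
my $upgraded = $native;
utf8::upgrade($upgraded);      # stored as UTF-8 internally (UTF8 flag on)

my $same_string = $native eq $upgraded ? 1 : 0;
my $same_pack   = pack("A6", $native) eq pack("A6", $upgraded) ? 1 : 0;

# Only the flag differs; code that branches on it has The Unicode Bug.
my @flags = map { utf8::is_utf8($_) ? 1 : 0 } $native, $upgraded;

print "$same_string $same_pack @flags\n";
```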
>
>> but what about this one:
>>
>> $scalar2 = pack("A4 f", $string, exp(1));
>>
>
> I still don't care at all how Perl chooses to store a string in an SV.
>
>> I would say that the latter should be a byte string
>>
>
> Why do you care? Could you give a practical example where it matters?
>
Perl has the T_OPAQUEPTR typemap, with which Perl allocates and manages the
memory for me, but the internal structure of that memory is up to the XS
programmer. A typical use case would be to allocate enough memory to hold a
specified struct. I would use pack and unpack to manage that data Perl-side,
and use a C struct to get at the data C-side. Therefore, I want guarantees
about how Perl represents the data internally when I use pack, which I
thought was the whole point of pack. You seem to argue that I can make no
assumptions about the internal representation of a Perl scalar unless I
handle all the manipulations in my own XS code. That is the Interop bug, so
to speak, which has been around far longer than the Unicode bug.
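One way to get the byte-level guarantee described above, without depending on which storage pack happens to pick, is to request it explicitly. The sketch below assumes a hypothetical C struct with a 4-byte name followed by a float and no padding (typical on common ABIs, but still an assumption). utf8::downgrade is roughly the Perl-side counterpart of what SvPVbyte does in XS: it yields a byte string, or fails rather than silently producing wrong bytes when the record holds a character above 255.

```perl
use strict;
use warnings;

# Hypothetical C layout, assumed to be unpadded:
#   struct reading { char name[4]; float value; };
my $name = "abc";
utf8::upgrade($name);                 # simulate text that arrived UTF-8 flagged

my $record = pack("A4 f", $name, exp(1));

# Ask for a byte-string representation explicitly instead of relying on
# whatever flag pack chose. FAIL_OK=1 returns false rather than dying.
utf8::downgrade($record, 1)
    or die "record contains characters above 255; not valid C bytes\n";

# Round trip on the Perl side.
my ($got_name, $got_value) = unpack("A4 f", $record);
printf "%d bytes, name=%s, value=%.5f\n", length($record), $got_name, $got_value;

# A name with a character above 255 cannot be turned into bytes at all.
my $wide    = pack("A4 f", "\x{263A}ab", exp(1));
my $wide_ok = utf8::downgrade($wide, 1);
```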
>> as should all scalars that result from a pack operation, so that pack DWIM.
>
>
> If A, a, Z, W and U stop working, they don't dwim.
>
>
>> Are you arguing that $scalar1 is a special case that should be marked as
>> UTF8 so it DWYM?
>>
>
> No. Quite the opposite. I've been arguing there shouldn't be a special
> case. It should continue to work with arbitrary strings. There are far more,
> and better, reasons against treating some of them specially (including that
> it would break existing code).
>
I have made a few comments about my use of pack, but I use pack in very
specific ways and I generally don't pack text. I need to look more closely
at the documentation in order to contribute more meaningfully to this
conversation. I believe that the relevant documentation includes pack,
unpack, and perlpacktut. Anything else I should reread?
David