Re: pack and ASCII
From: David Mertens
Date: January 11, 2012 16:18
Subject: Re: pack and ASCII
Message ID: CA+4ieYVhUvD8dfowHeYnZ0tsLU6jO4T9DV+dJCMf18B2V8N9Wg@mail.gmail.com

Eric -

My usage of pack and unpack focuses on manipulating binary data with C and
XS code. I would conjecture that your perspective stems from being able to
output (with pack and print) and input (with <> and unpack) text columns
with a fixed number of symbols. (Let's stay away from the term "character",
as it has potentially different sizes in UTF-8 and ASCII.)

On Wed, Jan 11, 2012 at 2:13 AM, Eric Brine <ikegami@adaelis.com> wrote:
> On Wed, Jan 11, 2012 at 2:13 AM, Jesse Luehrs <doy@tozt.net> wrote:
>
>> On Wed, Jan 11, 2012 at 01:36:23AM -0500, Eric Brine wrote:
>> > On Tue, Jan 10, 2012 at 5:54 PM, Leon Timmermans <fawaka@gmail.com>
>> wrote:
>> > > Not true. Those aren't exclusive. Right now, both DWIM. The former
>> returns
>> > >
>> > > exactly one byte. The latter returns exactly one character.
>> > >
>> >
>> > > They don't DWIM for me. I mean to get a bytestring as result.
>> >
>> >
>> > If you have code that requires a UTF8=0 string specifically, it is
>> buggy.
>> > Specifically, it suffers from the Unicode bug. You are probably using
>> SvPV
>> > without looking at the SvUTF8. The solution is simple: Use SvPVbyte
>> instead.
>>
>> There has to be some point when code can assume that it has a byte
>> string. What Leon is saying is that it's a lot more useful for pack to
>> use SvPVbyte itself automatically
>
>
> Calling SvPVbyte a second time doesn't help, and it breaks A, a, Z, W, and
> U pack patterns.
>
> Well, I suppose it would help when interacting with buggy modules (modules
> mistakenly using SvPV without checking SvUTF8 instead of using SvPVbyte),
> but we shouldn't break useful functions to cater to broken modules.
>

Aside from being offensive, I believe this is poorly thought-out. What if
you mix data types? You are suggesting that this scalar should be
explicitly marked as having UTF8:

    $scalar1 = pack("A4", $string);

but what about this one:

    $scalar2 = pack("A4 f", $string, exp(1));

I would say that the latter should be a byte string, as should all scalars
that result from a pack operation, so that pack DWIM. Are you arguing that
$scalar1 is a special case that should be marked as UTF8 so it DWYM?

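
To make the two cases concrete, here is a minimal sketch of how one might
inspect the results. It assumes perl 5.10 or later, where pack processes the
A format character by character; the variable names are only illustrative,
and the size of the f field is whatever the platform's native float is.

```perl
use strict;
use warnings;

# Plain ASCII input: 'A4' pads with spaces or truncates to four symbols.
my $padded    = pack("A4", "ab");       # "ab  "
my $truncated = pack("A4", "abcdef");   # "abcd"

# Input holding a symbol above 255 (assumption: perl >= 5.10 semantics).
my $wide = pack("A4", "\x{263A}");

# Mixed template: four text symbols followed by a native float.
my $mixed = pack("A4 f", "ab", exp(1));

# Report the length and the internal UTF8 flag of each result.
for my $pair ([padded => $padded], [truncated => $truncated],
              [wide   => $wide],   [mixed     => $mixed]) {
    my ($name, $value) = @$pair;
    printf "%-9s length=%d utf8=%d\n",
        $name, length($value), utf8::is_utf8($value) ? 1 : 0;
}
```

The flag reported for the wide case is the one under discussion: the result
is four symbols long, but it is no longer a plain byte string.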
> We could try to cater to them without breaking C<pack> by having C<pack>
> downgrade *if possible* (C<< sv_utf8_downgrade(sv, 1) >>).
>
>
>> > What should pack("A*", $_) return for a byte, and what should it
>> return for
>> > a non-byte?
>>
>> It should be an error (or at least a warning) for the 'A' format to
>> receive a non-byte.
>>
>
> I've *never* had to disable warnings to do something legit (e.g. C<<
> pack("A20A20", $text1, $text2); >>). At most, I've had to add C<< // "" >>
> to avoid uninitialized warnings or I've had to add parens to avoid
> "ambiguous" warnings.
>
> - Eric
>
>
David