perl.perl5.porters
Postings from January 2012
Re: pack and ASCII
From: Aristotle Pagaltzis
Date: January 14, 2012 08:01
Subject: Re: pack and ASCII
Message ID: 20120114160131.GG817@fernweh.plasmasturm.org
* David Mertens <dcmertens.perl@gmail.com> [2012-01-12 13:10]:
> >>> Well, I suppose it would help when interacting with buggy modules
> >>> (modules mistakenly using SvPV without checking SvUTF8 instead of
> >>> using SvPVbyte), but we shouldn't break useful functions to cater
> >>> to broken modules.
> >>
> >> Aside from being offensive
> >
> > huh? What part? My suggestion that it helps code that suffers from
> > The Unicode Bug, or my opinion that we shouldn't break C<pack>?
>
> I never check SvUTF8 in my XS code. You think that suffers from the
> Unicode bug. Never mind that I pack floating point arrays for
> numerical manipulation.
The UTF8 flag does not tell you whether a string is characters or not.
It tells you whether its PV buffer is a packed octet array (UTF8=0) or
a variable-width integer sequence (UTF8=1).
If you are disregarding this information, your code is broken.
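To make the two representations concrete, here is a minimal C sketch of what upgrading does to a PV buffer. It assumes an ASCII platform and a byte string (every value is 0..255). `upgrade_octets` is a hypothetical helper, not a perl API function; in XS the real call would be `sv_utf8_upgrade`. The same two values 0x41 0xE9 take two bytes with UTF8=0 and three bytes with UTF8=1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (ASCII platform assumed): re-encode an octet array
 * (UTF8=0 form) as the variable-width form used when UTF8=1.
 * Values below 0x80 stay one byte; 0x80..0xFF become two bytes.
 * Returns the number of bytes written to `out`. */
size_t upgrade_octets(const unsigned char *in, size_t len,
                      unsigned char *out, size_t out_cap)
{
    size_t o = 0;
    assert(out_cap >= 2 * len); /* worst case: every octet needs two bytes */
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                                  /* 0x00..0x7F */
        } else {
            out[o++] = (unsigned char)(0xC0 | (b >> 6));   /* lead byte */
            out[o++] = (unsigned char)(0x80 | (b & 0x3F)); /* continuation */
        }
    }
    return o;
}
```

Perl treats both buffers as the same string. Only C code that reads the raw PV without consulting SvUTF8 can tell them apart, and such code sees a different length and different bytes.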
I am sorry if you take correctness arguments about your code as
judgements on your personality; the two have nothing to do with each
other.
> Perl has the T_OPAQUEPTR typemap in which Perl allocates and manages
> the memory for me, but its internal structure is up to the XS
> programmer. A typical use case would be to allocate enough memory to
> hold a specified struct. I would use pack and unpack to manage that
> data Perl-side, and use a C struct to get at the data C-side.
> Therefore, I want guarantees about how Perl represents the data
> internally when I use pack, which I thought was the whole point of
> pack. You seem to argue that I can make no assumptions about the
> internal representation of a Perl scalar unless I handle all the
> manipulations in my own XS code. That is the Interop bug, so to speak,
> which has been around far longer than the Unicode bug.
The matter is very simple: Perl strings come in two representations.
Both representations mean the same thing, and as you are using them to
represent bytes, not characters, they can always be translated into each
other.
• Encode inputs to `pack` if your own interfaces accept characters but
need to yield or pass bytes. For each input string, document whether
your interfaces accept bytes or characters (or take a flag that
specifies which of the two), and never try to guess what a string is
by looking at it.
• Downgrade inputs in your XS code when you need a byte array.
Then the representation is guaranteed, and it is also correct within the
Perl string model.
All other approaches are incorrect.
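The downgrade step can be sketched the same way. This is again a hypothetical C helper on an ASCII platform; `downgrade_to_octets` is not a perl API name, and in XS the real calls would be `sv_utf8_downgrade` or the `SvPVbyte` macro mentioned earlier in the thread. It turns a UTF8=1 buffer back into one octet per value and fails if any value is above 0xFF, meaning the string was never a byte string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (ASCII platform assumed): convert the UTF8=1 form
 * of a byte string back to a plain octet array. `out` needs at most
 * `len` bytes, because downgrading never grows the buffer.
 * Returns false, leaving *out_len untouched, if a value above 0xFF or
 * malformed input is found. */
bool downgrade_to_octets(const unsigned char *in, size_t len,
                         unsigned char *out, size_t *out_len)
{
    size_t o = 0;
    assert(out_len != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                       /* one-byte form: 0x00..0x7F */
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte form of 0x80..0xFF: lead byte is 0xC2 or 0xC3 */
            out[o++] = (unsigned char)(((b & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i++;                                /* skip the continuation byte */
        } else {
            return false;                       /* value > 0xFF, or malformed */
        }
    }
    *out_len = o;
    return true;
}
```

After a successful downgrade, `out` holds one octet per value, the same bytes `pack` produced, so reading them through a C struct on the XS side sees the layout that was packed.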
Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>