Re: pack and ASCII
From: Aristotle Pagaltzis
January 14, 2012 08:01
Message ID: 20120114160131.GG817@fernweh.plasmasturm.org
* David Mertens <email@example.com> [2012-01-12 13:10]:
> >>> Well, I suppose it would help when interacting with buggy modules
> >>> (modules mistakenly using SvPV without checking SvUTF8 instead of
> >>> using SvPVbyte), but we shouldn't break useful functions to cater
> >>> to broken modules.
> >> Aside from being offensive
> > huh? What part? My suggestion that it helps code that suffers from
> > The Unicode Bug, or my opinion that we shouldn't break C<pack>?
> I never check SvUTF8 in my XS code. You think that suffers from the
> Unicode bug. Never mind that I pack floating point arrays for
> numerical manipulation.
The UTF8 flag does not tell you whether a string is characters or not.
It tells you whether its PV buffer is a packed octet array (UTF8=0) or
a variable-width integer sequence (UTF8=1).
If you are disregarding this information, your code is broken.
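A minimal sketch of what the flag does and does not change, assuming perl 5.10 or later on an ASCII platform. The variable names are illustrative only, and `bytes::length` is used purely as a debugging aid to peek at the size of the PV buffer:

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling the pragma

# Four octets, as pack might produce for a small struct.
my $packed = pack 'C4', 0xE9, 0x01, 0x00, 0xFF;

# The same string in the other internal representation.
my $upgraded = $packed;
utf8::upgrade($upgraded);

# At the Perl level the two are indistinguishable.
my $same_value  = $packed eq $upgraded;
my $same_length = length($packed) == length($upgraded);

# Only the flag and the layout of the PV buffer differ.
my $flag_packed     = utf8::is_utf8($packed)   ? 1 : 0;
my $flag_upgraded   = utf8::is_utf8($upgraded) ? 1 : 0;
my $buffer_packed   = bytes::length($packed);
my $buffer_upgraded = bytes::length($upgraded);

printf "flags=%d/%d buffer=%d/%d\n",
    $flag_packed, $flag_upgraded, $buffer_packed, $buffer_upgraded;
```

C code that reads the buffer of the second scalar without consulting the flag sees a different octet sequence from the first, although both scalars hold the same Perl string.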
I am sorry if you read arguments about the correctness of your code as
judgements on your personality; the two have nothing to do with each
other.
> Perl has the T_OPAQUEPTR type map in which Perl allocates and manages
> the memory for me, but its internal structure is up to the XS
> programmer. A typical use case would be to allocate enough memory to
> hold a specified struct. I would use pack and unpack to manage that
> data Perl-side, and use a C struct to get at the data C-side.
> Therefore, I want guarantees about how Perl represents the data
> internally when I use pack, which I thought was the whole point of
> pack. You seem to argue that I can make no assumptions about the
> internal representation of a Perl scalar unless I handle all the
> manipulations in my own XS code. That is the Interop bug, so to speak,
> which has been around far longer than the Unicode bug.
The matter is very simple: Perl strings come in two representations.
Both representations mean the same thing, and as you are using them to
represent bytes, not characters, they can always be translated into each
other.
• Encode inputs to `pack` if your own interfaces accept characters but
needs to yield or pass bytes. Document whether your own interfaces
accept bytes or characters or take a flag that specifies which of the
two, for each input string, and never try to guess what a string is by
looking at it.
• Downgrade inputs in your XS code when you need a byte array.
Then the representation is guaranteed, and it is also correct within the
Perl string model.
All other approaches are incorrect.
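The two rules above can be sketched on the Perl side as follows. This is only an illustration under stated assumptions: `make_record` and its record layout are hypothetical, not anything from the thread, and `utf8::downgrade` stands in here for the downgrade that `SvPVbyte` performs at the C level.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical interface: documented to take a label as CHARACTERS plus a
# list of numbers, and to return BYTES ready to hand to XS code.
sub make_record {
    my ($label_chars, @values) = @_;

    # Characters become bytes explicitly, never by guessing from the flag.
    my $label_bytes = encode('UTF-8', $label_chars);

    # Length-prefixed label followed by native doubles.
    my $record = pack 'N/a* d*', $label_bytes, @values;

    # Force the packed-octet representation; dies if any element is > 0xFF.
    utf8::downgrade($record);
    return $record;
}

my $record = make_record("caf\x{E9}", 1.5, 2.5);

# A copy that took a detour through the other representation is still the
# same string, and downgrading it restores the byte layout.
my $detour = $record;
utf8::upgrade($detour);
my $equal_while_upgraded = $detour eq $record;
utf8::downgrade($detour);
my $flag_after = utf8::is_utf8($detour) ? 1 : 0;
```

Because the downgrade happens at the boundary, the consumer never has to inspect the flag, and a string that cannot be represented as bytes fails loudly instead of being passed along silently.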
Aristotle Pagaltzis // <http://plasmasturm.org/>