perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

Aristotle Pagaltzis
January 14, 2012 08:01
* David Mertens <> [2012-01-12 13:10]:
> >>> Well, I suppose it would help when interacting with buggy modules
> >>> (modules mistakenly using SvPV without checking SvUTF8 instead of
> >>> using SvPVbyte), but we shouldn't break useful functions to cater
> >>> to broken modules.
> >>
> >> Aside from being offensive
> >
> > huh? What part? My suggestion that it helps code that suffers from
> > The Unicode Bug, or my opinion that we shouldn't break C<pack>?
> I never check SvUTF8 in my XS code. You think that suffers from the
> Unicode bug. Never mind that I pack floating point arrays for
> numerical manipulation.

The UTF8 flag does not tell you whether a string is characters or not.
It tells you whether its PV buffer is a packed octet array (UTF8=0) or
a variable-width integer sequence (UTF8=1).
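A minimal C sketch makes the two representations concrete. C is used because XS code reads the PV buffer from C. The sketch assumes only what holds for elements below 0x100: a UTF8=1 buffer stores each such element in standard UTF-8, using one byte for 0x00 to 0x7F and two bytes for 0x80 to 0xFF. The helper name `upgrade_octets` is hypothetical. In real code Perl performs this conversion itself, for instance via `sv_utf8_upgrade`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper modelling what an upgrade does to a PV buffer:
 * each element of the logical string (an integer below 0x100) is
 * re-encoded as UTF-8. The logical string does not change; only the
 * buffer does. `out` must have room for 2 * len bytes.
 * Returns the number of bytes written to `out`. */
size_t upgrade_octets(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    assert(len == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;                                  /* ASCII: 1 byte */
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    return n;
}
```

Consider the four octets `00 F8 3F 41` (UTF8=0). The upgraded buffer is the five bytes `00 C3 B8 3F 41` (UTF8=1). Both represent the same four-element string. XS code that fetches the buffer with `SvPV` and ignores `SvUTF8` sees five bytes instead of four, with everything from the first high element onward shifted.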

If you are disregarding this information, your code is broken.

I am sorry if you read correctness arguments about your code as
judgements on your personality; the two have nothing to do with each
other.

> Perl has the T_OPAQUEPTR typemap in which Perl allocates and manages
> the memory for me, but its internal structure is up to the XS
> programmer. A typical use case would be to allocate enough memory to
> hold a specified struct. I would use pack and unpack to manage that
> data Perl-side, and use a C struct to get at the data C-side.
> Therefore, I want guarantees about how Perl represents the data
> internally when I use pack, which I thought was the whole point of
> pack. You seem to argue that I can make no assumptions about the
> internal representation of a Perl scalar unless I handle all the
> manipulations in my own XS code. That is the Interop bug, so to speak,
> which has been around far longer than the Unicode bug.

The matter is very simple: Perl strings come in two representations.
Both representations mean the same thing, and as you are using them to
represent bytes, not characters, they can always be translated into each
other.

• Encode inputs to `pack` if your own interface accepts characters but
  needs to yield or pass bytes. Document, for each input string, whether
  your interfaces accept bytes or characters or take a flag that specifies
  which of the two it is, and never try to guess what a string is by
  looking at it.

• Downgrade inputs in your XS code when you need a byte array.

Then the representation is guaranteed, and it is also correct within the
Perl string model.
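Downgrading in XS is the inverse of the upgrade mapping. Here is a minimal C sketch of what a downgrade does to the buffer. It relies on the same assumption about how a UTF8=1 buffer encodes elements up to 0xFF. The name and signature of `downgrade_buffer` are hypothetical. In actual XS code you would call `sv_utf8_downgrade` or use `SvPVbyte` rather than writing this by hand. Both perform this conversion and refuse strings containing elements above 0xFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turns a buffer plus its UTF8 flag into plain
 * octets. With utf8 == false the buffer already is the byte array.
 * With utf8 == true each element is decoded from UTF-8; an element
 * above 0xFF (or a malformed sequence) means the string is not a byte
 * string, so the function fails instead of guessing.
 * `out` needs room for len bytes; *out_len receives the octet count. */
bool downgrade_buffer(const unsigned char *buf, size_t len, bool utf8,
                      unsigned char *out, size_t *out_len)
{
    size_t n = 0;
    assert(out_len != NULL && (len == 0 || (buf != NULL && out != NULL)));
    if (!utf8) {
        if (len > 0)
            memcpy(out, buf, len);        /* already a packed octet array */
        *out_len = len;
        return true;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c < 0x80) {
            out[n++] = c;                 /* ASCII element */
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (buf[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence encoding an element in 0x80..0xFF */
            out[n++] = (unsigned char)(((c & 0x1F) << 6) | (buf[i + 1] & 0x3F));
            i++;                          /* skip the continuation byte */
        } else {
            return false;                 /* element > 0xFF or malformed */
        }
    }
    *out_len = n;
    return true;
}
```

After a successful call, `out` holds exactly the octets that `pack` produced, whichever representation the scalar happened to use. Copying them with `memcpy` into a C struct or a `double` array therefore sees the intended data. A failure means the string contains elements above 0xFF and is not a byte string at all.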

All other approaches are incorrect.

Aristotle Pagaltzis // <>
