develooper Front page | perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

From: jpl
Date: January 11, 2012 05:13
Subject: Re: pack and ASCII
Message ID: 4F0D8B04.4010400@research.att.com

On 01/11/12 07:10, Leon Timmermans wrote:
> On Wed, Jan 11, 2012 at 8:13 AM, Jesse Luehrs <doy@tozt.net> wrote:
>>> If you have code that requires a UTF8=0 string specifically, it is buggy.
>>> Specifically, it suffers from the Unicode bug. You are probably using SvPV
>>> without looking at the SvUTF8. The solution is simple: Use SvPVbyte instead.
>> There has to be some point when code can assume that it has a byte
>> string. What Leon is saying is that it's a lot more useful for pack to
>> use SvPVbyte itself automatically, since pack is typically used for
>> things like binary protocols and file formats, which are usually defined
>> in terms of bytes, not characters.
> Yes, this.
>
> Leon

The byte/character/octet confusion is hurting my head, and the
documentation isn't helping:

        pack TEMPLATE,LIST
                Takes a LIST of values and converts it into a string using the
                rules given by the TEMPLATE.  The resulting string is the
                concatenation of the converted values.  Typically, each
                converted value looks like its machine-level representation.
                For example, on 32-bit machines an integer may be represented
                by a sequence of 4 bytes, which will in Perl be presented as a
                string that's 4 characters long.

The result of pack on a 32-bit integer is (what I would call) 4 octets 
long, but it (IMHO) should not be called 4 characters long, if we want 
to encourage thinking in Unicode terms.
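
To make the distinction concrete, here is a small sketch. The "N" template (big-endian unsigned 32-bit) and the sample value are chosen only for illustration. length reports 4 for the packed string, but each of those 4 elements is an octet value, not text.

```perl
use strict;
use warnings;

# Pack one unsigned 32-bit integer in big-endian ("network") order.
# The template and the value are illustrative assumptions.
my $packed = pack('N', 0x01020304);

# length() counts the elements of the string; here each one is an octet.
my $n = length $packed;

# Show the octets as hex digits.
my $hex = unpack('H*', $packed);

# The packed string does not carry Perl's internal UTF8 flag.
my $flag = utf8::is_utf8($packed) ? 1 : 0;

print "length=$n octets=$hex utf8_flag=$flag\n";
```

On a typical build this prints a length of 4 and the octets 01020304, which reads more naturally as "4 octets" than as "4 characters".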

What I want pack/unpack to do is let me pack a string into a
predetermined (and presumably adequately large) number of octets as part
of a record I will write, and then recover it, using unpack and the same
format, when I read the record back in.  In that context, the choice
between SvPV and SvPVbyte is out of my control; it has to be something
pack and unpack agree to do.  The "A" format item does what I want if I
stay in the ASCII world, but I'd like to break out.  Maybe "A" cannot be
made to do what I requested, although I *think* what Leon is talking
about would do it.
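
One way to get that round trip with pack as it stands is sketched below, on the assumption that the caller converts characters to octets explicitly with the Encode module before packing and converts back after unpacking. The "A16 N" record layout, the write_record and read_record names, and the sample data are all hypothetical. This is not the change to pack being discussed in this thread.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical record layout: a 16-octet name field, then a 32-bit count.
my $TEMPLATE = 'A16 N';

sub write_record {
    my ($name, $count) = @_;
    my $octets = encode('UTF-8', $name);    # characters -> octets
    die "name does not fit in 16 octets\n" if length($octets) > 16;
    return pack($TEMPLATE, $octets, $count);
}

sub read_record {
    my ($record) = @_;
    my ($octets, $count) = unpack($TEMPLATE, $record);
    return (decode('UTF-8', $octets), $count);    # octets -> characters
}

my $name_in = "Bj\x{f8}rn \x{263a}";    # 7 characters, 10 octets as UTF-8
my $record  = write_record($name_in, 42);
my ($name_out, $count_out) = read_record($record);

printf "record is %d octets; round trip %s\n",
    length($record), ($name_out eq $name_in ? "ok" : "FAILED");
```

The field width is counted in octets, so the size check has to be done on the encoded string, not on the original. Note that "A" pads with spaces and strips trailing spaces on unpack, so a name that really ends in spaces would not survive this round trip.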
