perl.perl5.porters | Postings from January 2012

Re: pack and ASCII

January 13, 2012 05:30
On 01/12/12 11:40, Eric Brine wrote:
> On Thu, Jan 12, 2012 at 8:58 AM, John P. Linderman (jpl) wrote:
>     I am *not* proposing that the behavior of "A" be changed.  Too
>     much code would break.  However, the list of "surprises" that
>     might happen when ASCII text is replaced with general Unicode text
>     should be mentioned.
> Yes, I realise you are only asking for a documentation change (or a 
> new letter for new behaviour). Others are advocating changes to the 
> existing behaviour, though. My comments are directed at them. I'm 
> sorry you got caught in the middle.
>     2) I have many applications that write records of fixed length
>     (measured in octets).  Files of such records can easily be
>     searched with binary search, and it is trivial to read the Nth
>     record.  If this is a fringe requirement, there's not a lot left
>     to say.  But I suspect I am not alone in finding this a convenient
>     format.
> I fully agree you should be able to do this.
>     6) The C<<$reclen = length(pack($format))>> metaphor is just a
>     lower limit on record lengths.
> Only if you both forget to encode your text and peek at Perl 
> internals. (C<print> does the latter, but will warn when it does so.)
>     7) C<<print $fh $pack-output>> may grouse about wide characters (I
>     regard this as a feature, but it can nevertheless be a surprise).
> Excellent, so Perl did report the error to you. Add encode() before 
> pack(), and you're good to go.
> - Eric
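
For readers following along, here is a minimal sketch of the surprise in points 6 and 7 and of the "encode() before pack()" remedy. The sample text, the field width of 10, and the variable names are assumptions chosen for illustration; nothing here comes from the original messages beyond the use of encode() and pack().

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed sample text: 7 characters, 9 octets once encoded as UTF-8
# (U+263A takes 3 octets).
my $text = "\x{263A} smile";

# Packing the character string pads to 10 *characters*, not 10 octets.
my $naive        = pack('A10', $text);
my $naive_chars  = length($naive);                    # 10
my $naive_octets = length(encode('UTF-8', $naive));   # 12

# Encoding first makes "A10" count octets, so the record is fixed length.
my $record = pack('A10', encode('UTF-8', $text));
my $record_octets = length($record);                  # 10

# Reading back: strip the padding, then decode.
my ($raw) = unpack('A10', $record);
my $round_trip = decode('UTF-8', $raw);

printf "naive:   %d characters, %d octets when written as UTF-8\n",
    $naive_chars, $naive_octets;
printf "encoded: %d octets\n", $record_octets;
printf "round trip ok: %s\n", ($round_trip eq $text ? 'yes' : 'no');
```

The cost of this approach is the one raised in the reply below: every pack call needs an explicit encode, and every unpack an explicit decode.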
To quote perldoc Encode, which, in turn, is quoting "Programming Perl, 
3rd ed.",

   Goal #2: Old byte-oriented programs should magically start working on
   the new character-oriented data when appropriate.

Some of the "magic" is gone if it is necessary to explicitly encode 
before packing and decode after unpacking.  It's too late to have "A20" 
do what I meant, but we can make it relatively painless if there is a 
"pack pragma" (or something similar) that turns on that behavior without 
having to (otherwise) modify programs.  "That behavior", just to be 
clear, means interpreting the number following "A" (or "a") as the 
number of octets that will be stored, with "pack" utf-encoding the data 
prior to padding, and "unpack" utf-decoding the octets after stripping 
off the padding.  (What to do if it is necessary to truncate the encoded 
octets needs thought.  Truncate at the previous character boundary and 
pad?  Warn?)
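
A rough sketch of "that behavior", written as ordinary helper functions rather than a pragma. The names pack_field and unpack_field are hypothetical, UTF-8 is assumed as the encoding, and truncating silently at the previous character boundary is only one of the options left open above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: store $text in exactly $width octets.
# Encodes as UTF-8, truncates at a character boundary if needed,
# then space-pads the way "A" does.
sub pack_field {
    my ($text, $width) = @_;
    my $octets = encode('UTF-8', $text);
    # Drop whole characters until the encoded form fits, so a
    # multi-octet character is never split. (A warning could go here.)
    while (length($octets) > $width) {
        chop $text;
        $octets = encode('UTF-8', $text);
    }
    return pack("A$width", $octets);
}

# Hypothetical helper: strip the padding, then decode the octets.
sub unpack_field {
    my ($octets, $width) = @_;
    my ($raw) = unpack("A$width", $octets);
    return decode('UTF-8', $raw);
}

# Two 3-octet characters do not fit in 5 octets, so only the first
# one is kept and the field is padded with 2 spaces.
my $field = pack_field("\x{263A}\x{263A}", 5);
my $value = unpack_field($field, 5);

printf "field is %d octets, value is %d character(s)\n",
    length($field), length($value);
```

As with a plain "A" field, unpacking strips trailing spaces, so data that legitimately ends in spaces will not round-trip.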

Although it is now irrelevant, I'm having trouble thinking of where the 
current behavior is useful.  Why would one want to pad to a specified 
character (not octet) length?  -- jpl
