perl.perl5.porters | Postings from April 2007

Re: pack/unpack feature suggestion

From:
Glenn Linderman
Date:
April 3, 2007 16:05
Subject:
Re: pack/unpack feature suggestion
Message ID:
4612DBAE.8040309@NevCal.com


On approximately 4/3/2007 5:54 AM, came the following characters from 
the keyboard of Juerd Waalboer:
> Glenn Linderman skribis 2007-04-01 16:34 (-0700):
>   
>> Aha!  OK, this is a way that unpack could successfully operate on a 
>> multi-bytes buffer.  But I think it is also equivalent to downgrading it 
>> (with a warning for values > 255) and then processing it as bytes.  
>>     
>
> Not if you also have the "U" in the template somewhere, in addition to
> other letters. (Bad idea anyway!)
>   

If the U template were adjusted to not pack into a "multi-bytes" buffer, 
but instead to pack an encoded UTF8 representation of its parameter into 
a bytes buffer, then all would be well.  And if unpack U decoded, byte by 
byte, from the encoded representation back to an INT, it would retrieve 
the value even if the buffer had been upgraded, using the blead-unpack 
scheme.
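
To make the distinction concrete, here is a small sketch of the two 
results being contrasted: what pack U returns, and the encoded bytes I 
would like it to return.  It uses the Encode module rather than any 
change to pack itself, and the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $cp = 0x263A;                          # WHITE SMILING FACE

# pack "U" yields a single character, not its UTF-8 byte sequence.
my $chars = pack "U", $cp;

# Encoding explicitly yields the three-byte UTF-8 form in a bytes buffer.
my $bytes = encode("UTF-8", chr $cp);

printf "pack U:  length %d\n", length $chars;
printf "encoded: length %d, hex %s\n", length $bytes, unpack("H*", $bytes);

# Upgrading changes only the internal storage, not the characters,
# so decoding still recovers the original integer.
my $upgraded = $bytes;
utf8::upgrade($upgraded);
my $back = ord decode("UTF-8", $upgraded);
printf "decoded: 0x%04X\n", $back;
```

The point is that the encoded form survives an upgrade unchanged as a 
sequence of byte values, which is what makes decoding it byte by byte 
safe.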


>> I think that pack-U should be defined to produce "encoded bytes"
>>     
>
> It doesn't do that, though. 

Right... So I'd consider that a bug.  If changing it is impossible, I'd 
recommend deprecating U, in favor of M, which does as I describe... 
encodes its parameter to Multi-byte variable-length UTF8... in a bytes 
buffer.


> It produces encodingless characters, not
> bytes. However, you inspired me to come up with the following:
>
>     $byte_string =   pack "a*[UTF-8]", $text_string
>     $text_string = unpack "a*[UTF-8]", $byte_string
>   

So I'm not sure what [UTF-8] means, there, but I guess it is a new 
syntax for modifying the "a" template to do encoding/decoding.  Similar 
to what I'm suggesting with M.  For M, the count parameter would be in 
characters.

Are you proposing allowing other encodings?  I'm only proposing allowing 
utf8... for anything else, the user can call encode/decode himself, and 
deal with the resulting byte lengths himself.  But since Perl supports 
two binary encodings, having pack support them seems reasonable.

> Likewise for "A" and "Z", and for arbitrary encodings. This would just
> call Encode::encode (for pack) or Encode::decode (for unpack)
> transparently, before doing the actual packing or unpacking.
>   

Ah, I see... to make it more explicit: for pack, call encode on the 
parameter, then do a pack "a" using the result of encode instead of the 
original parameter.  And for unpack, do the unpack, then call decode on 
the result of the unpack, and return the decoded string.
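
As I understand the suggestion, it could be emulated today with two 
small wrappers.  The following is only a sketch under that reading: 
pack_encoded and unpack_encoded are hypothetical names, not existing 
functions, and the "[UTF-8]" template syntax itself does not exist.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for:  pack "a*[ENCODING]", $text
sub pack_encoded {
    my ($encoding, $text) = @_;
    return pack "a*", encode($encoding, $text);   # encode first, then pack
}

# Hypothetical stand-in for:  unpack "a*[ENCODING]", $bytes
sub unpack_encoded {
    my ($encoding, $bytes) = @_;
    my ($raw) = unpack "a*", $bytes;               # unpack first, then decode
    return decode($encoding, $raw);
}

my $text        = "caf\x{E9} \x{263A}";           # 6 characters
my $byte_string = pack_encoded("UTF-8", $text);   # 9 bytes
my $text_string = unpack_encoded("UTF-8", $byte_string);

printf "%d characters -> %d bytes -> %d characters\n",
    length $text, length $byte_string, length $text_string;
```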

> The quantifier is a number of bytes, not characters. This means that it
> can be in the middle of a multibyte encoding for a character. When that
> happens, tough luck. We can't help that. (In other words: this really
> only makes a lot of sense for multibyte packing if the quantifier is *)
>   

The M template code I propose would take a number of characters as a 
parameter, and would work for given character lengths in either 
direction.  If U cannot be changed to be what I describe for M, it 
should be deprecated.
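
A sketch of the difference between a byte count and a character count 
follows.  pack_M and unpack_M are hypothetical helpers that only 
approximate the M template I am proposing (no such template exists), 
and the regex assumes the buffer starts with well-formed UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical "M<count>" for pack: count is in characters, result is bytes.
sub pack_M {
    my ($count, $text) = @_;
    return encode("UTF-8", substr($text, 0, $count));
}

# Hypothetical "M<count>" for unpack: walk the buffer one UTF-8 sequence
# at a time (an ASCII byte, or a lead byte plus continuation bytes).
sub unpack_M {
    my ($count, $bytes) = @_;
    my ($raw) = $bytes =~ /\A((?:[\x00-\x7F]|[\xC2-\xF4][\x80-\xBF]+){$count})/
        or die "fewer than $count UTF-8 characters in buffer";
    return decode("UTF-8", $raw);
}

my $text  = "h\x{E9}llo";
my $bytes = encode("UTF-8", $text);   # 6 bytes: e-acute takes two

# A byte count of 2 stops in the middle of the two-byte e-acute.
my $cut = pack "a2", $bytes;

# A character count of 2 keeps the character whole, in either direction.
my $whole = pack_M(2, $text);
my $chars = unpack_M(2, $bytes);

printf "a2: %s   M2: %s\n", unpack("H*", $cut), unpack("H*", $whole);
```

With a byte quantifier the caller has to know the encoded length in 
advance; with a character quantifier the template does that bookkeeping.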


-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking




