develooper Front page | perl.perl5.porters | Postings from April 2007

pack/unpack feature suggestion (was: Re: perl, the data, and the utf8 flag)

From: Juerd Waalboer
Date: April 3, 2007 05:54
Subject: pack/unpack feature suggestion (was: Re: perl, the data, and the utf8 flag)
Message ID: 20070403125441.GQ31277@c4.convolution.nl
Glenn Linderman wrote on 2007-04-01 16:34 (-0700):
> Aha!  OK, this is a way that unpack could successfully operate on a 
> multi-byte buffer.  But I think it is also equivalent to downgrading it 
> (with a warning for values > 255) and then processing it as bytes.  

Not if you also have the "U" in the template somewhere, in addition to
other letters. (Bad idea anyway!)

> I think that pack-U should be defined to produce "encoded bytes"

It doesn't do that, though. It produces encodingless characters, not
bytes. However, you inspired me to come up with the following:

    $byte_string =   pack "a*[UTF-8]", $text_string;
    $text_string = unpack "a*[UTF-8]", $byte_string;

Likewise for "A" and "Z", and for arbitrary encodings. This would simply
call Encode::encode before packing (for pack) or Encode::decode after
unpacking (for unpack), transparently.
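Nothing like "a*[UTF-8]" exists in pack today, but the behaviour described
above can be approximated by calling Encode by hand. A minimal sketch,
assuming UTF-8 as the encoding; pack_encoded and unpack_encoded are
hypothetical helper names, not part of Perl or Encode:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helpers, not in core Perl: they emulate the proposed
# "a*[UTF-8]" template by wrapping an ordinary "a*" pack/unpack in
# explicit Encode calls.

# Emulates: $byte_string = pack "a*[UTF-8]", $text_string;
sub pack_encoded {
    my ($encoding, $text) = @_;
    # Encode first, so pack only ever sees bytes.
    return pack "a*", encode($encoding, $text);
}

# Emulates: $text_string = unpack "a*[UTF-8]", $byte_string;
sub unpack_encoded {
    my ($encoding, $bytes) = @_;
    # Extract the raw bytes first, then decode them into characters.
    my ($raw) = unpack "a*", $bytes;
    return decode($encoding, $raw);
}

my $text  = "caf\x{e9} \x{263a}";          # 6 characters
my $bytes = pack_encoded("UTF-8", $text);   # 9 bytes: U+00E9 -> 2, U+263A -> 3
my $back  = unpack_encoded("UTF-8", $bytes);

printf "chars=%d bytes=%d roundtrip=%s\n",
    length $text, length $bytes, ($back eq $text ? "ok" : "FAIL");
```

Keeping the Encode call outside pack means pack and unpack only ever see
bytes, which is what the proposed template would do implicitly.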

The quantifier counts bytes, not characters. This means that it can end in
the middle of a character's multibyte encoding. When that happens, tough
luck; we can't help that. (In other words: for multibyte encodings this
only really makes sense if the quantifier is *.)
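To illustrate the byte-counted quantifier problem, here is a small sketch
using today's unpack and Encode (the sample string and the "a4" count are
arbitrary choices for illustration): an "a4" slice of two UTF-8-encoded
U+263A characters stops one byte into the second character, and strict
decoding rejects it, while "a*" always takes the whole buffer.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "\x{263a}\x{263a}";            # two characters, 3 UTF-8 bytes each
my $bytes = encode("UTF-8", $text);         # 6 bytes, no UTF8 flag

# A byte-counted quantifier such as "a4" (4 bytes, not 4 characters)
# cuts the second character after its first byte.
my ($chunk) = unpack "a4", $bytes;

# Strict decoding reports the damage instead of silently substituting.
my $ok = eval { decode("UTF-8", $chunk, Encode::FB_CROAK()); 1 };
print $ok ? "decoded cleanly\n" : "malformed: quantifier split a character\n";

# With "a*" the whole buffer is taken, so decoding round-trips.
my ($all) = unpack "a*", $bytes;
print decode("UTF-8", $all) eq $text ? "a* round-trips\n" : "FAIL\n";
```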
-- 
kind regards,

  juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy <sales@convolution.nl>

I don't trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.


