perl.perl5.porters | Postings from January 2005

Re: [perl #33734] unpack fails on utf-8 strings

January 11, 2005 09:23
On Mon, Jan 10, 2005 at 01:42:14PM -0000, Nicholas Clark via RT <> wrote:
> On Sun, Jan 09, 2005 at 10:19:43PM -0000, Marc Lehmann wrote:
> #ifdef PACKED_IS_OCTETS
>     /* Packed side is assumed to be octets - so force downgrade if it
>        has been UTF-8 encoded by accident

Yupp - pack makes octets out of data, unpack makes data out of octets.

>      */
>     register char *s = SvPVbyte(right, rlen);
> #else
>     register char *s = SvPV(right, rlen);
> #endif
> and the default is the #else clause. If I recompile with -DPACKED_IS_OCTETS
>   Failed 40 test scripts out of 903, 95.57% okay.


> which doesn't look great. It looks like some cases in unpack expect to find
> utf8 data in the source string. Great. :-(

Do you know which cases? I'd say (without knowing them :) that they are

I cannot find any conversion operator in the perlfunc manpage that would
make sense when fed with non-octet data, except maybe "U". But even "U"
should work on octets, not on a UTF-8 string, i.e. it should generate two
characters for \x80, not one.
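To make the octets-only view concrete, here is a minimal C sketch of what "downgrading" a string means. It is not perl's actual sv_utf8_downgrade; the name downgrade_utf8 is hypothetical. It turns the internal UTF-8 form of a string back into one octet per character and refuses any character above 0xFF, since such a character has no octet form for unpack to read. The character \x80, for instance, is stored internally as the two bytes 0xC2 0x80 but downgrades to the single octet 0x80.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the perl API: convert a UTF-8 buffer
 * into octets, one octet per character. Only the two-byte sequences
 * starting with 0xC2 or 0xC3 encode U+0080..U+00FF, so any other
 * multi-byte sequence is a character with no octet representation.
 * `out` needs room for `len` bytes (the output is never longer).
 * Returns the number of octets written, or -1 on failure. */
long downgrade_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        unsigned char c = in[i];
        if (c < 0x80) {                 /* ASCII: copy unchanged */
            out[o++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence for U+0080..U+00FF */
            out[o++] = (unsigned char)(((c & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;                  /* character > 0xFF, or malformed */
        }
    }
    return (long)o;
}
```

Feeding it { 0x41, 0xC2, 0x80 } ("A\x80" in UTF-8) yields the two octets 0x41 0x80, while { 0xC4, 0x80 } (U+0100) is rejected.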

> I wonder if it's viable to make the integer conversion operators (and the
> floating point operators) downgrade just enough characters to be useful?

That would still break "b" and would have questionable semantics on "a"
for example.
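For reference, perlfunc describes "b" as a bit string in ascending bit order inside each byte. The following hypothetical C sketch (bits_ascending is an illustrative name, not perl code) shows that this format is defined in terms of exactly eight bits per input octet, which is why a character above 0xFF has no meaningful "b" representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of unpack's "b" format: emit the bits of each
 * octet, least significant bit first, as '0'/'1' characters. Every octet
 * contributes exactly 8 characters, so the format only makes sense for
 * values 0..255. `out` must have room for 8 * len + 1 chars. */
void bits_ascending(const unsigned char *in, size_t len, char *out)
{
    size_t i;
    int bit;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++)
        for (bit = 0; bit < 8; bit++)
            *out++ = (in[i] >> bit) & 1 ? '1' : '0';
    *out = '\0';
}
```

The octets 0x01 0x80 produce "1000000000000001", matching what unpack "b*" gives for "\x01\x80".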

I frankly cannot see how characters >255 could make any sense in an
argument to unpack, and if the test suite fails, I guess that is a bug in
the test suite.

                The choice of a
      -----==-     _GNU_
      ----==-- _       generation     Marc Lehmann
      ---==---(_)__  __ ____  __
      --==---/ / _ \/ // /\ \/ /
      -=====/_/_//_/\_,_/ /_/\_\      XX11-RIPE
