perl.perl5.porters | Postings from January 2005

Re: [perl #33734] unpack fails on utf-8 strings

January 11, 2005 09:22
On Mon, Jan 10, 2005 at 02:35:18PM -0000, Nick Ing-Simmons via RT wrote:
> Nicholas Clark writes:
> >I agree. Well, I thought I did. Then..
> I know I agree ;-)
> But I was overruled.

Reminds me of Mozilla: "MSIE has the same bug, so we don't follow the RFC
here" :/

> >and the default is the #else clause. If I recompile with -DPACKED_IS_OCTETS
> I am reasonably sure that was my code, and the #ifdef backs it out
> for obscure 5.6-compatibility and/or Camel-III-promised reasons.


> >which doesn't look great. It looks like some cases in unpack expect to find
> >utf8 data in the source string. Great. :-(
> >I wonder if it's viable to make the integer conversion operators (and the
> >floating point operators) downgrade just enough characters to be useful?
> The snag is that with 5.6, using pack/unpack and messing with the SvUTF8
> flag by obscure means was the only way to do 'encoding'.

Which shouldn't be a problem, as perl-5.6 Unicode code will not work on
perl-5.8 anyway in most cases, because 5.6 was *completely* and *utterly*
broken with respect to Unicode. I don't think such bugs should be kept
around.

The problem with this bug is that, at the Perl level, there is no easy way
to get it working in 5.8. Perl 5.8 may upgrade a string for many obscure
reasons (for example, appending a string that is already UTF-8-encoded
internally for other reasons, even though it contains no wide characters).

Not fixing this bug means that a programmer must ALWAYS downgrade the
scalar manually, or must know enough about all the Perl internals (and
modules in use!) involved to be sure that the scalar won't be upgraded.

Bug-compatibility with earlier versions of Perl doesn't seem like a good
reason to keep it that way.

(I know you agree, but if you are still overruled, it might be useful to
argue once more: I was bitten by this bug, and it took me three hours to
find the actual cause (in old code decoding a protocol, too). And I know
_a lot_ about Perl's Unicode internals compared to the average Perl
programmer. I doubt other people would track this down as quickly.)

At the very least, this breakage should be documented and a workaround
made available (I guess Encode::encode "iso-8859-1" would work).

                The choice of a
      -----==-     _GNU_
      ----==-- _       generation     Marc Lehmann
      ---==---(_)__  __ ____  __
      --==---/ / _ \/ // /\ \/ /
      -=====/_/_//_/\_,_/ /_/\_\      XX11-RIPE
