develooper Front page | perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

Marc Lehmann
March 30, 2007 05:18
On Fri, Mar 30, 2007 at 01:07:22PM +0100, Nicholas Clark wrote:
> > And the problem is that those bugs are not considered bugs but features.
> I certainly consider this one a bug.

So fix it. It is easy to do, and I documented it years ago (during 5.6).

> I didn't create the release that messed this up, and didn't realise the
> implications of the change until some time after it happened.
> You might consider me slow for this.

I do not consider you slow for not creating the release that messed this
up, no :) If at all, it's a pity you didn't.

> > I wonder why it is ok to break large amounts of perl and xs code silently,
> > without even documenting how to fix it[1], while at the same time 5.10
> > introduced "use feature" to shield against possible breakage with far less of
> > an impact than the changes above.
> Problem is now that I can't see how to fix it without breaking other code that
> plays by the different, new, rules.

I have yet to see any such code outside the testsuite that reliably
relies on the broken 5.6 unicode semantics and is considered worth keeping.
I challenge you to show me some, and I promise to show you another example
from CPAN or elsewhere that breaks. Or maybe even two. Or three.

Besides, without any doubt, the code that relies on pseudo-random
behaviour is certainly in the minority. The amount of code in the wild
that relies on "C" having 5.5 semantics is much larger. I doubt _anybody_
except me (or at least not very many people) understands that he has to
downgrade scalars before passing them into unpack to decode structures.
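
A minimal sketch of the downgrade-before-unpack step described above. The
record layout (a one-byte tag followed by a 16-bit big-endian length) is
hypothetical and chosen only for illustration, and the utf8::upgrade call
merely simulates a scalar that got upgraded somewhere along the way. On
perls with the behaviour discussed in this thread, unpack on the upgraded
copy may see the internal UTF-8 bytes instead of the original ones.

```perl
use strict;
use warnings;

# Hypothetical binary record: one-byte tag, then a 16-bit big-endian length.
my $packed = pack("C n", 0xE9, 513);

# Simulate a scalar that was upgraded to the internal UTF-8 representation
# (for example after being concatenated with a character string).
my $upgraded = $packed;
utf8::upgrade($upgraded);

# Defensive step: force the byte representation before decoding.
# With a true second argument utf8::downgrade returns false instead of
# dying when the string contains characters above 0xFF.
my $bytes = $upgraded;
utf8::downgrade($bytes, 1)
    or die "binary data contains characters above 0xFF";

my ($tag, $len) = unpack("C n", $bytes);
printf "tag=0x%02X len=%d\n", $tag, $len;
```

The downgrade is done on a copy so that the caller's scalar is left
untouched; whether that matters depends on the surrounding code.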

And the amount of breakage will only increase over time as unicode becomes
more and more widely used in perl.

The solution to this bug is to fix it. The earlier, the better.

At the very least, it needs to be documented, and *hard* rules on when
perl upgrades or downgrades would need to be established, as, right now,
behaviour is pretty random across versions. Of course, down that path lies
madness, and perl5 will forever stay a failed experiment in how to do unicode
correctly (namely, abstracted away from the actual encoding).

Besides, don't you think an argument of the form "yes, it breaks lots of
code, but some code might rely on it, so let's keep it" sounds pretty,
sorry to be so honest, stupid?

                The choice of a
      -----==-     _GNU_
      ----==-- _       generation     Marc Lehmann
      ---==---(_)__  __ ____  __
      --==---/ / _ \/ // /\ \/ /
      -=====/_/_//_/\_,_/ /_/\_\      XX11-RIPE
