develooper Front page | perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

Marc Lehmann
March 30, 2007 23:12
On Sat, Mar 31, 2007 at 03:15:50AM +0200, Juerd Waalboer wrote:
> Marc Lehmann skribis 2007-03-31  2:48 (+0200):
> > > A koi8r string is a byte string. If you keep it separated from text
> > Your definition is completely useless in the real world. Obviously, a KOI8-R
> > string is a text string. It contains text characters. End of story.
> This is a logical thing to say, but unfortunately not very useful.

Thanks, I'll take logical over subjective opinions any day.

> The distinction between a text string, and a byte string representing
> text, is actually useful.

It is useful, but making it mandatory is stupid, because you lose the
ability to handle real-world situations, for example JSON, which simply does
not make the distinction. The same is true for Perl, which also does not
make the distinction.

> > You also have very weird ideas of what programmers should and should
> > not do that defy reality.
> Weird ideas, maybe, but at least weird ideas that help dozens of people
> write working and maintainable code.

Likely, but it's still your personal opinion, your personal coding style.
Forcing that on everybody else by calling everything that doesn't fit
(such as JSON) "broken" does not convince _me_ that it is a good coding
style.

> You don't believe in my weird ideas, fine. But I find it very
> interesting that you run into all these problems with Perl's unicode
> support, while the people who stick to my weird ideas write lots of code
> without that.

Goddamnit, I more than once told you that I am not running into those
problems because I know most Perl bugs regarding Unicode inside and out. I have
been doing Unicode programming for far longer than Perl has easily supported it, and I
would be grateful if you would stop bullshitting me and spreading lies.

I *explicitly* said that it is other users who hit problems, and that I
can cope with them quite well.

> > I find all that contradictory, but as you ignore the evidence I
> > presented and the question I asked you (JSON::XS example), I see no
> > point in continuing talking to you.
> Unfortunately, I understand very little of the JSON example. I don't
> know JSON and would have to learn about it first.

Well, it's one of those reality things where your coding style flatly breaks
down: JSON makes no distinction between binary and text, except that binary
only uses character indices 0..255. You do not know whether a JSON string is
binary or text. Usage decides.
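
To make "usage decides" concrete, here is a minimal sketch. It assumes the
core JSON::PP module as a stand-in for JSON::XS (both provide decode); the
sample document and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical JSON document: one string whose characters are all <= 255.
my $doc = '["\u0001\u00e9\u00ff"]';

# decode() without the utf8 option takes the document as a character string.
my $str = JSON::PP->new->decode($doc)->[0];

# Nothing in the JSON says whether this is text or three octets;
# only the code that uses the string decides.
my $codes = join ",", map { ord } split //, $str;
print "$codes\n";    # 1,233,255
```

The same three characters could be handed to a text routine or written out
as three octets; the decoded value itself carries no such marker.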

One such usage is unpack, and I find it weird that I have to use "U" to get
binary semantics in unpack. Or you have to downgrade explicitly.
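
The explicit-downgrade route can be sketched as follows. This is only an
illustration: the data and variable names are hypothetical, and how unpack
treats an upgraded string without the downgrade depends on the Perl version,
so no claim is made about that case here.

```perl
use strict;
use warnings;

# Hypothetical binary data: three octets.
my $data = "\x01\xe9\xff";

# Simulate contact with something that upgraded the string internally.
utf8::upgrade($data);

# Downgrade a copy before unpacking. utf8::downgrade dies if the string
# contains a character above 255, which cannot happen for real octets.
my $octets = $data;
utf8::downgrade($octets);

my @values = unpack "C*", $octets;
print join(",", @values), "\n";    # 1,233,255
```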

Anyways, that clashes with your notion that the programmer made a bug when
binary data happens to be UTF-X encoded internally. Reality hits, you
lose, simply because calling usage of JSON broken according to your coding
standards will not have any effect on JSON.

And the way JSON handles binary is extremely common in the real world. And
it is exactly how perl handles it, modulo bugs and, well, unpack (and the
unfortunate decision to give old XS code sometimes bytes encoded in UTF-X,
sometimes not).

Perl simply does _not_ work like you want it to. Instead, it is much simpler
because in the majority of cases it just works without having to track whether
my binary string came in contact with something that upgraded it. I simply do
not have to care in Perl, except for the cases above.
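
A small sketch of that point, with made-up data: upgrading changes the
internal representation and the utf8 flag, but not what the string is at
the Perl level.

```perl
use strict;
use warnings;

my $bin = "\x00\xe9\xff";    # three octets, stored one byte per character
my $up  = $bin;
utf8::upgrade($up);          # same characters, UTF-8 encoded internally

my $flag_differs = (utf8::is_utf8($up) && !utf8::is_utf8($bin)) ? 1 : 0;
my $same         = ($bin eq $up) ? 1 : 0;
my $len          = length $up;

print "flag differs: $flag_differs, equal: $same, length: $len\n";
# flag differs: 1, equal: 1, length: 3
```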

And that's the good thing. Teaching people to avoid upgrading by your text vs.
binary string technique is confusing. It is backwards. People should not have
the need to be concerned about upgrading, because it is an internal thing.

And yes, I said I would not answer you, but what prompted it was your
continuous abusive behaviour of putting words into my mouth that I have
*explicitly* said I did not say, and explained in detail.

                The choice of a
      -----==-     _GNU_
      ----==-- _       generation     Marc Lehmann
      ---==---(_)__  __ ____  __
      --==---/ / _ \/ // /\ \/ /
      -=====/_/_//_/\_,_/ /_/\_\      XX11-RIPE
