Re: unicode support and perl
From: Simon Cozens
September 15, 2000 10:16
Message ID: 20000915181532.C6691@deep-dark-truthful-mirror.perlhacker.org
On Fri, Sep 15, 2000 at 05:40:12PM +0200, Marc Lehmann wrote:
> (including the preliminary patches I sent, to one of which you even replied
Yes, there are things you patched. I would have thought that if you'd patched
them to your satisfaction you wouldn't be considering them a problem. If
you're still unhappy with them, I'll take another look at what you did. But I
thought that you had no problems with them, and Jarkko had no problems with
them, and they went in core, so...
When I wrote that email, I only had access to my p5p mailbox; if I could have
got to the perlbugtron, I'd have done so, and yes, I'd have found the rest of
your report. This is why I was asking for test cases or bug IDs, rather than
"Oh, there's stuff out there".
But I said I was looking through my mailbox, and that's what I found. In about
two hours, I'll be able to talk to the perlbugtron and I'll have a look at the
rest of the stuff. I have no objection to hunting down UTF8 bugs all weekend
as this is, unfortunately, what I do for fun. :)
> <ironic>Well... the substitution operator and string concatenation still
> produce garbage with utf8-strings
Uhm, odd. I've not seen any patches to string concatenation in the past week,
and I've had that part of the torture test running fine here. Anyway, I'll
look at your bug reports and see what I find. I'm not daring to claim that the
torture test is exhaustive, although it does try hard to be.
Substitution *is* broken and has been broken for a couple of months. I was
under the impression that Hugo and MJD were looking into making the regexp
engine more UTF8-happy, since there are a couple of fundamental *ahem* issues
with it. I don't know if they've had time, but I tend to avoid the regexp
engine like the plague and try and work on things that a mere mortal can
I know that complex cases of the tr/// operator are broken, too, because I
stopped working on it when I went to work on line disciplines ages ago. I gave
a summary of what was wrong with it and how to fix it, I wrote documentation
on how to write Unicode-friendly stuff in core, and I asked for volunteers to
help out with fixing Unicode problems, and do you know what happened? Do you
know why it's still broken? Not a single person came forward. No one cared.
Nobody gave a damn. 
This doesn't help matters. Other people (and I am excepting you because you
*do* file bug reports and patches) whining that "Unicode support is broken"
doesn't help matters. Test cases help matters. Patches help matters. Cloning
Simon would help matters, but test cases and patches are much easier to
I'm trying to fix stuff as I see it break, and I think that's probably the
best I can do. I've half a mind to swear loudly and declare that if you care
so much, you fix it. But I'm not going to do that. I want to see working
UTF8 support more than anyone, and I'll do all I can to get it working, time
and personal life permitting.
 Mike, Hugo, Nick (I think), yourself and a couple of other people put in a
bunch of patches for some other Unicode bugs, but I don't think anyone tried
working on tr///.
 (I expect loud gasps of shock from anyone who knows me. :)
"Which you then convert to gold, non-perishable food, firearms, good liquor &
a secluded hideaway in the last of the internet official protocol standards"
-- Megahal (trained on asr), 1998-11-05