PATCH: [perl #58182] The "Unicode Bug"
From: karl williamson
Date: December 1, 2010 16:15
Subject: PATCH: [perl #58182] The "Unicode Bug"
Message ID: 4CF6E502.10601@khwilliamson.com
This series of commits, along with many previous ones, resolves [perl
#58182], the "Unicode Bug". Other tickets have also been fixed by this
series; I plan to address those after the 5.14 code freeze. These
commits are also available at:
https://github.com/khwilliamson/perl.git
branch folding
This series of commits extends Unicode semantics to backreferences of
capture buffers, the last remaining significant missing area. Earlier I
had also thought tries weren't covered, but it turns out that, because of
the Unicode bug, the code that would be affected is disabled. I plan to
see about reenabling it at some point, but since it is a matter of
efficiency and not correctness, it has lower immediate priority for me.
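
A minimal sketch of the kind of difference involved, assuming perl 5.14 or later (for the /u modifier). It is not taken from the patch or its tests; the string and variable names are illustrative only. A case-insensitive backreference is matched against the same two characters three ways: as a native (non-utf8) string under the default rules, as a utf8-upgraded copy, and as the native string with Unicode rules requested explicitly.

```perl
use strict;
use warnings;

# Assumption: perl 5.14 or later. Deliberately no "use v5.14" here, because
# that would enable the unicode_strings feature and hide the difference.

# "\xE9" is e-acute and "\xC9" is E-acute in Latin-1; under Unicode rules
# they are a lower/upper case pair.
my $s = "\xE9\xC9";

# Default rules on a string not stored as utf8: code points above 127 are
# not case-folded, so the backreference is not expected to match.
my $native = ($s =~ /^(.)\1$/i) ? 1 : 0;

# Same characters, but stored internally as utf8: Unicode rules apply,
# so the backreference is expected to match.
my $copy = $s;
utf8::upgrade($copy);
my $upgraded = ($copy =~ /^(.)\1$/i) ? 1 : 0;

# Unicode rules requested explicitly with /u: the result should no longer
# depend on how the string happens to be stored.
my $unicode = ($s =~ /^(.)\1$/iu) ? 1 : 0;

printf "native=%d upgraded=%d unicode=%d\n", $native, $upgraded, $unicode;
```

The first two results differing for what is logically the same string is the behavior the ticket describes; the /u modifier (or `use feature 'unicode_strings'`) is the opt-in that makes the match independent of the internal encoding.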
As explained in the perldelta, there are still two known minor areas
where the behavior varies depending on the utf8ness of the underlying
string: 1) user-defined upper/lower/title casing which is planned to be
deprecated in 5.14; and 2) the German sharp s character has a somewhat
different set of bugs when matching under /i in utf8 versus not. I plan
to continue working on both. The sharp s is not alone, though: all
characters that have multi-char folds are buggy.
And there is still work to be done. This patch includes changes to a
number of pods, but several should really be re-written with a new take,
given this new functionality. I'm hoping a better wordsmith than me
will jump in. There is some clean-up needed and other bugs that I've
found in reading the code, which I'll shortly get to.
This patch is the culmination of my efforts that got me to come out of
retirement and join p5p more than two years ago. Since this is a
significant milestone for me, the rest of this post is a pause for
personal reflection. I decided to get involved to fix this because it
was hindering some code I was writing, and I was actually kind of
embarrassed that Perl didn't do it right, unlike most of my experiences
with it. I thought at the time that it would take a few weeks at most.
I've learned a lot about Perl, and I've regained some facility with C,
and found things out about it that I never knew. For example, I never
had to worry very much about C portability in my career (although I had
been an expert on Fortran 66 portability at one time). One thing I've
noticed is that doing this work has slowed the deterioration of my
intellectual capabilities, and I imagine may help me live longer. So I
recommend doing something like this to stave off dotage. When I first
got a job out of grad school designing and programming, I was sometimes
in awe that I got paid (well) to do something I loved. I still love
doing this kind of work, and, like almost all of you, I'm not getting
paid. :)