PATCH: [perl #58182] The "Unicode Bug"

From: karl williamson
Date: December 1, 2010 16:15
Subject: PATCH: [perl #58182] The "Unicode Bug"
Message ID: 4CF6E502.10601@khwilliamson.com

This series of commits, along with many previous ones, resolves [perl 
#58182], the "Unicode Bug".  Other tickets have also been fixed by this 
series; I plan to address those after the 5.14 code freeze.  These 
commits are also available at:
https://github.com/khwilliamson/perl.git
branch folding

This series of commits extends Unicode semantics to backreferences of 
capture buffers, the last remaining significant missing area.  Earlier I 
had also thought tries weren't covered, but it turns out that, because of 
the Unicode bug, the code that would be affected is disabled.  I plan to 
see about reenabling it at some point, but since it is a matter of 
efficiency and not correctness, it has lower immediate priority for me.
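
For readers who want to see what "Unicode semantics for backreferences" 
means in practice, here is a minimal sketch.  It is not taken from the 
patch.  It assumes a perl 5.14 or later (the /u modifier is a syntax 
error on older perls), and the two helper subs are hypothetical names 
made up for this illustration.  The same two characters are matched 
once stored as native bytes and once utf8-upgraded.

```perl
use strict;
use warnings;

# The same two characters, U+00E9 then U+00C9 (lower and upper case
# e-acute), stored two different ways.
my $native   = "\xE9\xC9";     # one byte per character, UTF-8 flag off
my $upgraded = $native;
utf8::upgrade($upgraded);      # same characters, UTF-8 flag on

# Hypothetical helpers: does a caseless backreference match the second
# character?  The first uses the default rules (no /u, no
# 'unicode_strings' feature); the second asks for Unicode rules.
sub backref_default { return $_[0] =~ /^(.)\1$/i  ? 1 : 0 }
sub backref_unicode { return $_[0] =~ /^(.)\1$/iu ? 1 : 0 }

printf "default rules: native=%d upgraded=%d\n",
    backref_default($native), backref_default($upgraded);
printf "unicode rules: native=%d upgraded=%d\n",
    backref_unicode($native), backref_unicode($upgraded);
```

Under the default rules the answer depends on how the string happens to 
be stored, which is the Unicode bug; with Unicode rules requested, both 
forms should give the same answer.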

As explained in the perldelta, there are still two known minor areas 
where the behavior varies depending on the utf8ness of the underlying 
string: 1) user-defined upper/lower/title casing, which is planned to be 
deprecated in 5.14; and 2) the German sharp s character, which has a 
somewhat different set of bugs matching under /i when in utf8 versus 
not.  I plan to continue working on both.  Note, though, that all 
characters that have multi-char folds are buggy, not just sharp s.
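
As a small illustration of what a multi-char fold is, the sketch below 
uses the fc() builtin.  That is an assumption about the reader's perl: 
fc() only arrived in 5.16, well after this patch, and it is used here 
purely to show the fold itself, not the /i matching bugs described 
above.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);   # fc() needs perl 5.16 or later

my $sharp_s = "\xDF";                 # U+00DF LATIN SMALL LETTER SHARP S
my $folded  = fc($sharp_s);           # full Unicode case fold

# One input character folds to two output characters, which is what
# makes caseless matching of such characters awkward to get right.
printf "length before fold: %d, after fold: %d\n",
    length($sharp_s), length($folded);
print "folds to 'ss'\n" if $folded eq "ss";
```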

And there is still work to be done.  This patch includes changes to a 
number of pods, but several should really be rewritten with a new take, 
given this new functionality.  I'm hoping a better wordsmith than I 
will jump in.  There is also some clean-up needed, and there are other 
bugs that I've found in reading the code, which I'll get to shortly.

This patch is the culmination of my efforts that got me to come out of 
retirement and join p5p more than two years ago.  Since this is a 
significant milestone for me, the rest of this post is a pause for 
personal reflection.  I decided to get involved to fix this because it 
was hindering some code I was writing, and I was actually kind of 
embarrassed that Perl didn't do it right, unlike most of my experiences 
with it.  I thought at the time that it would take a few weeks at most.  
I've learned a lot about Perl; and I've regained some facility with C, 
and found things out about it that I never knew.  For example, I never 
had to worry very much about C portability in my career (although I had 
been an expert on Fortran 66 portability at one time).  One thing I've 
noticed is that doing this work has slowed the deterioration of my 
intellectual capabilities, and I imagine may help me live longer.  So I 
recommend doing something like this to stave off dotage.  When I first 
got a job out of grad-school designing and programming, I sometimes was 
in awe that I got paid (well) to do something I loved.  I still love 
doing this kind of work, and, like almost all of you, I'm not getting 
paid. :)
