develooper Front page | perl.perl5.porters | Postings from December 2009

Patch [perl #70940] \X and starter-starter combinations

karl williamson
December 5, 2009 21:32
This patch is available at git://
branch x

It expands the definition of \X so that it comes closer to its intended
effect: matching a logical Unicode character.  It will now match the
Unicode "extended grapheme cluster", as previously discussed on this list.

This entailed changing the algorithm for matching \X in regexec.c.  The
new algorithm needs additional swashes for the various Unicode matching
tables it requires, so utf8.c was changed to include those.  (I'm taking
advantage of this patch to add some commented-out proof-of-concept code
to it that I forgot to put in the mktables revamp patch.  I've also
cleaned up some comments in utf8.h.)
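
For readers who have not looked at extended grapheme clusters, here is a
rough sketch of the kind of pairwise boundary test involved.  It is not the
code from regexec.c.  It is a toy that hard-codes a handful of code point
ranges where the real implementation consults swashes built from the Unicode
tables.  The names gcb_class(), no_break() and cluster_len() are made up for
this illustration, and the ranges cover only CR, LF, the C0/C1 controls, the
combining diacritical marks block, and the main Hangul jamo and syllable
blocks.  The rule numbers in the comments refer to UAX #29.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy subset of the Grapheme_Cluster_Break property values. */
typedef enum {
    GCB_OTHER, GCB_CR, GCB_LF, GCB_CONTROL, GCB_EXTEND,
    GCB_L, GCB_V, GCB_T, GCB_LV, GCB_LVT
} gcb_t;

/* ASSUMPTION: hard-coded ranges, for illustration only.  A real
 * implementation looks these up in tables generated from the Unicode
 * data files, and covers many more code points. */
static gcb_t gcb_class(uint32_t cp)
{
    if (cp == 0x0D) return GCB_CR;
    if (cp == 0x0A) return GCB_LF;
    if (cp < 0x20 || (cp >= 0x7F && cp <= 0x9F)) return GCB_CONTROL;
    if (cp >= 0x0300 && cp <= 0x036F) return GCB_EXTEND; /* combining marks */
    if (cp >= 0x1100 && cp <= 0x115F) return GCB_L;      /* leading jamo */
    if (cp >= 0x1160 && cp <= 0x11A7) return GCB_V;      /* vowel jamo */
    if (cp >= 0x11A8 && cp <= 0x11FF) return GCB_T;      /* trailing jamo */
    if (cp >= 0xAC00 && cp <= 0xD7A3)                    /* precomposed */
        return ((cp - 0xAC00) % 28 == 0) ? GCB_LV : GCB_LVT;
    return GCB_OTHER;
}

/* Returns 1 if a cluster must NOT be broken between classes a and b. */
static int no_break(gcb_t a, gcb_t b)
{
    if (a == GCB_CR && b == GCB_LF) return 1;                      /* GB3 */
    if (a == GCB_CR || a == GCB_LF || a == GCB_CONTROL) return 0;  /* GB4 */
    if (b == GCB_CR || b == GCB_LF || b == GCB_CONTROL) return 0;  /* GB5 */
    if (a == GCB_L &&
        (b == GCB_L || b == GCB_V || b == GCB_LV || b == GCB_LVT))
        return 1;                                                  /* GB6 */
    if ((a == GCB_LV || a == GCB_V) && (b == GCB_V || b == GCB_T))
        return 1;                                                  /* GB7 */
    if ((a == GCB_LVT || a == GCB_T) && b == GCB_T) return 1;      /* GB8 */
    if (b == GCB_EXTEND) return 1;                                 /* GB9 */
    return 0;                                                      /* GB10 */
}

/* Number of code points in the cluster that starts at s[0];
 * returns 0 for an empty input. */
static size_t cluster_len(const uint32_t *s, size_t n)
{
    size_t i;

    if (n == 0) return 0;
    for (i = 1; i < n; i++) {
        if (!no_break(gcb_class(s[i - 1]), gcb_class(s[i])))
            break;
    }
    return i;
}
```

With these toy tables, the jamo sequence U+1100 U+1161 U+11A8 comes out as
one cluster even though all three code points are starters, which is the
starter-starter case in the subject line.  CR followed by LF also comes out
as a single cluster.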

The new \X passes the extensive Unicode test suite.  I have modified
mktables to add those tests to the test script it generates.  I also
extended mktables to handle Unicode test suites more generally, for
future expansion.

I found that a comment was wrongly being left out of a few of the
mktables-generated tables.  I also cleaned up the testing for
non-ASCII machines.

I'm not sure about perl_clone().  I don't know how it ties in with
things; I added the new swash variables to it, as documented in
intrpvar.h, but I don't know how to test that I didn't make a typo in them.

Things are set up so that when Perl is used on an earlier Unicode 
release that doesn't have extended grapheme clusters, the previous 
definition of \X is used, possibly with the addition of Korean Hangul 
syllable matching, if available in that release.

If this patch is accepted, pod changes will follow, but I'm working on 
other bug fixes at the moment, which have priority.
