develooper Front page | perl.perl5.porters | Postings from January 2012

Re: Any objections to using Unicode 6.1 in perl v5.16

Karl Williamson
January 28, 2012 13:04
On 01/27/2012 12:05 PM, David Golden wrote:
> On Fri, Jan 27, 2012 at 1:40 PM, Karl Williamson wrote:
>> Unicode is about to release their latest version.  In the past, we have
>> automatically included the latest Unicode version in the next Perl one; but
>> I'm checking to see if there is any disagreement here.
> We're less than 4 weeks away from "user-visible changes" freeze.
> Do you think it's feasible to have all changes incorporated and tested
> before then?

All but the final tweaks have been written; those are the things for
which I was waiting to see the final files.  Most of the changes needed
have been in blead for about a month or so.  So yes, it is feasible.

> What's your take on the cost/benefit ratio?  Does it fix
> problems/ambiguities we're currently dealing with?  Does it introduce
> any new ones?

Perl is not going to stay at Unicode 6.0 forever; we will upgrade to a 
future version at some point.  To me, the question then becomes, "Is 
there a reason to wait?"  The only reason I can think of is if 6.1 has
bugs severe enough that they are later corrected, either by a formal
Corrigendum or in a future release; in the latter case we would want
to skip 6.1 entirely.

There was a Corrigendum in 6.0 which came out a little over a month 
after its release.  It involved a typo in the data in which the 
Bidirectional algorithm class for U+070F SYRIAC ABBREVIATION MARK was 
incorrect.  All the Corrigenda so far have been at that level of
seriousness, and that was only the 8th in the 20+ years since Unicode's
inception.  Given that track record, I don't see a reason to hold up 6.1
just in case.

The 6.1 data has been available for testing for some months.  I tested 
the part that the Perl core cares about, and found some bugs resulting 
from typos, which they corrected.

The data changes that we care about are listed in:

There were two General Category changes that affected our tests.  The
categories of SECTION SIGN and PILCROW SIGN were changed from symbol to
punctuation.  Our tests were checking that they still had their previous
values.  There may be code out there
that depends on the old values, but it will have to change sooner or 
later, as Unicode is unlikely to change back.  I don't see other changes 
being as significant as that.
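
For anyone who wants to see what their own perl reports for these two
characters, something like the following sketch should do.  It assumes
the core Unicode::UCD module; gc_of() is just a helper name made up for
this example, and the output depends on which Unicode version the perl
was built with (I would expect "So" under 6.0 and "Po" under 6.1 or
later).

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper (not part of any module): return the General
# Category abbreviation that this perl's Unicode tables assign to a
# code point, e.g. "Lu" for an uppercase letter, or undef if the code
# point is unassigned.
sub gc_of {
    my ($cp) = @_;
    my $info = charinfo($cp);
    return $info ? $info->{category} : undef;
}

printf "Unicode version in this perl: %s\n", Unicode::UCD::UnicodeVersion();

# SECTION SIGN (U+00A7) and PILCROW SIGN (U+00B6)
for my $cp (0xA7, 0xB6) {
    my $ch = chr $cp;
    printf "U+%04X gc=%s  \\p{P}: %s  \\p{S}: %s\n",
        $cp, gc_of($cp),
        ($ch =~ /\p{P}/ ? "yes" : "no"),
        ($ch =~ /\p{S}/ ? "yes" : "no");
}
```

Code that matches these characters with \p{S} or \p{Symbol} is the kind
that would need to change.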

Unicode 6.1 does address the BELL fiasco officially, to prevent something like
that from happening again.  But it doesn't affect our workaround for 
that, so this change has no immediate practical effect.

It also fixes some other issues, none of which the Perl core cares
about, but which may affect Perl programs that process various
languages such as Hebrew, Japanese, and Thai.  The claim is that these 
changes are for the better.
