On 01/27/2012 12:05 PM, David Golden wrote:
> On Fri, Jan 27, 2012 at 1:40 PM, Karl Williamson
> <public@khwilliamson.com> wrote:
>> Unicode is about to release their latest version. In the past, we have
>> automatically included the latest Unicode version in the next Perl one; but
>> I'm checking to see if there is any disagreement here.
>
> We're less than 4 weeks away from "user-visible changes" freeze.
>
> Do you think it's feasible to have all changes incorporated and tested
> before then?

All but the final tweaks have been written; those are the things I was waiting to see the final files for. Most of the needed changes have been in blead for about a month. So yes, it is feasible.

>
> What's your take on the cost/benefit ratio? Does it fix
> problems/ambiguities we're currently dealing with? Does it introduce
> any new ones?
>

Perl is not going to stay at Unicode 6.0 forever; we will upgrade to a later version at some point. To me, the question then becomes, "Is there a reason to wait?" The only reason I can think of is if 6.1 has sufficiently severe bugs that are later corrected, either by a formal Corrigendum or in a future release. In the latter case we would want to skip 6.1 entirely.

There was a Corrigendum to 6.0 that came out a little over a month after its release. It corrected a typo in the data: the Bidirectional algorithm class for U+070F SYRIAC ABBREVIATION MARK was wrong. All the Corrigenda so far have been at that level of seriousness, and that one was only the 8th in the more than 20 years since Unicode's inception. Given that track record, I don't see a reason to hold up 6.1 just in case.

The 6.1 data has been available for testing for some months. I tested the parts that the Perl core cares about and found some bugs resulting from typos, which they corrected.

The data changes that we care about are listed in:

http://www.unicode.org/versions/Unicode6.1.0/#Database_Changes

There were two General Category changes that affected our tests.
The General Category of the SECTION SIGN and the PILCROW SIGN was changed from symbol to punctuation, and our tests were checking that they still had their previous category. There may be code out there that depends on the old values, but it will have to change sooner or later, as Unicode is unlikely to change back.

I don't see any other changes being as significant as that. 6.1 does officially address the BELL fiasco, to prevent something like that from happening again. But it doesn't affect our workaround for that, so this change has no immediate practical effect. It also fixes some other issues, none of which the Perl core cares about, but which may affect Perl programs that process languages such as Hebrew, Japanese, and Thai. The claim is that these changes are for the better.
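To illustrate why tests like ours broke, here is a minimal sketch of the kind of check involved. It is written in Python only because its standard `unicodedata` module exposes the General Category directly. It is not the Perl core's test code, and the names `CHANGED` and `category_report` are hypothetical. The sketch looks up U+00A7 SECTION SIGN and U+00B6 PILCROW SIGN. With Unicode data at 6.1 or later (any Python 3.3+), both report `Po` (Punctuation, other) rather than the old `So` (Symbol, other). A test that hard-codes `So` would start failing after the upgrade.

```python
import unicodedata

# Hypothetical table: the two code points whose General Category moved
# from So (Symbol, other) to Po (Punctuation, other) in Unicode 6.1.
CHANGED = {
    "\u00A7": "SECTION SIGN",
    "\u00B6": "PILCROW SIGN",
}


def category_report(chars):
    """Map each character's Unicode name to (General Category, is_punctuation),
    using whatever Unicode database this interpreter was built with."""
    report = {}
    for ch, expected_name in chars.items():
        name = unicodedata.name(ch)
        # Sanity check that the table pairs each code point with its real name.
        assert name == expected_name
        gc = unicodedata.category(ch)
        # All punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po) start with "P".
        report[name] = (gc, gc.startswith("P"))
    return report


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for name, (gc, is_punct) in category_report(CHANGED).items():
        print(f"{name}: {gc} punctuation={is_punct}")
```

Checking the broader property ("is it punctuation?") rather than a single exact category code is one way to make such a test less brittle. Even so, these two characters flipped between the symbol and punctuation classes, so any expectation fixed to the old data has to be updated.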