perl.perl5.porters | Postings from November 2010

[perl #78354] PATCH: Use Unicode 6.0

From: Father Chrysostomos via RT
Date: November 18, 2010 13:12
Subject: [perl #78354] PATCH: Use Unicode 6.0
Message ID: rt-3.6.HEAD-13564-1290114717-978.78354-15-0@perl.org

On Tue Nov 16 17:44:57 2010, public@khwilliamson.com wrote:
> karl williamson wrote:
> > Father Chrysostomos via RT wrote:
> >> On Tue Oct 12 21:56:17 2010, public@khwilliamson.com wrote:
> >>> This series of commits delivers the Unicode 6.0 db, and upgrades Perl 
> >>> to use it.  There may still be some work to do in Unicode::UCD to 
> >>> support the new characters (which I'll investigate), but the rest of 
> >>> the Perl core should fully support it.
> >>>
> >>> The few code changes are attached to this email, but the bulk of the 
> >>> changes (along with the attachments here), too large to email, are 
> >>> located at git://github.com/khwilliamson/perl.git
> >>> branch mktables
> >>>
> >>> Those changes are essentially entirely official Unicode data, except 
> >>> for the MANIFEST, perldelta, version, and a couple data changes in UCD.t
> >>
> >> I’ve applied the first patch as 92f9d56c66.
> >> With the Unicode 6 database I get a test failure:
> >>
> >> $ curl http://github.com/khwilliamson/perl/commit/35e84e1c3151243.patch
> >> | git am
> >> [...]
> >> $ cd t
> >> $ ./perl harness -v ../lib/charnames.t
> >> [...]
> >> not ok 17078 - Verify string_vianame("BELL") is chr(0x1F514)
> >> # Failed at ../lib/charnames.t line 105
> >> #      got "\a"
> >> # expected "\x{1f514}"
> >>
> >>
> >>
> > 
> > I'm afraid this is what I consider to be a flaw in the new standard, 
> > though they wouldn't; I regret that I did not find it before it was too 
> > late; as your tests are the first it surfaced.  I'm not sure Unicode 
> > would have listened to me anyway, but we would have known about this 
> > earlier.
> > 
> > Your tests showed the problem and my tests didn't, because of the random 
> > sampling of the tests, because it would take too long to go through all 
> > million possible code points each time; and my tests just didn't try 
> > that combination yet.
> > 
> > I'm not sure what to do; suggestions welcome.
> > 
> > The problem stems from the fact that the Standard does not give names to 
> > the control characters, such as ACK and BEL.  It did in version 1.0, and 
> > it still publishes those names as the "Unicode_1_Name" property.  That 
> > name for character 0x07, known by the acronym BEL, is "BELL".  What Perl 
> > does is to use the Unicode 1 names when there is no current name.  All was 
> > fine until 6.0 came along and re-used BELL for a different character.
> > 
> > But as far as Unicode is concerned, there isn't a problem, as BEL has no 
> > official name.  It is Perl who has persisted in using this old name.  I 
> > don't know why Unicode removed the names; and it seems eminently 
> > reasonable to give them names; but here we are.
> > 
> > The only option I can think of that doesn't violate our stability 
> > policies is to, in 5.14, keep the old BELL meaning, but deprecate it, 
> > saying to use BEL instead, which was added in 5.13 as a synonym for it. 
> >  This means that in 5.14 we don't accept that one new Unicode character, 
> > except by ordinal value.  In 5.16, we convert to use Unicode.
> > 
> > In the meantime, I will propose that Unicode adopt a policy of not doing 
> > this again, and perhaps an alias that gives a somewhat different name, 
> > just to clear up future confusion.
> > 
> 
> 
> The attached patches work around this problem by deprecating \N{BELL} 
> for 5.14, and giving the new name \N{ALERT} to it.  The new character 
> with that name will be unnamed.  This means that Perl 5.14 doesn't quite 
> support Unicode 6.0.
> 
> The patches are also available at:
> git://github.com/khwilliamson/perl.git
> branch uni6
> 
> which includes the entire series of unicode 6 patches.

Thank you. All applied.

(Why did you not use ALARM?)
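
For readers following the thread, here is a minimal sketch of how the fallback described above (use the Unicode_1_Name when a character has no current name) produces the BELL collision. It is an illustration only, not the code in lib/charnames.pm or mktables: the %ucd table, build_name_table(), and the "first claimant wins" rule are hypothetical names and simplifications chosen so that the result matches the 5.14 workaround discussed here.

```perl
use strict;
use warnings;

# Hypothetical three-entry excerpt of the Unicode data:
# code point => [ current Name, Unicode_1_Name ].
# Control characters have no current name, only the Unicode 1 one.
my %ucd = (
    0x0007  => [ '',                       'BELL' ],  # control character BEL
    0x0041  => [ 'LATIN CAPITAL LETTER A', ''     ],
    0x1F514 => [ 'BELL',                   ''     ],  # new in Unicode 6.0
);

# Build a name => code point table, falling back to the Unicode 1 name
# when a character has no current name, and record any name claimed twice.
sub build_name_table {
    my ($data) = @_;
    my (%by_name, @collisions);
    for my $cp (sort { $a <=> $b } keys %$data) {
        my ($name, $old_name) = @{ $data->{$cp} };
        my $effective = length($name) ? $name : $old_name;
        next unless length($effective);
        if (exists $by_name{$effective}) {
            # Assumption for this sketch: the first (lowest) code point
            # keeps the name, so the later character goes unnamed.
            push @collisions, [ $effective, $by_name{$effective}, $cp ];
            next;
        }
        $by_name{$effective} = $cp;
    }
    return (\%by_name, \@collisions);
}

my ($by_name, $collisions) = build_name_table(\%ucd);

# Names from the thread: BEL (synonym added in 5.13) and ALERT (new name).
$by_name->{BEL}   = 0x0007;
$by_name->{ALERT} = 0x0007;

for my $c (@$collisions) {
    printf "%s: kept U+%04X, left U+%04X unnamed\n", @$c;
}
```

With this table, BELL still resolves to U+0007, and U+1F514 is reachable only by ordinal, e.g. chr(0x1F514), which is the 5.14 behaviour proposed in the quoted message.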


