Front page | perl.perl5.porters |
Postings from November 2010
[perl #78354] PATCH: Use Unicode 6.0
From: Father Chrysostomos via RT
Date: November 18, 2010 13:12
Subject: [perl #78354] PATCH: Use Unicode 6.0
Message ID: rt-3.6.HEAD-13564-1290114717-978.78354-15-0@perl.org
On Tue Nov 16 17:44:57 2010, public@khwilliamson.com wrote:
> karl williamson wrote:
> > Father Chrysostomos via RT wrote:
> >> On Tue Oct 12 21:56:17 2010, public@khwilliamson.com wrote:
> >>> This series of commits delivers the Unicode 6.0 db, and upgrades Perl
> >>> to use it. There may still be some work to do in Unicode::UCD to
> >>> support the new characters (which I'll investigate), but the rest of
> >>> the Perl core should fully support it.
> >>>
> >>> The few code changes are attached to this email, but the bulk of the
> >>> changes (along with the attachments here), too large to email, are
> >>> located at git://github.com/khwilliamson/perl.git
> >>> branch mktables
> >>>
> >>> Those changes are essentially entirely official Unicode data, except
> >>> for the MANIFEST, perldelta, version, and a couple data changes in
> >>> UCD.t
> >>
> >> I’ve applied the first patch as 92f9d56c66.
> >> With the Unicode 6 database I get a test failure:
> >>
> >> $ curl http://github.com/khwilliamson/perl/commit/35e84e1c3151243.patch
> >> | git am
> >> [...]
> >> $ cd t
> >> $ ./perl harness -v ../lib/charnames.t
> >> [...]
> >> not ok 17078 - Verify string_vianame("BELL") is chr(0x1F514)
> >> # Failed at ../lib/charnames.t line 105
> >> # got "\a"
> >> # expected "\x{1f514}"
> >>
> >>
> >>
> >
> > I'm afraid this is what I consider to be a flaw in the new standard,
> > though they wouldn't; I regret that I did not find it before it was too
> > late; as your tests are the first it surfaced. I'm not sure Unicode
> > would have listened to me anyway, but we would have known about this
> > earlier.
> >
> > Your tests showed the problem and my tests didn't, because of the random
> > sampling of the tests, because it would take too long to go through all
> > million possible code points each time; and my tests just didn't try
> > that combination yet.
> >
> > I'm not sure what to do; suggestions welcome.
> >
> > The problem stems from the fact that the Standard does not give names to
> > the control characters, such as ACK and BEL. It did in version 1.0, and
> > it still publishes those names as the "Unicode_1_Name" property. That
> > name for character 0x07, known by the acronym BEL, is "BELL". What Perl
> > does is to use the Unicode 1 names when there is no current. All was
> > fine until 6.0 came along and re-used BELL for a different character.
> >
> > But as far as Unicode is concerned, there isn't a problem, as BEL has no
> > official name. It is Perl who has persisted in using this old name. I
> > don't know why Unicode removed the names; and it seems eminently
> > reasonable to give them names; but here we are.
> >
> > The only option I can think of that doesn't violate our stability
> > policies is to, in 5.14, keep the old BELL meaning, but deprecate it,
> > saying to use BEL instead, which was added in 5.13 as a synonym for it.
> > This means that in 5.14 we don't accept that one new Unicode character,
> > except by ordinal value. In 5.16, we convert to use Unicode.
> >
> > In the meantime, I will propose that Unicode adopt a policy of not doing
> > this again, and perhaps an alias that gives a somewhat different name,
> > just to clear up future confusion.
> >
>
>
> The attached patches work around this problem by deprecating \N{BELL}
> for 5.14, and giving the new name \N{ALERT} to it. The new character
> with that name will be unnamed. This means that Perl 5.14 doesn't quite
> support Unicode 6.0.
>
> The patches are also available at:
> git://github.com/khwilliamson/perl.git
> branch uni6
>
> which includes the entire series of unicode 6 patches.
Thank you. All applied.
(Why did you not use ALARM?)
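
For anyone following the thread, the name clash described in the quoted text can be sketched as a toy model. This is only an illustration under stated assumptions, not how charnames or mktables actually implement the lookup: the two tables are hard-coded by hand rather than read from the Unicode database, and build_lookup and its $prefer_legacy flag are hypothetical names invented for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy data, hard-coded for illustration only (assumption: not read from
# the real Unicode database files).
my %current_name = (            # Unicode 6.0 "Name" property
    0x1F514 => 'BELL',
);
my %unicode_1_name = (          # "Unicode_1_Name" property
    0x06 => 'ACKNOWLEDGE',
    0x07 => 'BELL',
);

# Hypothetical helper: build a name => code point table.
# Current names always go in. A Unicode 1 name is used only for a code
# point that has no current name. When such a name clashes with a current
# name, $prefer_legacy decides which meaning survives; a true value models
# the 5.14 workaround of keeping the old meaning of BELL.
sub build_lookup {
    my ($prefer_legacy) = @_;
    my %by_name = map { $current_name{$_} => $_ } keys %current_name;
    for my $cp (keys %unicode_1_name) {
        next if exists $current_name{$cp};
        my $name = $unicode_1_name{$cp};
        next if exists $by_name{$name} && !$prefer_legacy;
        $by_name{$name} = $cp;
    }
    return \%by_name;
}

my $legacy_wins  = build_lookup(1);
my $unicode_wins = build_lookup(0);

printf "legacy wins:  BELL => U+%04X\n", $legacy_wins->{BELL};
printf "unicode wins: BELL => U+%04X\n", $unicode_wins->{BELL};
```

With the legacy meaning preferred, BELL resolves to U+0007 and U+1F514 can be reached only by ordinal value; with the Unicode 6.0 meaning preferred, BELL resolves to U+1F514 and U+0007 needs another name such as BEL or ALERT.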