Re: "ICU - International Components for Unicode"

William Michels via perl6-users
September 29, 2020 19:14
Thank you, Samantha!

One outstanding question, posed by Joseph Brenner, is which version
of the Unicode standard Raku supports. I grepped through two files,
"unicode.c" and "unicode_db.c". Both are located in the Rakudo
source tree at:
/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/ .

Below are the first 4 lines of my grep results. As you can see
below, rakudo-2020.06 supports Unicode 12.1.0:

~$ raku -ne '.say if .grep(/unicode/)'
# For terms of use, see
# The UAXes can be accessed at
From on 2017-11-28:
Distributed under the Terms of Use in

It would be really interesting to follow your Unicode work, Samantha.
The ideas you propose are promising, and everyone hopes for speed
improvements. Is there any place Raku-uns can go to read updates,
maybe a grant report, blog, or GitHub issue? Or maybe right here,
on the Perl6-Users mailing list? Thanks in advance.

Best, Bill.

W. Michels, Ph.D.

On Sun, Sep 27, 2020 at 4:03 AM Samantha McVey wrote:
> So MoarVM uses its own database of the UCD. One nice thing is this can
> probably be faster than calling to the ICU to look up information of each
> codepoint in a long string. Secondly it implements its own text data
> structures, so the nice features of the UCD to do that would be difficult to
> use.
> In my opinion, it could make sense to use ICU for things like localized
> collation (sorting). It also could make sense to use ICU for Unicode
> property lookups for properties that don't have to do with grapheme
> segmentation or casing. This would be a lot of work, but if something like
> this were implemented it would probably happen in the context of a larger
> rethinking of how we use Unicode. Everything is complicated, though, by the
> fact that we support lots of complicated regular expressions on different
> Unicode properties. I guess I'd start by benchmarking the speed of ICU and
> comparing it to the current implementation.
