develooper Front page | perl.perl6.users | Postings from September 2020

Re: "ICU - International Components for Unicode"

Thread Previous | Thread Next
From:
Patrick R. Michaud
Date:
September 25, 2020 18:15
Subject:
Re: "ICU - International Components for Unicode"
Message ID:
20200925181555.GC77201@pmichaud.com
On Fri, Sep 25, 2020 at 12:37:49PM +0200, Elizabeth Mattijsen wrote:
> > On 25 Sep 2020, at 04:25, Brad Gilbert <b2gills@gmail.com> wrote:
> > Rakudo does not use ICU
> > 
> > It used to though.
> > 
> > Rakudo used to run on Parrot.
> > Parrot used ICU for its Unicode features.
> 
> I do remember that in the Parrot days, any non-ASCII character in 
> any string, would have a significant negative effect on grammar parsing.  
> This was usually not that visible when trying to run a script, but the 
> time needed to compile the core setting (which already took a few minutes 
> then) rose (probably exponentially) to: well, I don't know.  

Part of this is because Parrot/ICU was using UTF-8 and/or UTF-16 to
encode non-ASCII strings.  As a result, indexing into a string often 
became a O(n) operation instead of O(1).  For short strings, no problem,
for long strings (such as the core setting) it was really painful.

We did work on some ways in Parrot/NQP to reduce the amount of string
scanning involved, such as caching certain index-points in the string, 
but it was always a bit of a hack.  Switching to a fixed-width encoding
(NFG, which MoarVM implements) was definitely the correct path to take
there.

Pm

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About