perl.perl5.porters | Postings from February 2000

Re: Unicode character composition

From: Ilya Zakharevich
Date: February 13, 2000 12:15
Subject: Re: Unicode character composition
Message ID: 20000213151307.A22703@math.mps.ohio-state.edu

On Sun, Feb 13, 2000 at 09:54:43PM +0200, Jarkko Hietaniemi wrote:
> Food for thought: should Perl always keep its utf8 data in the
> decomposed form, to be canonical?  Or, the other way, should it always
> try to find the composite form (to be more compact)?

No and no.

> A canonical form would make searching the data rather easier.

I think there is a Unicode Consortium document on "Levels of
internationalization support in REx engines".  I think there are 3 or 4
levels, and we are on the first one now.  IIRC, what you propose is
similar to level 2.
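
To make the canonical-form question concrete, here is a minimal Python sketch (not from the thread; Perl offers comparable NFD/NFC functions in the Unicode::Normalize module). It assumes only the standard `unicodedata` module. A precomposed "é" (U+00E9) and "e" followed by U+0301 COMBINING ACUTE ACCENT are canonically equivalent, yet they compare unequal as raw code points until both sides are normalized to the same form.

```python
import unicodedata

composed = "caf\u00e9"       # 'café' spelled with precomposed U+00E9
decomposed = "cafe\u0301"    # 'cafe' + U+0301 COMBINING ACUTE ACCENT


def nfd(s: str) -> str:
    """Canonical decomposition (the 'decomposed form')."""
    return unicodedata.normalize("NFD", s)


def nfc(s: str) -> str:
    """Canonical composition (the 'composite form')."""
    return unicodedata.normalize("NFC", s)


# Raw code-point comparison treats the two spellings as different strings.
print(composed == decomposed)               # False
# Normalizing both sides to one canonical form makes them compare equal.
print(nfd(composed) == nfd(decomposed))     # True
print(nfc(composed) == nfc(decomposed))     # True
# The composed form is shorter in code points: 4 versus 5 here.
print(len(nfc(decomposed)), len(nfd(composed)))  # 4 5
```

Either canonical form serves for equality and searching, provided the data and the search key are normalized the same way.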

I would think that such things should be handled by pessimizers for
RExen: "mutate this REx to support composition/decomposition too".
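
The "mutate this REx" idea might be sketched as follows, again in Python and under stated assumptions: `mutate_literal` is a hypothetical helper, it handles only literal strings, and it accepts just two spellings of each character (fully composed NFC or fully decomposed NFD). Each composed character becomes an alternation of both spellings, so the pattern gets slower but more general. That is presumably the trade-off the "pessimizer" label points at.

```python
import re
import unicodedata


def mutate_literal(literal: str) -> str:
    """Hypothetical helper: build a regex source string that matches
    `literal` whether each character is spelled precomposed (NFC) or
    fully decomposed (NFD)."""
    parts = []
    # Iterate over the NFC form so each composed character is one unit.
    for ch in unicodedata.normalize("NFC", literal):
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed == ch:
            # No canonical decomposition: match the character as-is.
            parts.append(re.escape(ch))
        else:
            # Accept either the composed or the decomposed spelling.
            parts.append("(?:%s|%s)" % (re.escape(ch), re.escape(decomposed)))
    return "".join(parts)


pattern = re.compile(mutate_literal("caf\u00e9"))
print(bool(pattern.search("caf\u00e9")))    # True  (composed text)
print(bool(pattern.search("cafe\u0301")))   # True  (decomposed text)
print(bool(pattern.search("cafe")))         # False (accent missing)
```

Partially composed spellings are not covered. For example, U+1EB9 LATIN SMALL LETTER E WITH DOT BELOW followed by U+0302 is canonically equivalent to U+1EC7, but the pattern built for U+1EC7 does not match it. A bare base letter in the pattern can also still match the start of a decomposed sequence in the text. Normalizing the subject string before matching avoids both problems, at the cost of copying the data.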

Ilya


