develooper Front page | perl.perl5.porters | Postings from December 2015

Re: Obsolete text in utf8.pm

From: John Imrie
Date: December 26, 2015 10:32
Subject: Re: Obsolete text in utf8.pm
Message ID: 567E6CC2.9020906@virginmedia.com

On 25/12/2015 03:21, Zefram wrote:
> demerphq wrote:
>>       Is it the one where combining characters are replaced with
>> canonical codepoints?
> Basically, yes.  NFC = Composed: precomposed characters are used
> as much as possible.  NFD = Decomposed: precomposed characters are
> entirely avoided in favour of separate base and combining characters.
> There's also NFKC and NFKD, which perform compatibility mappings and
> then either Compose or Decompose.
>
> -zefram

The form of normalisation wasn't the main point of my post. Just that I
think we should impose normalisation of some form on the symbol table, to
avoid code written in different editors imposing different ideas of how to
form a particular character. Take ệ (U+1EC7 LATIN SMALL LETTER E WITH
CIRCUMFLEX AND DOT BELOW) as a case in point: a text editor could store
that as one code point, or break it down into one of four other
combinations of code points.
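
To make the ệ case concrete, here is a minimal sketch of the five canonically
equivalent spellings and what normalisation does to them. It uses Python's
standard unicodedata module purely for illustration (Perl's core
Unicode::Normalize module offers NFC and NFD functions that could be used the
same way). The variable names are made up for the example, and nothing here
reflects how perl actually treats its symbol table.

```python
import unicodedata

# The precomposed character plus the four other canonically
# equivalent ways an editor might store it.
spellings = [
    "\u1EC7",         # ệ as a single code point
    "\u1EB9\u0302",   # ẹ (e with dot below) + combining circumflex
    "\u00EA\u0323",   # ê (e with circumflex) + combining dot below
    "e\u0323\u0302",  # e + combining dot below + combining circumflex
    "e\u0302\u0323",  # e + combining circumflex + combining dot below
]

# Compared as raw code point sequences, all five differ, so without
# normalisation they would name five different symbols.
distinct_raw = set(spellings)

# After normalisation every spelling collapses to one form:
# NFC gives the single precomposed code point, NFD gives the base
# letter followed by the combining marks in canonical order.
nfc = {unicodedata.normalize("NFC", s) for s in spellings}
nfd = {unicodedata.normalize("NFD", s) for s in spellings}

print(len(distinct_raw), len(nfc), len(nfd))
```

Either form would do for the purpose above; what matters is that the same one
is applied to every identifier before it is looked up or stored.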

John
