develooper Front page | perl.perl6.language | Postings from February 2010

going beyond Unicode

From:
Darren Duncan
Date:
February 13, 2010 02:14
Subject:
going beyond Unicode
Message ID:
4B767B3F.503@darrenduncan.net
I have some food for thought.

Perl 6 is defined in terms of Unicode in many respects, but in other respects it
also seems to be agnostic to the character repertoire, encoding, or abstraction
level. For example, the abstraction level is explicitly recorded as meta-data in
Str values, which in a manner of speaking allows multiple representations of
text.

Now, I believe that if Perl 6 truly wants to be all-encompassing, such that any 
other language can be expressed as a grammar of Perl 6, or that otherwise Perl 6 
should be more flexible to handle anyone's text needs, it can't be restricted by 
Unicode.

You see, as is documented in many places, numerous aspects of Unicode are
controversial. While it does a lot, there are needs it doesn't address, and it
brings various complexities of its own.

For example, there was the controversy of Han unification, where glyphs with 
very similar appearance from multiple cultures were treated as being the same 
characters in Unicode, while many Asian people want to treat them as distinct.

This is in contrast to how, for some other scripts, Unicode provides several
redundant codepoints for the same glyph.

And there are other examples of characters from various scripts which are 
missing or mis-organized in Unicode, according to some users of those scripts.

One consequence of this is that other character repertoires have been created,
or have not been dropped, and are in use as alternatives to Unicode, such as by
some cultures opposed to the Han unification, so that they can properly express
what they mean to say.

I propose that Perl 6 extend its existing support for multiple character 
abstractions, with its meta-tagged code strings, so that character repertoires 
that extend outside of Unicode are also supported.
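
To make the idea a bit more concrete, here is a rough sketch. It is written in
Python rather than Perl 6, and every name in it (TaggedText, from_unicode, the
"Mojikyo" label, the sample code numbers) is hypothetical and chosen only for
illustration; it is not how Str works in any Perl 6 implementation. It shows
text that carries its repertoire as a tag, so that equal code numbers from
different repertoires are not confused with each other.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaggedText:
    """Text stored as code numbers plus the repertoire they belong to."""
    repertoire: str          # a label only, e.g. "Unicode" or "Mojikyo"
    codes: Tuple[int, ...]   # character numbers within that repertoire

    def concat(self, other: "TaggedText") -> "TaggedText":
        # Refuse to mix repertoires silently; a real design would need an
        # explicit conversion step here (assumption of this sketch).
        if self.repertoire != other.repertoire:
            raise ValueError(
                f"cannot join {self.repertoire} text with {other.repertoire} text"
            )
        return TaggedText(self.repertoire, self.codes + other.codes)


def from_unicode(s: str) -> TaggedText:
    """Wrap an ordinary Python string as Unicode-tagged text."""
    return TaggedText("Unicode", tuple(ord(c) for c in s))


if __name__ == "__main__":
    a = from_unicode("ab")
    b = TaggedText("Mojikyo", (97, 98))   # made-up code numbers
    print(a == b)                          # same numbers, different repertoire
    print(a.concat(from_unicode("c")))
```

The comparison in the usage lines is the point of the sketch: the repertoire
tag takes part in equality, so two values never compare equal merely because
their code numbers coincide.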

Examples are Mojikyo, TRON, GB18030, and several others.

See http://www.jbrowse.com/text/unij.html for some information on the matter.

See also http://www.ruby-forum.com/topic/165927 where related matters were
discussed for Ruby, though I only found that thread after I wrote this message.

*The idea here is that by being more flexible in what is supported, it is easier 
for Perl 6 users to express what they actually want to say in their code or in 
data processed by it.*

A corollary to allowing alternatives to Unicode is that, if we wanted to, the
Perl 6 spec could be split further, with some aspects considered more core and
some less so, and support for vast or complicated character sets like Unicode
could be made more optional. The idea here is that this makes it possible to
define more precisely what a more minimal Perl 6 may consist of. I suggest plain
ASCII be the minimum and everything more be optional. And pluggable.

Making the big complicated charsets optional and pluggable is good, I think.
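
As a rough illustration of "ASCII as the minimum, everything else pluggable",
here is a small Python sketch. The names (register, decode, utf8_plugin) are
hypothetical and are not part of any Perl 6 spec or implementation. The core
knows only ASCII, and any other charset has to be registered before it can be
used.

```python
from typing import Callable, Dict, List

# A decoder turns raw bytes into a list of character numbers.
Decoder = Callable[[bytes], List[int]]

_decoders: Dict[str, Decoder] = {}


def register(name: str, decoder: Decoder) -> None:
    """Plug in support for a charset under a case-insensitive name."""
    _decoders[name.lower()] = decoder


def decode(name: str, data: bytes) -> List[int]:
    """Decode data with the named charset, if a plugin for it is loaded."""
    try:
        decoder = _decoders[name.lower()]
    except KeyError:
        raise LookupError(f"no plugin loaded for charset {name!r}") from None
    return decoder(data)


def _ascii(data: bytes) -> List[int]:
    # The only decoder the minimal core ships with.
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"byte 0x{byte:02X} is outside ASCII")
    return list(data)


register("ASCII", _ascii)


def utf8_plugin(data: bytes) -> List[int]:
    """An optional plugin: UTF-8 bytes to Unicode codepoints."""
    return [ord(ch) for ch in data.decode("utf-8")]
```

With only the core loaded, decode("ASCII", b"Perl") works and asking for any
other charset raises LookupError. Calling register("UTF-8", utf8_plugin) makes
UTF-8 available, and a non-Unicode repertoire could be plugged in the same way.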

-- Darren Duncan


