perl.perl6.language | Postings from May 2010

Re: Perl6 and "accents"

From: Helmut Wollmersdorfer
Date: May 18, 2010 01:51
Subject: Re: Perl6 and "accents"
Message ID: 4BF254EF.6050909@wollmersdorfer.at

Tom Christiansen wrote:

> Certainly it's perfectly well known amongst people who deal with
> letters--including with the Unicode standard.

>> "Accent" does have a colloquial meaning that maps correctly,
>> but sadly that colloquial definition does not correspond to
>> the technical definition, so in being clear, you become less
>> accurate. There is, as far as I'm aware, no good middle
>> ground, here.

> One doesn't *have* to make up play-words.  There's nothing wrong with the
> correct terminology.  Calling a mark a mark is pretty darned simple.

Well, scientists are not always happy with Unicode terms, e.g.
'ideograph' for Han characters, or 'Latin' for Roman scripts. But the
terms should be used as defined by the standard--as names/identifiers of
properties.

> Unicode has blocks for diacritic marks, and a Diacritic property for
> testing whether something is one.  There are 1328 code points whose
> canonical decompositions have both \p{Diacritic} and \pM in them,
> 946 code points that have only \pM but not \p{Diacritic}, and 197 that 
> have \p{Diacritic} but not \pM.

If someone really uses Unicode, there is no way around deep knowledge of
the properties. Such code will use Unicode properties directly, and Perl
6 should therefore support all the properties.

> I still think resorting to talking about "accent marks" is a bad idea.  
> I had somebody the other day thinking that "throwing out the accent marks"
> meant deleting all characters whose code points were over 0x7F--and this
> was a recent CompSci major, too.

I know people of this sort. They also believe that UTF-8 is a 2-byte
encoding.
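
Both misconceptions are easy to check. The sketch below (Python, with
function names invented for this example) contrasts removing combining marks
after canonical decomposition with deleting every code point above 0x7F, and
shows that UTF-8 uses between one and four bytes per code point. Treating
"mark" as General_Category M is an assumption of this sketch; it is not the
same set as \p{Diacritic}.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, drop General_Category M characters, recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(
        c for c in decomposed if not unicodedata.category(c).startswith("M")
    )
    return unicodedata.normalize("NFC", kept)


def drop_non_ascii(text: str) -> str:
    """The mistaken approach: delete every code point above 0x7F."""
    return "".join(c for c in text if ord(c) <= 0x7F)


def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs to encode the given string."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    word = "caf\u00e9"               # "café" with precomposed U+00E9
    print(strip_marks(word))         # base letter survives
    print(drop_non_ascii(word))      # whole letter is lost
    print(strip_marks("\u03a9mega")) # Greek capital omega has no mark to strip
    for ch in ("e", "\u00e9", "\u20ac", "\U0001F600"):
        print(hex(ord(ch)), utf8_length(ch))
```

The first approach keeps the base letter and removes only the mark; the
second throws away whole letters, including ones such as Greek omega that
never carried a mark in the first place.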

> But that's nothing.  The more you look into it, the weirder it can get,
> especially with collation and canonical equivalence, both of which really
> require locale knowledge outside the charset itself.

Sure. The Perl 6 specs still need a great deal of work on the Unicode part.

Helmut Wollmersdorfer
