
Re: RFC: Use "\c{...}" for controls and non-ascii character input

From: karl williamson
Date: April 12, 2010 07:45
Subject: Re: RFC: Use "\c{...}" for controls and non-ascii character input
Message ID: 4BC331DF.3070608@khwilliamson.com

H.Merijn Brand wrote:
> On Fri, 09 Apr 2010 02:18:24 -0600, karl williamson
> <public@khwilliamson.com> wrote:
> 
>> The first part of this proposal would make \c{CAN} be the cancel 
>> character; \c{VT} be the vertical tab; \c{NEL} would be the "next line", 
>> etc.
> 
> Where "c" stands for Control
> 
>> The second part would have \c{a:} be ä, \c{o`} be ò, \c{1/2} be the 
>> fraction one-half, etc.  Mostly these would match the vim editor's 
>> digraphs for entering such characters.
> 
> Where "c" stands for Combining :)
> This is a very interesting proposal!
> 
> \c{:o} would be the same as \c{o:} ?

Sure

> 
>> This is more succinct and faster than using the \N{...} forms, and 
>> clearer and portable in contrast to \o and \x.
> 
> Don't know if it is clearer. I always use charnames with :alias, and I
> find \N{e_DIAERES} and \N{o_SLASH} very clear. The problem with \c{e:}
> might be that a lot of users are used to what their Multinational key
> bindings offer (wrongly) to use the " for diaereses. The " should
> obviously mean the double acute. ó, ő, ö (Acute, Double Acute,
> Diaereses).

There are actually two very separate components of my proposal.  One is 
to use the standard acronyms for the control characters.  I think that 
those should also be added to \N{...}.  The other is to be able to 
represent the Latin1 characters symbolically, without having to type 
long names and without having to load charnames.  The exact form(s) of 
those could certainly differ from what I suggested, which was based on 
vim.
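
To make the two components concrete, here is a minimal sketch in C of how
such names might resolve to code points.  It is only an illustration, not
a proposed implementation: the struct, the table names, and
lookup_c_name() are all hypothetical, and the tables hold only the six
examples mentioned in this thread, given as Unicode code points.  The
order-insensitive fallback for two-character digraphs reflects the
"\c{:o} same as \c{o:}" answer above and is likewise an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name-to-code-point entry; not part of any Perl source. */
struct c_name { const char *name; long code; };

/* Component 1: standard acronyms for control characters. */
static const struct c_name controls[] = {
    { "CAN", 0x18 },   /* CANCEL */
    { "VT",  0x0B },   /* VERTICAL TAB (LINE TABULATION) */
    { "NEL", 0x85 },   /* NEXT LINE */
};

/* Component 2: short symbolic forms for Latin1 characters. */
static const struct c_name digraphs[] = {
    { "a:",  0xE4 },   /* a with diaeresis */
    { "o`",  0xF2 },   /* o with grave */
    { "1/2", 0xBD },   /* vulgar fraction one half */
};

/* Linear search of one table; returns -1 when the name is absent. */
static long find_in(const struct c_name *t, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0)
            return t[i].code;
    return -1;
}

/* Resolve the text between the braces of a \c{...} sequence.
 * Control acronyms must match exactly; two-character digraphs are
 * also tried with their characters swapped. */
static long lookup_c_name(const char *name)
{
    long code = find_in(controls, sizeof controls / sizeof controls[0], name);
    if (code >= 0)
        return code;

    code = find_in(digraphs, sizeof digraphs / sizeof digraphs[0], name);
    if (code >= 0 || strlen(name) != 2)
        return code;

    char swapped[3] = { name[1], name[0], '\0' };
    return find_in(digraphs, sizeof digraphs / sizeof digraphs[0], swapped);
}
```

With these tables, lookup_c_name("NEL") gives 0x85 and both "a:" and ":a"
give 0xE4, while an unknown name such as "TV" gives -1.  Keeping the
acronyms and the digraphs in separate tables is what stops the swap rule
from accidentally matching a reversed control name.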
> 
> Here's the list of `special' characters we support at PROCURA for
> combining characters:
> 
> #define CH_AC   '\''    /* ACUTE */
> #define CH_GR   '`'     /* GRAVE */
> #define CH_CI   '^'     /* CIRCUMFLEX */
> #define CH_DI   ':'     /* DIAERESIS */
> #define CH_TI   '~'     /* TILDE */
> #define CH_CA   'v'     /* CARON */
> #define CH_BR1  '('     /* BREVE */
> #define CH_BR2  ')'     /* BREVE */
> #define CH_DB   '"'     /* DOUBLE ACUTE */
> #define CH_RI   'o'     /* RING */
> #define CH_DO   '.'     /* DOT */
> #define CH_MA   '-'     /* MACRON */
> #define CH_CE   ','     /* CEDILLA */
> #define CH_OG   '{'     /* OGONEK */
> #define CH_ST   '_'     /* STROKE */
> #define CH_SL   '/'     /* SLASH,       for many identical to CH_ST */
> 
>> I propose that only the characters in the Latin1 range be encoded, but 
>> it could eventually be extended beyond that.
> 
> ++
> 
> FYI my .XCompose that uses this scheme for the characters we have to
> support for legal purposes (and a few more for fun) has been uploaded
> here: http://www.xs4all.nl/~hmbrand/.XCompose
> 
>> To do this proposal requires changing "\c{" from what it currently 
>> means.  What that is is undocumented, and I believe undefined anywhere. 
>>   On EBCDIC platforms, it generates the fatal error "unrecognised 
>> control character".  On ASCII platforms, because nobody checked for it, 
>> and because of the vagaries of the algorithm used, it silently generates 
>> a semi-colon.
>>
>> I believe that we can consider this sequence as available for 5.14.  If 
>> desired, I could add a test for 5.12.1 that deprecates this (and 
>> probably all the other undefined sequences, such as "\c3" (which yields 
>> the same error in EBCDIC, and silently an "s" in ASCII)).
>>
>> Your reactions?
> 
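
For anyone wondering where the semi-colon and the "s" come from on ASCII
platforms: both results are consistent with the usual control-character
rule of uppercasing the character and flipping bit 0x40.  The sketch
below assumes that rule; ctrl_of() is a hypothetical stand-in for
illustration, not the actual tokenizer code.

```c
#include <ctype.h>

/* Hypothetical helper: apply the assumed ASCII "\cX" rule,
 * i.e. uppercase X, then XOR with 0x40 (decimal 64). */
static unsigned char ctrl_of(unsigned char x)
{
    return (unsigned char)(toupper(x) ^ 0x40);
}
```

Under that assumption, ctrl_of('{') is 0x7B ^ 0x40 = 0x3B, which is ';',
and ctrl_of('3') is 0x33 ^ 0x40 = 0x73, which is 's'.  The intended cases
still work: ctrl_of('a') is 0x01 and ctrl_of('?') is 0x7F (DEL).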

