develooper Front page | perl.perl5.porters | Postings from April 2010

RFC: Proposal that might break some backward compatibility

karl williamson
April 14, 2010 20:24
This concerns my proposal to add the construct \c{...}.  This would 
break any existing code that currently uses "\c{" to mean ";" 
(semi-colon).  I would hope that no one does that outside of an 
obfuscated code contest, but I want to be sure that everyone here agrees 
with me.

In an earlier post on this I said that "\c{" is not documented.
Since then I have found that it arguably is, hence this post.  The
pods about regular expressions do not document it, but perlop does:

  The character following "\c" is mapped to some other character by
  converting letters to upper case and then (on ASCII systems) by
  inverting the 7th bit (0x40). The most interesting range is from ’@’ to
  ’_’ (0x40 through 0x5F), resulting in a control character from 0x00
  through 0x1F. A ’?’ maps to the DEL character. On EBCDIC systems only
  ’@’, the letters, ’[’, ’\’, ’]’, ’^’, ’_’ and ’?’ will work, resulting
  in 0x00 through 0x1F and 0x7F.

(Note that there is an error in this statement, in that "\c\\" maps to 
two characters: 0x1c followed by a '\', so the construct cannot be used 
to cleanly generate a 0x1c, which is a FILE SEPARATOR.)

But the statement indicates that it is permissible to use "\c{" on
ASCII platforms, without specifying what that might mean; it turns
out to mean a semi-colon.
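
To make the arithmetic concrete, here is a minimal sketch of the ASCII mapping that the perlop text describes.  The helper name ctrl_map is made up for illustration; it is not a core function, and it only imitates the documented rule (upper-case the character, then flip the 0x40 bit) rather than calling the real "\c" parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): imitate the ASCII "\c" rule
# from perlop by upper-casing the character and flipping the 0x40 bit.
sub ctrl_map {
    my ($ch) = @_;
    return chr( ord( uc $ch ) ^ 0x40 );
}

# '@' through '_' land on controls, '?' lands on DEL, and '{' lands
# on an ordinary printable character.
printf "%s -> 0x%02X\n", $_, ord ctrl_map($_) for '@', 'a', '[', '?', '{';
```

Since '{' is 0x7B, flipping the 0x40 bit gives 0x3B, which is ';'.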

My hope is that people will say that in spite of perlop, it is ok in 
5.14 to break code that uses "\c{".  But here is your chance to say no.

The proposal is currently to use this to specify control characters in a 
platform independent and mnemonic way, so that, for example, \c{ACK} 
would mean the ACKNOWLEDGE control character in both ASCII and EBCDIC. 
(It could be extended to accept other things as well, but H. Merijn's 
comments have convinced me that I need to think about that some more.)

The cost is breaking code that uses "\c{" to mean the semi-colon, and 
extra code in the core that I would write.

The advantages are a clearer platform-independent way to specify control 
characters, and a clean mnemonic way to get the FS character. 
Currently, the only way to get platform-independence is to go to utf8 by 
using the full character names in charnames.
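
For comparison, a sketch of that existing charnames route.  The exact set of names that charnames accepts for control characters varies with the perl version, so treat the two names used here as assumptions to check against your own installation.

```perl
use strict;
use warnings;
use charnames ':full';    # enables \N{...} by full name on older perls

# Assumed names; verify that your perl's charnames knows them.
my $ack = "\N{ACKNOWLEDGE}";
my $fs  = "\N{FILE SEPARATOR}";

printf "ACK is code point 0x%02X\n", ord $ack;
printf "FS  is code point 0x%02X\n", ord $fs;
```

The proposed \c{ACK} would express the same intent with a short mnemonic.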

If it isn't ok to do this in 5.14, is it ok to add a deprecation message 
in 5.14, and go to it in 5.16?

Should the other characters whose \c mappings aren't to controls get a 
deprecation message?
