
Metacharacters in character classes

From: Carl Mäsak
Date: March 26, 2009 07:41
Subject: Metacharacters in character classes
Message ID: 16d769b70903260741x20448c48v83786235f03dddfc@mail.gmail.com

It started with yours truly asking impertinent questions on #perl6...

 <http://irclog.perlgeek.de/perl6/2009-03-26#i_1018345>

...and ended with a general feeling that the way metacharacters and
backwhacking work in <[ ]> character classes is at worst inconsistent
and at best underspecified by S05.

Specifically, the following paragraphs from that spec do _not_ hold
for character classes, which are more like a sublanguage of their own:

] Unlike traditional regular expressions, Perl 6 does not require
] you to memorize an arbitrary list of metacharacters.  Instead it
] classifies characters by a simple rule.  All glyphs (graphemes)
] whose base characters are either the underscore (C<_>) or have
] a Unicode classification beginning with 'L' (i.e. letters) or 'N'
] (i.e. numbers) are always literal (i.e. self-matching) in regexes. They
] must be escaped with a C<\> to make them metasyntactic (in which
] case that single alphanumeric character is itself metasyntactic,
] but any immediately following alphanumeric character is not).
]
] All other glyphs--including whitespace--are exactly the opposite:
] they are always considered metasyntactic (i.e. non-self-matching) and
] must be escaped or quoted to make them literal.  As is traditional,
] they may be individually escaped with C<\>, but in Perl 6 they may
] be also quoted as follows.
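
The classification rule quoted above can be sketched in a few lines of
Python. This is only a rough model for illustration, not anything taken
from S05 or from a Perl 6 implementation: the name is_literal_in_regex
is made up, and "base character" is approximated as the first code
point of the glyph's canonical decomposition (NFD).

```python
import unicodedata


def is_literal_in_regex(glyph: str) -> bool:
    """Rough model of the quoted S05 rule (hypothetical helper, not a real API).

    Expects a non-empty string holding a single glyph.
    """
    # Assumption: approximate the "base character" as the first code point
    # after canonical decomposition, so "ä" is judged by its base "a".
    base = unicodedata.normalize("NFD", glyph)[0]
    if base == "_":
        # Underscore is category Pc, so the rule names it explicitly.
        return True
    # Unicode general categories beginning with L (letters) or N (numbers).
    return unicodedata.category(base)[0] in ("L", "N")
```

Under this model, letters, digits and underscore come out literal, while
punctuation and whitespace come out metasyntactic, which is the general
regex rule that the rest of this message contrasts with character classes.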

In character classes, most 'other glyphs' mean themselves, just like
alphanumerics, with a few notable exceptions: backslash (\), closing
bracket (]), dash (-), and whitespace still need to be backwhacked.
All other characters are treated literally, including dot (.), even
though dot is actually used for metasyntactic purposes in character
classes (the .. range operator). In other words, currently /<[.]>/ is
legal, but /<[-]>/ is not. Which is kinda weird, if you ask me.
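
To make the asymmetry concrete, here is a small Python model of the
character-class behaviour as described in the paragraph above. It is a
sketch of what this message reports, not of what S05 specifies or what
any implementation does; the name needs_backwhack_in_class is
hypothetical.

```python
def needs_backwhack_in_class(ch: str) -> bool:
    """Model of the <[ ]> behaviour described in this post (hypothetical helper)."""
    # Per the post: backslash, closing bracket, dash and whitespace must be
    # escaped inside a character class; everything else stands for itself.
    return ch in ("\\", "]", "-") or ch.isspace()


for ch in (".", "-", "]", "\\", " ", "a"):
    status = "needs a backslash" if needs_backwhack_in_class(ch) else "literal as-is"
    print(repr(ch), status)
```

Note that dot lands on the "literal as-is" side and dash on the "needs a
backslash" side, even though both are punctuation under the general
regex rule.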

See the linked #perl6 log for details.

What's the big-picture rule of thumb regarding metacharacters in
character classes?

// Carl


