perl.perl6.internals | Postings from June 2001

Re: More character matching bits

From: Bryan C. Warnock
Date: June 15, 2001 03:55
Subject: Re: More character matching bits
Message ID: 01061506523207.05461@wakko.idiocity.nut

On Thursday 14 June 2001 12:01 pm, Dan Sugalski wrote:
> Fancy character classes are probably enough to handle the various casing
> issues and their analogs. They're probably not enough to handle things
> like the Arabic tatweel, or proper word breaks in most Asian languages.
> Heck, unless I'm missing something, they're insufficient for something as
> simple as \d.
>
> I'm not advocating forcing dictionaries into the regex engine, nor even
> shipping them with the core. 

That's not to say that some Locale::* module couldn't include one, or
reference a third-party one.

> As I see it, locales specify:
>
>    * Collating order
>    * Comparison/equality specification
>    * Unicode codepoint interpretation

What do you mean by that?

>    * Regex character classes
>    * Regex character identification
>    * Regex zero-width assertion rules
>    * 'casing' rules
>
> It'd be nice to specify them all separately and inherit the ones you don't
> need to change from some parent locale.

Or have these individual bits and pieces be addressable through the regexen, 
and have locales *defined* via that.

module Locale::Hawaiian;
use re 'class (\w => [aeiouâêîôûhklmnpw`])';
...
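
The 'use re' line above is proposed syntax, not something Perl accepts today.
As a rough sketch of the same idea in working Perl 5, a locale module could
expose its own notion of a word character as a compiled pattern.  The package
name Locale::Hawaiian::Sketch and the functions word_char and is_word are
invented for illustration; only the letter set is taken from the example above.

```perl
use strict;
use warnings;
use utf8;    # the source contains non-ASCII letters

package Locale::Hawaiian::Sketch;    # hypothetical module name

# The locale's "word" characters, copied from the proposed \w class above.
my $WORD_CHAR = qr/[aeiouâêîôûhklmnpw`]/;

# Hand the compiled class to callers that want to build larger patterns.
sub word_char { return $WORD_CHAR }

# Return 1 if the whole string is made of locale word characters, else 0.
sub is_word {
    my ($text) = @_;
    return $text =~ /\A$WORD_CHAR+\z/ ? 1 : 0;
}

package main;

for my $candidate (qw(aloha wikiwiki zebra)) {
    my $verdict = Locale::Hawaiian::Sketch::is_word($candidate) ? 'word' : 'not a word';
    print "$candidate: $verdict\n";
}
```

This only covers the character-class part of a locale.  Collation, casing and
zero-width assertion rules would need their own hooks.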

On a side note (and this *will* sound stupid, but there is a reason I'm
asking): why is there no logical opposite to '.', that is, a metacharacter
that never matches any character?  (Besides, of course, that it's utterly
useless from a classic regex perspective.)
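
For what it's worth, the effect can be approximated with existing Perl 5 regex
syntax even without a dedicated metacharacter.  The sketch below shows two
spellings; the variable names and the ever_matches helper are made up for
illustration.

```perl
use strict;
use warnings;

# (?!) is a negative lookahead for the empty pattern.  The empty pattern
# always matches, so the lookahead fails at every position.
my $never = qr/(?!)/;

# A single-character spelling: no character is both whitespace and
# non-whitespace, so this negated class is empty.
my $no_char = qr/[^\s\S]/;

# Return 1 if either pattern matches anywhere in $text, else 0.
sub ever_matches {
    my ($text) = @_;
    return ($text =~ $never || $text =~ $no_char) ? 1 : 0;
}

for my $text ('', 'a', 'anything at all') {
    printf "%-20s %s\n", "'$text'", ever_matches($text) ? 'match' : 'no match';
}
```

Both patterns fail on every input, including the empty string.  Perl 5.10 and
later also spell the first form as (*FAIL).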

-- 
Bryan C. Warnock
bwarnock@capita.com
