perl.perl5.porters

Re: PATCH: [perl #58182] partial, "The Unicode Bug". Add unicode semantics for \s, \w

From: Ben Morrow
Date: May 20, 2010 05:04
Subject: Re: PATCH: [perl #58182] partial, "The Unicode Bug". Add unicode semantics for \s, \w
Message ID: 20100520120336.GA15369@osiris.mauzo.dyndns.org

Quoth public@khwilliamson.com (karl williamson):
> Curtis Jewell wrote:
> > On Wed, 19 May 2010 22:51 +0100, "Paul LeoNerd Evans"
> > <leonerd@leonerd.org.uk> wrote:
> >> On Tue, May 11, 2010 at 12:54:01PM -0600, karl williamson wrote:
> >>> These commits also add regex modifiers /u (unicode), /l (locale), and /t
> >>> (traditional).  /a is not part of this patch.  I have made up the term
> >>> "Matching mode" to describe this.  I'm open to a better term, if you can
> >>> think of one.
> >> It may perhaps be far too late to reconsider, but I'm not sure I like
> >> these notations. These are three mutually-exclusive settings along one
> >> axis, they are not three independent settings on three different axes,
> >> such as /l vs /g.
> >>
> >> Would it not make more sense to group them up under a single /u flag,
> >> something of the following:
> >>
> >>  m/Unicode on/u
> >>  m/Unicode off/u0
> >>  m/Unicode if locale says/ul
> >>  m/Unicode traditionally/ut
> > 
> > We do have the assumption that capital letters oppose their lowercase
> > counterparts, as far as I can tell, so that the first two would be
> > 
> > m/Unicode on/u
> > m/Unicode off/U
> > 
> > (I'm making the assumption we're adding a /U with that /u.)
> > 
> > The question is, are the other two on an axis where we can say "/l
> > applies only if /u, and /uL would be the equivalent of the proposed /t
> > option?"
> > 
> > (i.e. is locale/traditional a two state, rather than locale/something
> > else/traditional being 3-state?)
> 
> It is tri-state, with each value excluding the other two, and maybe a 
> fourth value will be added to make it quad-state.

Since there aren't any upper-case modifiers yet, it would be possible to
introduce the rule 'Upper-case modifiers take a single-character
argument' at this point. This would give a syntax like

    m/unicode    /Uugx;
    m/locale     /Ulgx;
    m/traditional/Utgx;

which seems at least as clear as three or four random letters that
happen to be mutually exclusive.
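
To make the rule concrete, here is a minimal sketch of how a modifier
string could be split up under it. This is only an illustration, not
anything perl implements: parse_modifiers and matching_mode are
hypothetical names, and the u/l/t mapping is simply the three modes
from the patch description.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of perl: split a modifier string such as
# "Uugx" into plain flags and upper-case modifiers, where each upper-case
# letter consumes the following character as its argument.
sub parse_modifiers {
    my ($mods) = @_;
    my (%arg, @flags);
    my @chars = split //, $mods;
    while (@chars) {
        my $c = shift @chars;
        if ($c =~ /\A[A-Z]\z/) {
            die "modifier /$c needs an argument\n" unless @chars;
            # One argument per upper-case modifier keeps the settings
            # mutually exclusive by construction.
            die "modifier /$c given twice\n" if exists $arg{$c};
            $arg{$c} = shift @chars;
        }
        else {
            push @flags, $c;
        }
    }
    return { args => \%arg, flags => \@flags };
}

# Assumed mapping for the single upper-case modifier discussed here.
my %MATCHING_MODE = (u => 'unicode', l => 'locale', t => 'traditional');

# Returns the matching mode named by /U, or undef if /U is absent.
sub matching_mode {
    my ($mods) = @_;
    my $m = parse_modifiers($mods)->{args}{U};
    return undef unless defined $m;
    die "unknown matching mode '$m'\n" unless exists $MATCHING_MODE{$m};
    return $MATCHING_MODE{$m};
}

print matching_mode('Uugx'), "\n";
print matching_mode('Utgx'), "\n";
```

Because /U takes exactly one argument, something like /UuUl is rejected
outright, and a fourth mode would only need another entry in the
mapping rather than another top-level letter.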

While I like the \s\S symmetry of the regex escapes, that doesn't apply
here (yet), so we don't need to keep it if it's inconvenient (which it
is, since here we are with a three-state option).

Ben

