perl.perl5.porters | Postings from August 2010

Re: RFC: New regex modifier flags

From: David Golden
Date: August 3, 2010 17:17
Subject: Re: RFC: New regex modifier flags
Message ID: AANLkTi=OUTtYErywVSPRQmskLGVjDUb+1fr=kPUd7tt8@mail.gmail.com

On Tue, Aug 3, 2010 at 3:22 PM, karl williamson <public@khwilliamson.com> wrote:
> This is another go round in what to do about this.  I hope this isn't too
> long.  I give extensive background, and then a place to vote your preference
> at the end.

It's a very clear summary of the situation.  Thank you.

> The problem is what those modifiers should be.  Perl currently allows a
> keyword to come right after a regex, like '/abc/lt 1'  Code is being added
> in 5.14 to deprecate that, but we are stuck with that until at least 5.16.

I would suggest we reconsider whether this is really a "deprecation"
situation.  Is there any documentation that states that such
constructs are legal?  I was surprised to find that they are. I would
have expected it to be a syntax error.  If we instead "fix the bug"
that invalid regex flags are not detected as a syntax error, then we
don't have to consider this a deprecated feature and we don't have to
wait two years for a sane approach.  That may lead us to favor
different options.

>  Thus the original implementation I did a few months ago which used both 'l'
> and 't' might break existing programs.  It was noted then that these are the
> only modifiers that are mutually exclusive. Even /ms are not mutually
> exclusive, but these are: you can have only a single semantic interpretation
> in effect at a time.  Several people said that therefore these new modifiers
> should be distinguished from the regular ones in some way.

I can see the rationale for that argument, but I don't think they need
to be visually distinguished.  It just needs to be a syntax error if
more than one is specified.  That makes the conflict immediately
apparent.  Additionally, valid code will never contain more than one,
so there is nothing to be confused by.

Given what I've said above, I prefer:

7) Use single letter lower case option letters.  They are not visibly
distinguished from options that aren't mutually exclusive, but it will
be a syntax error to apply more than one mutually-exclusive option.
Rather than deprecating keywords without a space after a regular
expression, any invalid option letter will be a syntax error.
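
The rule in option 7 can be modelled in a few lines.  This is only a
sketch in Python, not perl's tokenizer: the names check_flags and
FlagSyntaxError are made up for illustration, the "ordinary" set is the
match-operator flags, and the exclusive set is assumed to be just the
'l' and 't' mentioned in this thread.

```python
# Illustrative model of option 7; not how perl itself is implemented.
ORDINARY = set("msixpgco")   # match-operator flags that combine freely
EXCLUSIVE = set("lt")        # assumed: only the letters named in this thread


class FlagSyntaxError(ValueError):
    """Hypothetical stand-in for a compile-time syntax error."""


def check_flags(flags):
    """Return flags unchanged, or raise FlagSyntaxError if they are invalid."""
    chosen = set()
    for ch in flags:
        if ch in EXCLUSIVE:
            chosen.add(ch)
        elif ch not in ORDINARY:
            # Any unknown letter is rejected outright instead of being
            # reinterpreted as the start of a following keyword.
            raise FlagSyntaxError(f"unknown regex flag '{ch}'")
    if len(chosen) > 1:
        # Two different mutually exclusive letters can never both apply.
        raise FlagSyntaxError(
            "mutually exclusive flags: " + ", ".join(sorted(chosen)))
    return flags
```

Under this model check_flags("msl") passes, check_flags("z") is rejected
as an unknown letter, and check_flags("lt") is rejected as a conflict,
which is what would happen to the trailing letters of '/abc/lt 1'.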

If forced to choose from the original options, I would consider two of
them, in the following order of preference:

> 5) Use single letter lower case option letters, but until 5.16 they are only
> valid in the (?:) form.  They are not visibly distinguished from options
> that aren't mutually exclusive.

The only disadvantage of this is that waiting for 5.16 will sort of
suck.  The huge advantage is that it keeps things consistent and
doesn't lose any mnemonic benefit.  However, even in (?:) form,
having more than one mutually exclusive option should be a syntax
error.

> 2) Use two-letter options for the mutually exclusive ones.  Extensible,
> visually distinguishable, but /abc/Clip may be hard to read.

However, I would suggest as a design decision that upper case letter
options be reserved for two-character options and that the second
character also be upper-case, e.g. /abc/CLip.  This visually
distinguishes two-character options from single character options.
That would be a general rule that saves people from having to
remember which options are dual and which are single.  If the letters
are capitalized, look at them two at a time.  If they are lower-case,
look at them one at a time.
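
To show how mechanical that reading rule is, here is a small sketch in
Python.  The function name split_flags is hypothetical, and the code
only splits a flag string into options; it does not check whether a
given pair such as "CL" is actually defined, since no concrete
two-letter options have been settled.

```python
def split_flags(flags):
    """Split a flag string: upper-case letters pair up, lower-case stand alone."""
    tokens = []
    i = 0
    while i < len(flags):
        ch = flags[i]
        if ch.isupper():
            # An upper-case letter must be followed by a second upper-case
            # letter; together they form one two-character option.
            pair = flags[i:i + 2]
            if len(pair) < 2 or not pair[1].isupper():
                raise ValueError(
                    f"upper-case flag '{ch}' must start a two-letter "
                    "upper-case option")
            tokens.append(pair)
            i += 2
        elif ch.islower():
            # A lower-case letter is always a single-character option.
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"invalid flag character '{ch}'")
    return tokens
```

With this rule split_flags("CLip") yields the options "CL", "i" and "p",
while the mixed-case spelling "Clip" is rejected because 'C' is not
followed by a second upper-case letter.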

I think the remaining proposed solutions should not be considered at all.

-- David
