develooper Front page | perl.perl5.porters | Postings from August 2010

Re: Any opposition still to the idea of syntax indicating default regex modifiers?

karl williamson
August 22, 2010 10:38
Ben Morrow wrote:
> Quoth (karl williamson):
>> Ben Morrow wrote:
> [This paragraph only quoted to show why I've switched from (?.:) to (?^:)
> in examples]
>> More than one person has expressed a preference for ^ over dot, so I'm 
>> thinking that will be what it ends up being.
> Presumably we should also add (?^xs), no colon, analogous to (?xs).
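
A minimal sketch of the two caret forms under discussion, assuming the semantics proposed in this thread ("^" means "start again from the default modifiers"). It needs a perl that understands the caret syntax, which shipped in 5.14; the variable names are illustrative only.

```perl
use strict;
use warnings;

# (?^:...) starts the group from the default modifiers, so the outer /i
# does not apply inside it.
my $group_resets = ("FOO" =~ /(?^:foo)/i) ? 1 : 0;    # 0

# Modifiers written after the caret are added on top of the defaults.
my $group_adds_i = ("FOO" =~ /(?^i:foo)/) ? 1 : 0;    # 1

# The colon-less form: reset to the defaults, then turn on /x and /s for
# the rest of the enclosing group. The earlier (?i) is cancelled.
my $inline_resets = ("FOO" =~ /(?i)(?^xs) f o o /) ? 1 : 0;    # 0
my $inline_x_on   = ("foo" =~ /(?i)(?^xs) f o o /) ? 1 : 0;    # 1

print "$group_resets $group_adds_i $inline_resets $inline_x_on\n";
```

The last two matches differ only in the case of the target string, which shows that the caret turned /i back off while /x let the pattern's whitespace be ignored.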


>>> I've mentioned in passing before, and I think I should say again, that
>>> having a /d switch at all is rather confusing. IMHO it would be much
>>> better (if we're going to restrict ourselves to one-letter switches with
>>> no arguments) to just have /u and /l, with the old behaviour implied by
>>> neither of those being present. The new (?.:) syntax would mean that the
>>> one place where /d might potentially be used, (?d-xims:foo), is better
>>> written (?.:foo).
>> I mostly agree, except if that is the one flag that is being switched, 
>> you don't want to have to specify all of them.
> I don't understand what you mean here. You can change from /u to
> compatible matching within a group without affecting other flags with
> (?-u:...). 

You're right.

>> I don't understand why 
>> the /d is confusing.  These operate like radio buttons; my 12 year old 
>> car still has them.  I don't know about newer cars.  The radio is always 
>> tuned to a station.  You press the button of the station you want.
> One reason it's confusing is because under normal circumstances /foo/d
> is exactly the same as /foo/. 

OK, but it is also being proposed to have a way to set default 
modifiers.  When (and if) that happens, you will be able to say that you 
want all matches to be case ignorable, and so /foo/i will be exactly the 
same thing as /foo/.  So it will become normal to have a modifier 
specified that doesn't change behavior; I don't think this is a real problem.

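A sketch of what default modifiers look like in practice. At the time of this message the feature was only a proposal; the form below is the lexically scoped `use re '/flags'` pragma that later shipped in 5.14, so it may differ from what was being discussed here. The variable names are illustrative only.

```perl
use strict;
use warnings;

my ($plain, $explicit, $outside);
{
    use re '/i';    # every regex in this block gets /i by default

    $plain    = ("FOO" =~ /foo/)  ? 1 : 0;    # 1: /i is implied
    $explicit = ("FOO" =~ /foo/i) ? 1 : 0;    # 1: restates the default
}

# Outside the block the default modifiers are back to normal.
$outside = ("FOO" =~ /foo/) ? 1 : 0;          # 0

print "$plain $explicit $outside\n";
```

Inside the block, /foo/ and /foo/i behave identically, which is the "modifier that doesn't change behavior" situation described above.
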
> Another is because all the other switches
> are booleans, with (?x) to turn them on and (?-x) to turn them off. None
> of /lud would be valid in (?-d) under your scheme, and the docs would
> have to attempt to explain why.

That's true, and we already have the same behavior with /p which can't 
appear as a minus, so I don't think it's a real problem.
> If we're going to insist on squashing a 3-way choice into a set of
> boolean flags, it seems easier to me to model it as '2 flags which can't
> be on at the same time' than as '3 flags which can't be on at the same
> time, and one of which is on by default'.
>>> If you're going to introduce pragmata to turn some switches on for some
>>> lexical scopes, the question of 'so how do they get turned off again?'
>>> needs to be addressed in a more systematic way than just introducing a
>>> single special-case switch that turns some other switches off. I would
>>> be happy with either of
>>>     use unicode_strings;
>>>     /foo/U;
>>> or
>>>     use unicode_strings;
>>>     /foo/-u;

This is not easily solvable.  u can legitimately be a function call.
>>> though the latter obviously has more back-compat considerations. I would
>>> also be happy, for now, with requiring people write
>>>     use unicode_strings;
>>>     /(?-u)foo/;

>>> if they want to go back to the old semantics.
>> I would find this confusing.  Either of your examples says what the 
>> person doesn't want.  It doesn't say which of the other (currently two) 
>> choices they do want.
> They want the default, obviously. The behaviour they would have got
> before they asked for /u to be added implicitly.
>> You can't unpress a radio button to go to another 
>> station; you must press the button you want.  As more possible things 
>> like /a got added, it would get worse.
> No it wouldn't. Turning off a switch that is currently on goes back to
> the default; turning off a switch that is currently off does nothing.
> The weird case is turning on a switch that is currently off but is
> mutually-exclusive with one that is currently on: that will (probably)
> implicitly turn something off.

I think it's best to address this issue as a whole rather than 
responding piecemeal to your statements.

I don't like the idea of ever having s/foo/bar/-u mean "to do the 
substitution using the traditional character set semantics".  As I 
implied above, this construct could quite legitimately mean subtracting 
the result of evaluating the function u from the number of substitutions 
made.  Therefore, I don't think we should ever deprecate and remove this 
meaning, and doing the disambiguation would be hard; Jesse has already 
vetoed my much easier disambiguation.  And, I think suffix modifiers 
should be added in 5.16, and therefore we have to have a way of 
specifying the default in a suffix modifier without using a minus.  One 
way to do that is to extend the paradigm that capitals are the opposite 
of lowercase, as you also suggest.  That's not compatible with other 
proposals for capitals, but it is doable.  But read on for my other 

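A minimal sketch of the existing meaning referred to above: a minus after the closing delimiter is ordinary subtraction from the substitution count. The sub `u` is a hypothetical user-defined function, and the expression is written with explicit whitespace and parentheses so that the parse is unambiguous.

```perl
use strict;
use warnings;

# Hypothetical user-defined sub named u, for illustration only.
sub u { return 1 }

my $text = "foo foo foo";

# s///g returns the number of substitutions made (3 here); the minus is
# plain arithmetic on that count and the return value of u().
my $n = ($text =~ s/foo/bar/g) - u();

print "$n $text\n";    # 2 bar bar bar
```

Any syntax that reinterprets a trailing "-u" as a modifier would have to be distinguished from expressions of this shape.
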
The paradigm of turning one thing on having the effect of turning 
something else off is a well-accepted computer concept.  In everyday 
life, car radios have long had this ability, which is why the gui widget 
is often called a radio button.  Other names are option button or push 
button.  I have written many gui forms that use them; even binary fields 
often use them.  "Select Male or Female".  Selecting one de-selects the 
other.  "Do you want standard, fast, faster, or fastest shipping?" 
These are often implemented with radio buttons, where selecting one 
de-selects any other you might have selected previously.

Psychologically, there is less cognitive load in saying what you want 
instead of what you don't want.  When I play a game with kids where you 
try to guess the word someone is thinking of, I often use the word 
"shadow" because it is from the absence of something, and therefore is 
not easily grasped.  I consider this a big deal issue, and the 
bottom-line reason for not using your paradigm.  I think it is poor user 
interface design.

Under your scheme /foo/L and /foo/U both would mean the same thing, I 
presume.  That seems weird to me, and certainly more confusing than having 
a modifier restate the default.
>>> While we're discussing this: would you be at all open to the idea of
>>> supplying a /a ('ASCII') switch that makes \d\s\w match what everyone
>>> thinks they do (that is, [0-9], [ \t\n] and [a-zA-Z0-9_] respectively)?
>> I think that if we're going to discuss this now, it should go on another 
>> thread.  It has been discussed before, and is a possibility, but I think 
>> it's part of a larger problem, and I think would be better served by a 
>> more general 'use script pragma' that allows ASCII as a special 
>> sub-script of Latin.
> Hmm. OK. I see this as more 'giving me back a feature 5.8 stupidly
> removed' than as a new feature, and thus a little more urgent than
> support for random script-specific matching, but never mind.

If there's time, Yves or I will try to get this into 5.14.  But we 
really need to come up with a plan that considers the whole issue before 
possibly painting ourselves into a corner.
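
A sketch of the kind of /a switch Ben asks about, using the form that did ship in 5.14; the details may differ from what was in mind when this message was written. The test characters are arbitrary examples.

```perl
use strict;
use warnings;

my $arabic_three = "\x{663}";    # ARABIC-INDIC DIGIT THREE
my $greek_alpha  = "\x{3b1}";    # GREEK SMALL LETTER ALPHA

# Without /a, \d and \w follow Unicode rules for these characters.
my $digit_default = ($arabic_three =~ /\d/) ? 1 : 0;    # 1
my $word_default  = ($greek_alpha  =~ /\w/) ? 1 : 0;    # 1

# With /a, the same classes are restricted to their ASCII members.
my $digit_ascii = ($arabic_three =~ /\d/a) ? 1 : 0;     # 0
my $word_ascii  = ($greek_alpha  =~ /\w/a) ? 1 : 0;     # 0
my $plain_ascii = ("3" =~ /\d/a) ? 1 : 0;               # 1

print "$digit_default $word_default $digit_ascii $word_ascii $plain_ascii\n";
```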
> Ben
