
Re: PATCH: [perl #58182] partial, "The Unicode Bug". Add unicode semantics for \s, \w

From: karl williamson
Date: May 19, 2010 13:41
Subject: Re: PATCH: [perl #58182] partial, "The Unicode Bug". Add unicode semantics for \s, \w
Message ID: 4BF44CEB.2030801@khwilliamson.com

demerphq wrote:
> On 19 May 2010 21:46, karl williamson <public@khwilliamson.com> wrote:
>> Eric Brine wrote:
>>> On Wed, May 19, 2010 at 2:20 PM, karl williamson
>>> <public@khwilliamson.com> wrote:
>>>
>>>    I don't understand these preclusions.  Why, for example, does the
>>>    existence of 'and' preclude /a, but the existence of 'unless' not
>>>    preclude /u ?
>>>
>>> When used as a statement modifier.
>>>
>>> $ perl -le'$x+=/foo/unless$c; print "ok"'
>>> ok
>>>
>>> If we add /u, the above would die as follows:
>>>
>>> Bareword found where operator expected at -e line 1, near "/foo/unless"
>>>        (Missing operator before nless?)
>>> syntax error at -e line 1, near "/foo/unless"
>>> Execution of -e aborted due to compilation errors.
>>>
>>> "Preclude" is not quite the right word, at least not on its own. They
>>> preclude the addition of the modifier without some form of conflict
>>> resolution. Most of the conflicts can even be resolved cleanly by lookahead.
>>> (/l isn't resolved cleanly by lookahead.)
>>>
>>>    Instead of creating a syntax error, we deprecate in 5.14 not
>>>    inserting a space between a pattern terminator and the following word.
>>>
>>>
>>> That still breaks backwards compatibility, and we'd have to wait for 5.016
>>> to get /u and /l in.
>>>
>>> "use 5.014;" avoids both the break and the wait. It could either add /u
>>> and /l, or it could add the space requirement.
>>>
>> I don't think you understood my suggestion; everything would take effect in
>> 5.14.  What I meant is that we could resolve things like the "unless" by
>> lookahead.  That is, we special-case the l and u (and /a if we get it)
>> modifiers so that they don't take effect if the word they're in is a legal
>> one; the complete list of which you've given (I think).
>>
>> The algorithm would be: the code would look first for the 5.12 modifier set,
>> as currently.  If that exhausts the word, continue as currently. Otherwise
>> if the word is one of the few you've mentioned, also continue as currently,
>> but raise the deprecated warning.  Otherwise, reparse the word, this time
>> allowing the new modifiers.  If that exhausts the word, fine, we've got our
>> modifiers.  If not, raise the deprecated warning.  A syntax error would also
>> be generated if the word isn't recognized.
>>
>> This would guarantee backward compatibility, with no inappropriate syntax
>> errors.  /lt and /le would be resolved by documenting that these have the
>> 5.12 meanings.  This would be lifted in 5.16 after the deprecation cycle.
>>
>> I think the reasons to prefer my solution over yours are that it doesn't
>> require a 'use 5.014', which I always forget to include, and that I prefer
>> deprecation to syntax errors.  (My first take is that modifying the
>> 'use 5.014' solution to do deprecation would be very similar to taking my
>> suggestion.)
>>
>> Otherwise, I'm fine with yours, except it requires me to learn a new area of
>> Perl in order to make the patch. :)
>> Does anyone else have an opinion?
> 
> I kinda like your plan, with the exception that it's going to be crufty
> code until we are past the deprecation cycle.

I've actually looked at the code.  More simply stated: aside from doing
the deprecation, do everything exactly as currently, still using the 5.12
modifier set.  But when you get to the part where you would otherwise
throw a syntax error, instead expand the modifier set to include the new
ones and try again.  That's all; not very much code.
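
A minimal sketch of that fallback, as a toy model and not the real
tokenizer code.  The function name, the modifier strings, and the
four-word "legal word" list are all assumptions made for illustration;
the list holds only the words named in this message, not Eric's
complete one.

```perl
use strict;
use warnings;

# Illustrative sets only; the real tokenizer is not structured like this.
my $OLD_SET   = 'msixpgco';   # 5.12 match modifiers
my $NEW_FLAGS = 'lua';        # additions discussed in this thread

# Hypothetical stand-in for the full list: just the words named here.
my %LEGAL_WORD = map { $_ => 1 } qw(and unless lt le);

# $word is the run of letters glued to the pattern terminator,
# e.g. "unless" in /foo/unless or "iu" in /foo/iu.
sub parse_trailing_word {
    my ($word) = @_;
    # First pass uses the 5.12 set; the second pass, reached only where
    # the first would be a syntax error, also allows the new flags.
    for my $set ($OLD_SET, $OLD_SET . $NEW_FLAGS) {
        # Take the longest leading run of modifier letters, keep the rest.
        my ($mods, $rest) = $word =~ /\A([$set]*)(.*)\z/s;
        next unless $rest eq '' || $LEGAL_WORD{$rest};
        return {
            modifiers  => $mods,
            keyword    => $rest,
            deprecated => ($rest ne '' ? 1 : 0),   # word with no space before it
        };
    }
    return;    # neither pass made sense of the word: syntax error
}

for my $word (qw(i unless u iu lt zz)) {
    my $r = parse_trailing_word($word);
    print "/foo/$word => ",
        ($r ? "modifiers '$r->{modifiers}', keyword '$r->{keyword}'"
            : "syntax error"),
        "\n";
}
```

In this model "unless", "lt" and "le" never reach the second pass, so
they keep their 5.12 meanings, while a bare "u" is accepted as a
modifier only after the first pass has failed.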
> 
> One thing tho, it occurred to me that in this discussion we have
> omitted to mention one subtle point.
> 
> We don't have to support new modifiers as trailing modifiers /at all/
> if we don't want to, as we can always make them restricted to the
> (?msix:...) form.
> 
> This means that you could have access to the syntax without the 'use
> 5.014' just not as a trailing modifier.

But I still find (?msix:) ugly.  People I know think perl is
"write-only" because they haven't gotten used to all the special
characters, which I have grown accustomed to over the years.  But I
still haven't gotten used to that construct.  As a temporary 5.14
measure, though, I could accept it.
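
For reference, a small sketch of the two spellings being compared.  It
uses the existing /i flag, since the new modifiers are still only
proposals at this point; the sample string and variable names are
made up for the example.

```perl
use strict;
use warnings;

my $text = "FOO bar";

# Trailing modifier: /i covers the whole pattern.
my $trailing = $text =~ /foo bar/i ? 1 : 0;

# Inline form: the flag applies only inside the group.
my $inline = $text =~ /(?i:foo) bar/ ? 1 : 0;

# "BAR" is outside the group, so it is matched case-sensitively.
my $scoped = $text =~ /(?i:foo) BAR/ ? 1 : 0;

print "trailing=$trailing inline=$inline scoped=$scoped\n";
```

The inline form never puts a letter directly after the pattern
terminator, which is why it sidesteps the /foo/unless ambiguity.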
> 
> One other thing.... these new flags are special in that they are
> essentially mutually exclusive. Maybe we SHOULD make them capitalized to
> emphasize this fact.
> 
> A simple rule like "you may only use one capitalized modifier at a
> time" is pretty easy to remember as compared to "the modifiers /l /a
> /u and /r are all mutually exclusive" and with capital modifiers we
> don't have any back compat problems.
> 
> Also, I think there is precedent in one of the other languages for a
> /U modifier; if ours does the same thing, all the better.

I'm neutral on this.  But I have been thinking lately that maybe the /a 
modifier wouldn't be mutually exclusive.  I can see someone wanting to 
restrict \d and \w to ASCII while still wanting \U or \T behavior 
otherwise; less likely with \L.
> 
> Cheers,
> yves