develooper Front page | perl.perl5.porters | Postings from February 2015

Re: RFC: /w pattern modifier

From: Ed Avis
Date: February 8, 2015 16:01
Subject: Re: RFC: /w pattern modifier
Message ID: loom.20150208T165344-26@post.gmane.org

Karl Williamson <public <at> khwilliamson.com> writes:

>Straight \b is true at boundaries between \w and \W characters.  I'm 
>told that Perl newbies tend to think of \b as being more like a \s,\S 
>boundary.  I considered implementing this (as it's almost trivial to 
>do), (\b{space} could mean that), but in thinking about it, it appears 
>to me that what they really want is \b{wb} which gives better results 
>for natural languages.

FWIW, I occasionally use \b when matching English text, but more commonly
for manipulating C code, Perl code and other machine-readable things.
Changing all occurrences of foo to bar is usually s/\bfoo\b/bar/g.
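
As a minimal sketch of that kind of whole-word rename (the helper name
rename_identifier and the sample line of C are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: rename whole-word occurrences of $old to $new.
sub rename_identifier {
    my ($text, $old, $new) = @_;
    # \Q...\E quotes any regex metacharacters in the old name;
    # \b on each side keeps foo_count and foobar untouched.
    $text =~ s/\b\Q$old\E\b/$new/g;
    return $text;
}

my $code = 'int foo = foo_count + foobar(foo);';
print rename_identifier($code, 'foo', 'bar'), "\n";
# prints: int bar = foo_count + foobar(bar);
```

Only the two standalone occurrences change, because "_" and "b" are both \w
characters and so there is no \b after the "foo" in foo_count or foobar.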

>For example, it should make "don't" a word

I think this would break uses like the above if code has foo'hello';
without a space.  But I see that you are not proposing to change the
default, rather:

>It has now occurred to me that a lot of existing \b uses really would 
>work better if they were \b{wb}.  And that can be accomplished without 
>having to change every occurrence, by instead having a pattern modifier 
>flag, which could be in a 'use re "/w"' which says treat plain \b as 
>\b{wb} in its scope.

I expect that would be useful, although the name suggests that a /w
modifier on the regexp itself would be equivalent.  Are you proposing that too?
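
To make concrete what treating plain \b as \b{wb} would change, here is a
small sketch comparing the two on the examples from this thread.  It assumes
a perl with \b{wb} support (5.22 or later) and writes \b{wb} out explicitly
rather than using the proposed modifier; the sub names are made up for
illustration.

```perl
use strict;
use warnings;
use 5.022;    # \b{wb} needs perl 5.22 or later

# Classic \b: a boundary between a \w and a \W character.
sub rename_classic {
    my ($text) = @_;
    $text =~ s/\bfoo\b/bar/g;
    return $text;
}

# Unicode word boundary: an apostrophe between letters does not break a word.
sub rename_wb {
    my ($text) = @_;
    $text =~ s/\b{wb}foo\b{wb}/bar/g;
    return $text;
}

my $line = q{foo'hello';};
print rename_classic($line), "\n";    # prints: bar'hello';
print rename_wb($line),      "\n";    # prints: foo'hello';

# The same difference seen on "don't": is there a boundary before the quote?
print "classic: ", ("don't" =~ /n\b'/     ? "break" : "no break"), "\n";
print "wb:      ", ("don't" =~ /n\b{wb}'/ ? "break" : "no break"), "\n";
```

With \b{wb}, foo'hello is a single word, so the substitution no longer
touches it, while something like foo(1) is still renamed by both versions.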

>I don't see any real use for pretending that \b is any of the other 
>break types, so I think this is the only modifier affecting \b that 
>would ever make sense.

If you add the new option there should be an explicit 'use re "b-classic"'
(or whatever you want to call it) which indicates that the programmer has
thought about it and on reflection prefers the ordinary semantics of \b.
Then perlcritic etc. will be able to prompt people to make a choice.

I think that \b is one of the underused parts of the regexp engine and
could be made more powerful.  For example, I'd like to write a substitution
that ignores whitespace on matching, but magically inserts it into the
output string at the 'same' places - as determined by word breaks in the
input.  Yes, and a pony as well; sorry, I just thought I would mention it.
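
For comparison, a rough sketch of how this has to be done by hand today for
a two-word sequence, capturing the whitespace and putting it back in the
replacement.  The helper replace_pair is hypothetical, and this is only the
manual workaround, not the magic version wished for above.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the word pair "$old1 $old2" with
# "$new1 $new2", keeping whatever whitespace separated the originals.
sub replace_pair {
    my ($text, $old1, $old2, $new1, $new2) = @_;
    # (\s+) captures the original gap; $1 reinserts it unchanged.
    $text =~ s/\b\Q$old1\E(\s+)\Q$old2\E\b/$new1$1$new2/g;
    return $text;
}

my $src = "static  int a; static\n  int b;";
print replace_pair($src, 'static', 'int', 'const', 'long'), "\n";
```

This scales badly beyond a couple of words, since every gap needs its own
capture group and its own place in the replacement.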

-- 
Ed Avis <eda@waniasset.com>

