develooper Front page | perl.perl5.porters | Postings from February 2015

Re: RFC: /w pattern modifier

Karl Williamson
February 8, 2015 17:11
On 02/08/2015 09:01 AM, Ed Avis wrote:
> Karl Williamson writes:
>> Straight \b is true at boundaries between \w and \W characters.  I'm
>> told that Perl newbies tend to think of \b as being more like a \s,\S
>> boundary.  I considered implementing this (as it's almost trivial to
>> do), (\b{space} could mean that), but in thinking about it, it appears
>> to me that what they really want is \b{wb} which gives better results
>> for natural languages.
> FWIW, I occasionally use \b when matching English text, but more commonly
> for manipulating C code, Perl code and other machine-readable things.
> Changing all occurrences of foo to bar is usually s/\bfoo\b/bar/g.
>> For example, it should make "don't" a word
> I think this would break uses like the above if code has foo'hello';
> without a space.  But I see that you are not proposing to change the
> default, rather:
>> It has now occurred to me that a lot of existing \b uses really would
>> work better if they were \b{wb}.  And that can be accomplished without
>> having to change every occurrence, by instead having a pattern modifier
>> flag, which could be in a 'use re "/w"' which says treat plain \b as
>> \b{wb} in its scope.
> I expect that would be useful although the name suggests that a /w
> modifier on the regexp itself would be equivalent?  Are you proposing that?
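
To make the concern about foo'hello' concrete, here is a minimal sketch comparing plain \b with \b{wb}. It assumes a perl new enough to support \b{wb} (5.22 or later); the input string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use v5.22;    # assumption: \b{wb} is available from perl 5.22 on

# Hypothetical line of old-style Perl code with no space after foo.
my $code = "print foo'hello';";

# Plain \b: a \w/\W boundary, so it sees a boundary between "foo" and "'".
(my $classic = $code) =~ s/\bfoo\b/bar/g;

# \b{wb}: Unicode word boundary, which keeps an apostrophe between
# letters inside the word, so "foo'hello" is treated as one word.
(my $wb = $code) =~ s/\b{wb}foo\b{wb}/bar/g;

print "$classic\n";    # print bar'hello';
print "$wb\n";         # print foo'hello';  (unchanged)

# The same rule is what makes "don't" a single word.
my $plain_hit = "don't" =~ /\bdon\b/         ? 1 : 0;    # 1
my $wb_hit    = "don't" =~ /\b{wb}don\b{wb}/ ? 1 : 0;    # 0
print "$plain_hit $wb_hit\n";
```

So a search-and-replace written with plain \b in mind can silently stop matching under word-break rules, which is why the behaviour would need to be opt-in.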


>> I don't see any real use for pretending that \b is any of the other
>> break types, so I think this is the only modifier affecting \b that
>> would ever make sense.
> If you add the new option there should be an explicit 'use re "b-classic"'
> (or whatever you want to call it) which indicates that the programmer has
> thought about it and on reflection prefers the ordinary semantics of \b.
> Then perlcritic etc. will be able to prompt people to make a choice.

One should be able to turn it off with (?-w:...), just like we do for the
other binary modifiers, and also with no re '/w'.
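
As a rough sketch of how that scoping already works for an existing binary modifier, the example below uses /i as a stand-in. The /w flag is only a proposal in this thread, so nothing here uses it; the assumption is that a /w flag would follow the same three mechanisms.

```perl
use strict;
use warnings;

my @results;
{
    use re '/i';    # /i is implied for every pattern in this lexical scope

    push @results, ("FOO" =~ /foo/       ? 1 : 0);   # matches, /i in effect
    push @results, ("FOO" =~ /(?-i:foo)/ ? 1 : 0);   # /i switched off inside the group

    {
        no re '/i';    # switched off for the rest of this inner scope
        push @results, ("FOO" =~ /foo/ ? 1 : 0);     # case-sensitive again
    }
}
push @results, ("FOO" =~ /foo/ ? 1 : 0);             # outside the scope: no /i

print "@results\n";    # 1 0 0 0
```
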
> I think that \b is one of the underused parts of the regexp engine and
> could be made more powerful.  For example, I'd like to write a substitution
> that ignores whitespace on matching, but magically inserts it into the
> output string at the 'same' places - as determined by word breaks in the
> input.  Yes, and a pony as well, sorry just thought I would mention it.

Good luck
