
perlfaq6: Why don't word-boundary searches with C<\b> work for me?

From:
_brian_d_foy
Date:
January 31, 2005 09:02
Subject:
perlfaq6: Why don't word-boundary searches with C<\b> work for me?
Message ID:
310120051101564688%comdog@panix.com

* I completely rewrote this answer, so I'm not giving you a diff.
It's easier just to read it straight.  As of today and until I
commit a change, the current answer is
   http://faq.perl.org/perlfaq6.html#Why_don_t_word_bound

* I discovered a slight error in perlre:  it defines \b in
terms of \w and \W, but that isn't strictly true since
the start and end of strings can stand in for non-word
characters.  perlre needs a slight patch before I can 
reference it.

* In the old answer, the examples weren't very interesting
or concrete, so I added more examples showing several
strings that do or don't match a pattern.



=head2 Why don't word-boundary searches with C<\b> work for me?

(contributed by brian d foy)

Ensure that you know what \b really does: it's the boundary between a
word character, \w, and something that isn't a word character. That
thing that isn't a word character might be \W, but it can also be the
start or end of the string.

It's not (not!) the boundary between whitespace and non-whitespace,
and it's not the stuff between words we use to create sentences.
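
To see the difference, here is a small sketch comparing whitespace-separated
fields with the runs of word characters that \b actually delimits. The sample
string and the two helper names (whitespace_fields, word_runs) are made up
for illustration.

```perl
use strict;
use warnings;

# Fields separated by whitespace: what people usually mean by "words".
sub whitespace_fields {
    my ($text) = @_;
    return split ' ', $text;
}

# Runs of \w characters delimited by \b: what the regex engine sees.
sub word_runs {
    my ($text) = @_;
    return $text =~ /\b\w+\b/g;
}

my $text   = "don't panic";
my @fields = whitespace_fields($text);
my @runs   = word_runs($text);

print scalar(@fields), " fields: @fields\n";   # 2 fields: don't panic
print scalar(@runs),   " runs: @runs\n";       # 3 runs: don t panic
```

The apostrophe is not a word character, so \b splits "don't" into two runs
even though there is no whitespace inside it.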

In regex speak, a word boundary (\b) is a "zero-width assertion",
meaning that it doesn't represent a character in the string, but a
condition at a certain position.
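
One way to convince yourself of this is to list the positions where \b
matches, or to drop a marker at each one. This is only a sketch; the helper
names (boundary_offsets, mark_boundaries) and the sample string are
invented for the example.

```perl
use strict;
use warnings;

# Return every offset at which \b matches. Each match is zero characters
# long, so pos() reports the position of the boundary itself.
sub boundary_offsets {
    my ($string) = @_;
    my @offsets;
    while ($string =~ /\b/g) {
        push @offsets, pos($string);
    }
    return @offsets;
}

# Insert a "|" at every word boundary. Nothing is removed from the
# string, because \b never consumes a character.
sub mark_boundaries {
    my ($string) = @_;
    (my $marked = $string) =~ s/\b/|/g;
    return $marked;
}

my @offsets = boundary_offsets("Perl's fun");
print "@offsets\n";                          # 0 4 5 6 7 10
print mark_boundaries("Perl's fun"), "\n";   # |Perl|'|s| |fun|
```

Offsets 0 and 10 are the start and end of the string, which count as
boundaries because a word character sits next to them.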

For the regular expression /\bPerl\b/, there has to be a word
boundary before the "P" and after the "l".  As long as something other
than a word character (or the start or end of the string) precedes the
"P" and follows the "l", the pattern will match. These strings match
/\bPerl\b/:

   "Perl"    # no word char before P or after l
   "Perl "   # same as previous (space is not a word char)
   "'Perl'"  # the ' char is not a word char
   "Perl's"  # no word char before P, non-word char after "l"

These strings do not match /\bPerl\b/:

   "Perl_"   # _ is a word char!
   "Perler"  # no word char before P, but one after l
   
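If you want to check these claims yourself, a short script along these
lines will do it. The helper name matches_perl_word is made up for the
example; the strings are the ones listed above.

```perl
use strict;
use warnings;

# True if "Perl" appears with a word boundary on both sides.
sub matches_perl_word {
    my ($string) = @_;
    return $string =~ /\bPerl\b/ ? 1 : 0;
}

my @candidates = ("Perl", "Perl ", "'Perl'", "Perl's", "Perl_", "Perler");

for my $string (@candidates) {
    printf "%-10s %s\n", qq("$string"),
        matches_perl_word($string) ? "matches" : "does not match";
}
```

The first four strings should report a match and the last two should not.
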
You don't have to use \b only to match words, though.  You can also
look for non-word characters surrounded by word characters.  These
strings match the pattern /\b'\b/:

   "don't"   # the ' char is surrounded by "n" and "t"
   "qep'a'"  # the first ' char is surrounded by "p" and "a"
   
This string does not match /\b'\b/:

   "foo'"    # there is no word char after non-word '
   
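The same kind of check works here. In this sketch the helper name
has_inner_apostrophe is invented; it just wraps the pattern from the
text.

```perl
use strict;
use warnings;

# True if the string has an apostrophe with a word character
# immediately on both sides of it.
sub has_inner_apostrophe {
    my ($string) = @_;
    return $string =~ /\b'\b/ ? 1 : 0;
}

for my $string ("don't", "qep'a'", "foo'") {
    printf "%-9s %s\n", qq("$string"),
        has_inner_apostrophe($string) ? "matches" : "does not match";
}
```

In "qep'a'" only the first apostrophe satisfies the pattern; the second
one is followed by the end of the string, just like the one in "foo'".
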
You can also use the complement of \b, \B, to specify that there
should not be a word boundary.

In the pattern /\Bam\B/, there must be a word character before the "a"
and after the "m". These strings match /\Bam\B/:

   "llama"   # "am" surrounded by word chars
   "Samuel"  # same
   
These strings do not match /\Bam\B/:

   "Sam"      # no word boundary before "a", but one after "m"
   "I am Sam" # first "am" surrounded by non-word chars; "Sam" as above

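A quick way to try the \B examples, again only as a sketch with a made-up
helper name (has_embedded_am):

```perl
use strict;
use warnings;

# True if "am" appears with a word character on both sides, that is,
# with no word boundary before the "a" or after the "m".
sub has_embedded_am {
    my ($string) = @_;
    return $string =~ /\Bam\B/ ? 1 : 0;
}

for my $string ("llama", "Samuel", "Sam", "I am Sam") {
    printf "%-11s %s\n", qq("$string"),
        has_embedded_am($string) ? "matches" : "does not match";
}
```

"llama" and "Samuel" should match; "Sam" and "I am Sam" should not, since
every "am" in them touches a space or the end of the string.
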
-- 
brian d foy, comdog@panix.com
