perlfaq6: Why don't word-boundary searches with C<\b> work for me?
From: _brian_d_foy
Date: January 31, 2005 09:02
Subject: perlfaq6: Why don't word-boundary searches with C<\b> work for me?
Message ID: 310120051101564688%comdog@panix.com
* I completely rewrote this answer, so I'm not giving you a diff.
It's easier just to read it straight. As of today and until I
commit a change, the current answer is
http://faq.perl.org/perlfaq6.html#Why_don_t_word_bound
* I discovered a slight error in perlre: it defines \b in
terms of \w and \W, but that isn't strictly true since
the start and end of strings can stand in for non-word
characters. perlre needs a slight patch before I can
reference it.
* In the old answer, the examples weren't very interesting
or concrete, so I added more examples that show several
strings that match or don't match a pattern.
=head2 Why don't word-boundary searches with C<\b> work for me?
(contributed by brian d foy)
Ensure that you know what \b really does: it's the boundary between a
word character, \w, and something that isn't a word character. That
thing that isn't a word character might be \W, but it can also be the
start or end of the string.
It's not (not!) the boundary between whitespace and non-whitespace,
and it's not the stuff between words we use to create sentences.
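To make the distinction concrete, here is a minimal sketch that tries one pattern against strings with different separators. The helper name has_word_bar and the sample strings are made up for illustration; they are not part of the FAQ answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if "bar" appears with a word boundary
# on each side, 0 otherwise.
sub has_word_bar {
    my ($string) = @_;
    return $string =~ /\bbar\b/ ? 1 : 0;
}

# A space, a hyphen, and a dot are all non-word characters, so each one
# creates a boundary. An underscore or a letter does not.
for my $string ( "foo bar", "foo-bar", "foo.bar", "foo_bar", "foobar" ) {
    printf "%-10s %s\n", qq("$string"),
        has_word_bar($string) ? "matches" : "does not match";
}
```

The first three strings should match and the last two should not, even though only the first contains whitespace.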
In regex speak, a word boundary (\b) is a "zero-width assertion",
meaning that it doesn't represent a character in the string, but a
condition at a certain position.
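One way to see the zero-width behavior is to look at the match offsets Perl records in @- and @+. This is only a sketch; match_span is a hypothetical helper, not something from perlfaq or perlre.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the start offset and the length of the
# first match of $regex in $string, or an empty list if nothing matches.
sub match_span {
    my ( $string, $regex ) = @_;
    return unless $string =~ $regex;
    return ( $-[0], $+[0] - $-[0] );
}

# The two \b assertions add nothing to the length: only "Perl" is matched.
my ( $start, $length ) = match_span( "Perl's", qr/\bPerl\b/ );
print "start $start, length $length\n";    # start 0, length 4

# A pattern that is nothing but \b matches a position, not a character.
( $start, $length ) = match_span( "Perl's", qr/\b/ );
print "start $start, length $length\n";    # start 0, length 0
```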
For the regular expression, /\bPerl\b/, there has to be a word
boundary before the "P" and after the "l". As long as something other
than a word character precedes the "P" and follows the "l", the
pattern will match. These strings match /\bPerl\b/:

    "Perl"    # no word char before P or after l
    "Perl "   # same as previous (space is not a word char)
    "'Perl'"  # the ' char is not a word char
    "Perl's"  # no word char before P, non-word char after "l"

These strings do not match /\bPerl\b/:

    "Perl_"   # _ is a word char!
    "Perler"  # no word char before P, but one after l

You don't have to use \b to match words though. You can look for
non-word characters surrounded by word characters. These strings
match the pattern /\b'\b/:

    "don't"   # the ' char is surrounded by "n" and "t"
    "qep'a'"  # the ' char is surrounded by "p" and "a"

This string does not match /\b'\b/:

    "foo'"    # there is no word char after non-word '

You can also use the complement of \b, \B, to specify that there
should not be a word boundary.
In the pattern /\Bam\B/, there must be a word character before the "a"
and after the "m". These strings match /\Bam\B/:

    "llama"     # "am" surrounded by word chars
    "Samuel"    # same

These strings do not match /\Bam\B/:

    "Sam"       # no word boundary before "a", but one after "m"
    "I am Sam"  # the lone "am" has a boundary on both sides; "Sam" fails as above
--
brian d foy, comdog@panix.com