Difference between \w and \b regular expression metacharacters

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length.

There are three different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.
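The three positions above can be listed directly. The following is a minimal sketch, assuming Python's built-in re module; the helper name boundary_positions is made up for this illustration. Because \b is zero-length, each match is an empty string, so only its index is of interest.

```python
import re

def boundary_positions(text):
    # Collect the index of every zero-length word-boundary match in text.
    return [m.start() for m in re.finditer(r"\b", text)]

# "hi, you!" has boundaries before "h", after "i", before "y" and after "u".
print(boundary_positions("hi, you!"))  # [0, 2, 4, 7]

# The string starts and ends with a word character: two boundaries.
print(boundary_positions("cat"))       # [0, 3]

# No word characters at all means no word boundaries.
print(boundary_positions("!!"))        # []
```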

Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.
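As a rough illustration of the \bword\b form, here is a small Python sketch. The function name find_whole_word and the sample text are invented for the example; re.escape is used only so that the search word is treated literally.

```python
import re

def find_whole_word(word, text):
    # Wrap the escaped word in \b anchors so it only matches as a whole word.
    pattern = r"\b" + re.escape(word) + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]

text = "cat concat cats cat."

# Without \b, "cat" is also found inside "concat" and "cats".
print([m.start() for m in re.finditer("cat", text)])  # [0, 7, 11, 16]

# With \b on both sides, only the standalone occurrences remain.
print(find_whole_word("cat", text))                   # [0, 16]
```

The last occurrence still matches because the period after "cat" is a non-word character, so a word boundary sits between the "t" and the ".".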

In all flavors, the characters [a-zA-Z0-9_] are word characters. These are also matched by the short-hand character class \w. Flavors whose word boundaries are ASCII-only recognize only these as word characters; Unicode-aware flavors may also treat letters and digits from other scripts as word characters.

\w stands for “word character”, usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
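Unlike \b, \w consumes a character. The sketch below assumes Python's re module; the re.ASCII flag is passed so that \w means exactly [A-Za-z0-9_], since Python 3 string patterns are otherwise Unicode-aware. The name word_chars is hypothetical.

```python
import re

# re.ASCII restricts \w to [A-Za-z0-9_].
WORD_CHAR = re.compile(r"\w", re.ASCII)

def word_chars(text):
    # Return every single character in text that \w matches.
    return WORD_CHAR.findall(text)

# The underscore and the digit count as word characters; "-" and "!" do not.
print(word_chars("a_1-b!"))                            # ['a', '_', '1', 'b']

# \w+ groups runs of word characters, so "snake_case" stays in one piece.
print(re.findall(r"\w+", "snake_case v2.0", re.ASCII))  # ['snake_case', 'v2', '0']

# Without re.ASCII, Python also treats non-ASCII letters as word characters.
print(re.findall(r"\w", "\u00e9"))                     # ['é']
print(re.findall(r"\w", "\u00e9", re.ASCII))           # []
```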

\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.
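To see that \B covers exactly the positions \b skips, the same kind of listing can be done for both. This is a sketch in Python; the helper name positions is made up, and the sample string is non-empty on purpose, since the handling of an empty input has differed between Python versions.

```python
import re

def positions(pattern, text):
    # Collect the start index of every match of pattern in text.
    return [m.start() for m in re.finditer(pattern, text)]

text = "hi, you!"

print(positions(r"\b", text))  # [0, 2, 4, 7]
print(positions(r"\B", text))  # [1, 3, 5, 6, 8]

# \Bcat\B only matches "cat" when it is buried inside a longer word.
print(bool(re.search(r"\Bcat\B", "concatenate")))  # True
print(bool(re.search(r"\Bcat\B", "cat")))          # False
```

Together the two lists account for every position from 0 to len(text), with no overlap.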

\W is short for [^\w], the negated version of \w.
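The equivalence of \W and [^\w] can be checked on a sample string. Again this is only a sketch assuming Python's re module, with re.ASCII so that \w is limited to [A-Za-z0-9_]; the sample strings are arbitrary.

```python
import re

sample = "a_1-b!"

# \W and [^\w] pick out the same characters: everything \w rejects.
non_word = re.findall(r"\W", sample, re.ASCII)
negated = re.findall(r"[^\w]", sample, re.ASCII)
print(non_word)             # ['-', '!']
print(non_word == negated)  # True

# A common use: split on runs of non-word characters to get the words.
print(re.split(r"\W+", "one, two;three"))  # ['one', 'two', 'three']
```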
