Question marks in regular expressions

Both ? and ?? make the preceding item optional. The key difference between them is laziness: ?? is lazy, ? is not.

Let’s say you want to search for the word “car” in a body of text, but you don’t want to be restricted to just the singular “car”; you also want to match against the plural “cars”.

Here’s an example sentence:

I own three cars.

Now, if I wanted to match the word “car” and I only wanted to get the string “car” in return, I would use the lazy ?? like so:

cars??

This says, “look for the word car or cars; if you find either, return car and nothing more”.

Now, if I wanted to match against the same words (“car” or “cars”) and I wanted to get the whole match in return, I’d use the non-lazy ? like so:

cars?

This says, “look for the word car or cars, and return either car or cars, whatever you find”.

In the world of computer programming, lazy generally means “evaluating only as much as is needed”. So the lazy ?? matches only as much as is needed to make the overall pattern succeed; since the “s” in “cars” is optional, it gets left out of the match. On the flip side, non-lazy (usually called greedy) operators match as much as possible, hence the ? returns all of the match, including the optional “s” whenever it is there.
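One thing worth seeing in code is that “as much as is needed” depends on the rest of the pattern. Here is a small Clojure sketch (the literal period at the end of the second pattern is my own addition for illustration): when something after the optional “s” has to match, the lazy ?? will take the “s” after all.

```clojure
;; On its own, the lazy ?? skips the optional "s".
(re-find #"cars??" "I own three cars.")
;=> "car"

;; Followed by a literal period, the pattern can only succeed
;; if the "s" is consumed, so the lazy ?? gives in and takes it.
(re-find #"cars??\." "I own three cars.")
;=> "cars."
```

So lazy does not mean “never match the optional part”; it means “try without it first”.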

Personally, I find myself appending ? to other quantifiers to make them lazy (turning * and + into *? and +?) more often than I use it for simple character optionality, but YMMV.
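To make that concrete, here is a minimal Clojure sketch of greedy versus lazy * and +. The sample strings are made up for illustration.

```clojure
;; Greedy: .* grabs as much as it can, so the match runs to the last ">".
(re-find #"<.*>" "<b>bold</b>")
;=> "<b>bold</b>"

;; Lazy: .*? stops at the first ">" that lets the pattern succeed.
(re-find #"<.*?>" "<b>bold</b>")
;=> "<b>"

;; The same idea with +, which needs at least one character.
(re-find #"a+" "aaa")
;=> "aaa"

(re-find #"a+?" "aaa")
;=> "a"
```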

See it in Code

Here’s the above implemented in Clojure as an example:

(re-find #"cars??" "I own three cars.")
;=> "car"

(re-find #"cars?" "I own three cars.")
;=> "cars"

Here re-find is a function that takes a regular expression as its first argument (#"cars??" or #"cars?") and returns the first match it finds in its second argument, the string "I own three cars."
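A few more calls may help show what “first match” means in practice. This is only a sketch with made-up sentences; re-seq is the core Clojure function that returns every match rather than just the first.

```clojure
;; re-find stops at the first match, even if later ones exist.
(re-find #"cars?" "One car, two cars.")
;=> "car"

;; re-seq returns a sequence of all the matches, in order.
(re-seq #"cars?" "One car, two cars.")
;=> ("car" "cars")

;; When nothing matches, re-find returns nil.
(re-find #"cars?" "I own a bike.")
;=> nil
```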
