matches everything that’s whitespace…and everything that’s not whitespace. In short,
everything.
Grouping
So far, the constructs we’ve learned about allow us to identify single characters
(repetition allows us to repeat that character match, but it’s still a single-character
match). Grouping allows us to construct subexpressions, which can then be treated like a
single unit.
In addition to being able to create subexpressions, grouping can also “capture” the
results of the groups so you can use them later. This is the default, but there is a way
to create a “noncapturing group,” which is how we’re going to start. If you have some
regex experience already, this may be new to you, but I encourage you to use
noncapturing groups by default; they have performance advantages, and if you don’t need
to use the group results later, you should be using noncapturing groups. Groups are
specified by parentheses, and noncapturing groups look like (?:<subexpression>), where
<subexpression> is what you’re trying to match. Let’s look at some examples.
Imagine you’re trying to match domain names, but only .com, .org, and .edu:
const text = "Visit oreilly.com today!";
const match = text.match(/[a-z]+(?:\.com|\.org|\.edu)/i);
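To see what the noncapturing group changes in practice, the following sketch runs the same pattern twice: once as written above, and once with an ordinary capturing group. The variable names (sample, nonCapturing, capturing) are illustrative and do not come from the text.

```javascript
// Same input as the example above.
const sample = "Visit oreilly.com today!";

// Noncapturing group: the alternation is grouped, but nothing extra is recorded.
const nonCapturing = sample.match(/[a-z]+(?:\.com|\.org|\.edu)/i);

// Capturing group: the same alternation, but the matched suffix is also saved.
const capturing = sample.match(/[a-z]+(\.com|\.org|\.edu)/i);

console.log(nonCapturing[0]);      // "oreilly.com"
console.log(nonCapturing.length);  // 1 (only the overall match)
console.log(capturing[0]);         // "oreilly.com"
console.log(capturing[1]);         // ".com" (the captured group)
```

Both patterns match the same text; the only difference is what the result array holds afterward.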
Another advantage of groups is that you can apply repetition to them. Normally,
repetition applies only to the single character to the left of the repetition
metacharacter. Groups allow you to apply repetition to whole strings. Here’s a common
example. If you want to match URLs, and you want to include URLs that start with http://,
https://, and simply // (protocol-independent URLs), you can use a group with a
zero-or-one (?) repetition:
const html =
    '<link rel="stylesheet" href="http://insecure.com/stuff.css">\n' +
    '<link rel="stylesheet" href="https://secure.com/securestuff.css">\n' +
    '<link rel="stylesheet" href="//anything.com/flexible.css">';
const matches = html.match(/(?:https?:)?\/\/[a-z][a-z0-9-]+[a-z0-9]+/ig);
Look like alphabet soup to you? It does to me too. But there’s a lot of power packed
into this example, and it’s worth your while to slow down and really consider it. We
start off with a noncapturing group: (?:https?:)?. Note there are two zero-or-one
repetition metacharacters here. The first one says “the s is optional.” Remember that
repetition characters normally refer only to the character to their immediate left. The
second one refers to the whole group to its left. Note also that the colon sits inside
the group: it is part of the protocol, so it has to be optional along with it (if it
were left out, the group could never match directly before the slashes). So taken all
together, this will match the empty string (zero instances of https?:), http:, or
https:. Moving on, we match two slashes (note we have to escape them: \/\/). Then we
get a rather complicated character class. Obviously domain names can have letters and
numbers in them, but