discussion, we’ll assume email addresses start with a letter and end with a letter).
Think of the situations you have to consider:
const inputs = [
    "[email protected]",                  // nothing but the email
    "[email protected] is my email",      // email at the beginning
    "my email is [email protected]",      // email at the end
    "use [email protected], my email",    // email in the middle, with comma afterward
    "my email:[email protected].",        // email surrounded with punctuation
];
It’s a lot to consider, but all of these email addresses have one thing in common: they
exist at word boundaries. The other advantage of word boundary markers is that,
because they don’t consume input, we don’t need to worry about “putting them back”
in the replacement string:
const emailMatcher =
    /\b[a-z][a-z0-9._-]*@[a-z][a-z0-9_-]+\.[a-z]+(?:\.[a-z]+)?\b/ig;
inputs.map(s => s.replace(emailMatcher, '<a href="mailto:$&">$&</a>'));
// returns [
// "<a href="mailto:[email protected]">[email protected]</a>",
// "<a href="mailto:[email protected]">[email protected]</a> is my email",
// "my email is <a href="mailto:[email protected]">[email protected]</a>",
// "use <a href="mailto:[email protected]">[email protected]</a>, my email",
// "my email:<a href="mailto:[email protected]">[email protected]</a>.",
// ]
In addition to using word boundary markers, this regex uses many of the features
we’ve covered in this chapter. It may seem daunting at first glance, but if you take the
time to work through it, you’ll be well on your way to regex mastery. Note especially
that the replacement macro, $&, does not include the characters surrounding the
email address, because they were not consumed.
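To see what “putting them back” would involve, here is a minimal sketch that compares the word boundary version with a version that consumes the neighboring characters. The sample address and the variable names are hypothetical, chosen only for illustration, and the second regex is just one way such a pattern might be written:

```javascript
// Hypothetical sample input; the address is made up for this sketch.
const input = "write to jane@example.com, please";

// Word boundary version: \b asserts a position and consumes nothing,
// so $& is exactly the email address.
const withBoundaries =
    /\b[a-z][a-z0-9._-]*@[a-z][a-z0-9_-]+\.[a-z]+(?:\.[a-z]+)?\b/ig;
const viaBoundaries =
    input.replace(withBoundaries, '<a href="mailto:$&">$&</a>');

// Consuming version: the characters on either side of the address are
// captured in groups 1 and 3, so they are part of the match.
const withoutBoundaries =
    /(^|[^a-z0-9_])([a-z][a-z0-9._-]*@[a-z][a-z0-9_-]+\.[a-z]+(?:\.[a-z]+)?)([^a-z0-9_]|$)/ig;

// Here $1 and $3 have to be "put back" by hand...
const putBack =
    input.replace(withoutBoundaries, '$1<a href="mailto:$2">$2</a>$3');

// ...and if we forget, the neighboring space and comma are lost.
const forgotten =
    input.replace(withoutBoundaries, '<a href="mailto:$2">$2</a>');

console.log(viaBoundaries);
console.log(putBack);
console.log(forgotten);
```

The first two results are identical, while the third has lost the space before the address and the comma after it.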
Word boundaries are also handy when you’re trying to search for text that begins
with, ends with, or contains another word. For example, /\bcount/ will find count
and countdown, but not discount, recount, or accountable. /\bcount\B/ will only find
countdown, /\Bcount\b/ will find discount and recount, and /\Bcount\B/ will only
find accountable.
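If you’d like to check these claims yourself, the following sketch tests each of the four patterns against the five words mentioned above (the variable names are only for illustration):

```javascript
const words = ["count", "countdown", "discount", "recount", "accountable"];

// No g flag here, so test() keeps no state between calls.
const patterns = [/\bcount/, /\bcount\B/, /\Bcount\b/, /\Bcount\B/];

// For each pattern, collect the words it matches.
const results = patterns.map(re => words.filter(w => re.test(w)));

patterns.forEach((re, i) => console.log(String(re), results[i]));
// /\bcount/   matches count and countdown
// /\bcount\B/ matches countdown
// /\Bcount\b/ matches discount and recount
// /\Bcount\B/ matches accountable
```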
Lookaheads
If greedy versus lazy matching is what separates the dilettantes from the pros,
lookaheads are what separate the pros from the gurus. Lookaheads, like anchor and word
boundary metacharacters, don’t consume input. Unlike anchors and word boundaries,
however, they are general purpose: you can match any subexpression without
consuming it. As with word boundary metacharacters, the fact that lookaheads don’t
consume input can save you from having to “put things back” in a replacement. While that can