LEARNING JAVASCRIPT - Trang 272

they can also have dashes (but they have to start with a letter, and they can’t end with a dash).

This example isn’t perfect. For example, it would match the URL //gotcha (no TLD) just as it would match //valid.com. However, matching only completely valid URLs is a much more complicated task, and it isn’t necessary for this example.

If you’re feeling a little fed up with all the caveats (“this will match invalid URLs”), remember that you don’t have to do everything all the time, all at once. As a matter of fact, I use a very similar regex to the previous one all the time when scanning websites. I just want to pull out all the URLs (or suspect URLs) and then do a second analysis pass to look for invalid URLs, broken URLs, and so on. Don’t get too caught up in making perfect regexes that cover every case imaginable. Not only is that sometimes impossible, but it is often unnecessary effort when it is possible. Obviously, there is a time and place to consider all the possibilities: for example, when you are screening user input to prevent injection attacks. In this case, you will want to take the extra care and make your regex ironclad.
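The two-pass idea described above can be sketched roughly as follows. The loose pattern, the sample markup, and the looksValid helper are all assumptions made for illustration (this is not the regex from the earlier example), and "valid" here only means that the built-in URL parser accepts the candidate and its hostname contains a dot:

```javascript
// Hypothetical loose pattern (an assumption, not the book's regex):
// an optional http:/https: scheme, then "//", then a run of characters
// that are not whitespace, quotes, or angle brackets.
const looseUrl = /(?:https?:)?\/\/[^\s"'<>]+/gi;

// Made-up sample markup for the demonstration.
const page =
  '<a href="//valid.com">ok</a> ' +
  '<a href="//gotcha">oops</a> ' +
  '<a href="https://example.com/docs">docs</a>';

// Pass 1: pull out everything that looks like a URL.
const suspects = page.match(looseUrl) || [];

// Pass 2: a stricter check on each suspect. The base URL is only a
// placeholder so that scheme-relative candidates ("//host") can be parsed.
function looksValid(candidate) {
  try {
    const url = new URL(candidate, "https://placeholder.invalid");
    // Rough stand-in for "has a TLD": the hostname contains a dot.
    return url.hostname.includes(".");
  } catch {
    return false;
  }
}

const valid = suspects.filter(looksValid);
const rejected = suspects.filter(u => !looksValid(u));

console.log(valid);    // [ '//valid.com', 'https://example.com/docs' ]
console.log(rejected); // [ '//gotcha' ]
```

The first pass stays deliberately permissive; any stricter rules live in the second pass, where they are easier to read and change than they would be inside one large regex.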

Lazy Matches, Greedy Matches

What separates the regex dilettantes from the pros is understanding lazy versus greedy matching. Regular expressions, by default, are greedy, meaning they will match as much as possible before stopping. Consider this classic example.

You have some HTML, and you want to replace, for example, <i> text with <strong> text. Here’s our first attempt:

const input = "Regex pros know the difference between\n" +
   "<i>greedy</i> and <i>lazy</i> matching.";
input.replace(/<i>(.*)<\/i>/ig, '<strong>$1</strong>');

The $1 in the replacement string will be replaced by the contents of the group (.*) in the regex (more on this later).

Go ahead and try it. You’ll find the following disappointing result:

"Regex pros know the difference between
<strong>greedy</i> and <i>lazy</strong> matching."
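One way to see where that result comes from is to inspect what the group actually captured. The following is a minimal sketch using the same input and regex, with the g flag dropped (an adjustment made here so that match returns the capture groups):

```javascript
const input = "Regex pros know the difference between\n" +
  "<i>greedy</i> and <i>lazy</i> matching.";

// Without the g flag, match() returns the full match at index 0
// followed by the contents of each capture group.
const m = input.match(/<i>(.*)<\/i>/i);

console.log(m[0]); // <i>greedy</i> and <i>lazy</i>
console.log(m[1]); // greedy</i> and <i>lazy
```

The second value is what $1 stands for in the replacement string, which is why the inner </i> and <i> tags survive in the output.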

To understand what’s going on here, think back to how the regex engine works: it consumes input until it satisfies the match before moving on. By default, it does so in a greedy fashion: it finds the first <i> and then says, “I’m not going to stop until I see an </i> and I can’t find any more past that.” Because there are two instances of </i>, it ends at the second one, not the first.
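For contrast, here is a sketch that runs the greedy attempt next to a lazy variant. The lazy version adds a question mark after the quantifier (.*? instead of .*), which makes the engine stop at the first </i> it can. It is offered only as an illustration of the greedy/lazy distinction, not as the author’s exact code:

```javascript
const input = "Regex pros know the difference between\n" +
  "<i>greedy</i> and <i>lazy</i> matching.";

// Greedy: .* runs to the LAST </i> on the line.
const greedy = input.replace(/<i>(.*)<\/i>/ig, '<strong>$1</strong>');

// Lazy: .*? stops at the FIRST </i>, so each tag pair is matched separately.
const lazy = input.replace(/<i>(.*?)<\/i>/ig, '<strong>$1</strong>');

console.log(greedy);
// Regex pros know the difference between
// <strong>greedy</i> and <i>lazy</strong> matching.

console.log(lazy);
// Regex pros know the difference between
// <strong>greedy</strong> and <strong>lazy</strong> matching.
```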

248 | Chapter 17: Regular Expressions
