LEARNING JAVASCRIPT - Page 264

Input Consumption

A naïve way to think about regexes is “a way to find a substring within a larger string” (often called, colorfully, “needle in a haystack”). While this naïve conception is often all you need, it will limit your ability to understand the true nature of regexes and leverage them for more powerful tasks.

The sophisticated way to think about a regex is as a pattern for consuming input strings. The matches (what you’re looking for) become a byproduct of this thinking.

A good way to conceptualize the way regexes work is to think of a common children’s word game: a grid of letters in which you are supposed to find words. We’ll ignore diagonal and vertical matches; as a matter of fact, let’s think only of the first line of this word game:

X J A N L I O N A T U R E J X E E L N P

Humans are very good at this game. We can look at this, and pretty quickly pick out LION, NATURE, and EEL (and ION while we’re at it). Computers, and regexes, are not as clever. Let’s look at this word game as a regex would; not only will we see how regexes work, but we will also see some of the limitations that we need to be aware of.

To simplify things, let’s tell the regex that we’re looking for LION, ION, NATURE, and EEL; in other words, we’ll give it the answers and see if it can verify them.

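
As a rough sketch of what “giving it the answers” might look like in code, the four words can be joined into a single alternation pattern. The names `row`, `words`, `pattern`, and `found` are illustrative and not from the text, and writing the row without spaces is an assumption made for convenience. Checked one at a time, each word really is present in the row:

```javascript
// The first row of the word game, written without spaces (an assumption).
const row = "XJANLIONATUREJXEELNP";

// The "answers" we hand to the regex.
const words = ["LION", "ION", "NATURE", "EEL"];

// One pattern that accepts any of the four words: LION|ION|NATURE|EEL
const pattern = new RegExp(words.join("|"));

// Search for each word on its own; search() returns the index of the
// first match, or -1 if there is none.
const found = words.map(w => ({ word: w, index: row.search(new RegExp(w)) }));

console.log(pattern.source); // LION|ION|NATURE|EEL
console.log(found);
// [ { word: 'LION', index: 4 }, { word: 'ION', index: 5 },
//   { word: 'NATURE', index: 7 }, { word: 'EEL', index: 15 } ]
```

Each of these searches starts from the beginning of the row, so no search is affected by what another one has already matched.
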
The regex starts at the first character, X. It notes that none of the words it’s looking for start with the letter X, so it says “no match.” Instead of just giving up, though, it moves on to the next character, J. It finds the same situation with J, and then moves on to A. As we move along, we consider the letters the regex engine is moving past as being consumed. Things don’t get interesting until we hit the L. The regex engine then says, “Ah, this could be LION!” Because this is a potential match, it doesn’t consume the L; this is an important point to understand. The regex goes along, matching the I, then the O, then the N. Now it recognizes a match; success! Now that it has recognized a match, it can consume the whole word, so L, I, O, and N are now consumed. Here’s where things get interesting. LION and NATURE overlap. As humans, we are untroubled by this. But the regex is very serious about not looking at things it’s already consumed, so it doesn’t “go back” to try to find matches there. The regex therefore won’t find NATURE, because the N has already been consumed; all it will find is ATURE, which is not one of the words it is looking for. It will, however, eventually find EEL.

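
The walkthrough above can be reproduced with a global regex. This is a minimal sketch, assuming the row is written without spaces; `row`, `pattern`, and `hits` are illustrative names. `String.prototype.matchAll` scans left to right and resumes each search where the previous match ended, which is the “consumption” described here:

```javascript
// The first row of the word game, written without spaces (an assumption).
const row = "XJANLIONATUREJXEELNP";

// The g flag asks for every match, not just the first one.
const pattern = /LION|ION|NATURE|EEL/g;

// Collect each match together with the index where it starts.
const hits = [...row.matchAll(pattern)].map(m => ({ word: m[0], index: m.index }));

console.log(hits);
// [ { word: 'LION', index: 4 }, { word: 'EEL', index: 15 } ]

// NATURE is in the row, but it starts at index 7, the N that LION
// already consumed, so the scan above never reports it.
console.log(row.indexOf("NATURE")); // 7
```

ION is missing from the results for the same reason: its letters were consumed as part of LION.
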
Now let’s go back to the example and change the O in LION to an X. What will happen then? When the regex gets to the L, it will again recognize a potential match (LION), and therefore not consume the L. It will move on to the I without consuming it. Then it will get to the X; at this point, it realizes that there’s no match: it’s not look‐

