LEARNING JAVASCRIPT - Trang 266

• If there is a match, the regex consumes all the characters in the match at once; matching continues with the next character (if the regex is global, which we’ll talk about later).

This is the general algorithm, and it probably won’t surprise you that the details are much more complicated. In particular, the algorithm can be aborted early if the regex can determine that there won’t be a match.

As we move through the specifics of the regex metalanguage, try to keep this algorithm in mind; imagine your strings being consumed from left to right, one character at a time, until there are matches, at which point whole matches are consumed at once.
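The "consume the whole match at once" step can be made visible with a small sketch. The input string below is a hypothetical example chosen for illustration, not one from the text; it shows that a global search resumes after the end of each match, so overlapping candidates are never reported.

```javascript
// Hypothetical input: four consecutive 'a' characters.
const input = 'aaaa';

// With the g flag, the search resumes after the end of the previous match.
// The pair starting at index 1 overlaps the first match, so it is skipped.
const pairs = input.match(/aa/g);
console.log(pairs); // [ 'aa', 'aa' ]

// matchAll exposes the index where each match starts.
const starts = [...input.matchAll(/aa/g)].map(m => m.index);
console.log(starts); // [ 0, 2 ]
```

If matches were not consumed as a unit, a third match starting at index 1 would also be reported.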

Alternation

Imagine you have an HTML page stored in a string, and you want to find all tags that can reference an external resource (<a>, <area>, <link>, <script>, <source>, and, sometimes, <meta>). Furthermore, some of the tags may be mixed case (<Area>, <LINK>, etc.). Regular expression alternations can be used to solve this problem:

    const html = 'HTML with <a href="/one">one link</a>, and some JavaScript.' +
        '<script src="stuff.js"></script>';
    const matches = html.match(/area|a|link|script|source/ig);  // first attempt

The vertical bar (|) is a regex metacharacter that signals alternation. The ig flags after the regex tell it to ignore case (i) and to search globally (g). Without the g, only the first match would be returned. This would be read as “find all instances of the text area, a, link, script, or source, ignoring case.” The astute reader might wonder why we put area before a; this is because regexes evaluate alternations from left to right. In other words, if a came first and the string had an area tag in it, the regex would match the a and then move on. The a is then consumed, and rea would not match anything. So you have to match area first, then a; otherwise, area will never match.
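A minimal sketch of why the order of alternatives matters, and of what the g flag changes. The sample strings and variable names are hypothetical, chosen only to illustrate the points above.

```javascript
// Hypothetical sample input containing a single area tag.
const tag = '<area>';

// Shorter alternative first: 'a' wins at index 1 and is consumed, "re" matches
// nothing, and the final 'a' matches on its own. "area" is never found.
const shortFirst = tag.match(/a|area/ig);
console.log(shortFirst); // [ 'a', 'a' ]

// Longer alternative first: the whole word is matched and consumed at once.
const longFirst = tag.match(/area|a/ig);
console.log(longFirst); // [ 'area' ]

// Without g, match() stops after the first match; element 0 holds its text.
const mixedCase = '<Area> <LINK>';
const firstOnly = mixedCase.match(/area|a|link/i)[0];
const allMatches = mixedCase.match(/area|a|link/ig);
console.log(firstOnly);  // Area
console.log(allMatches); // [ 'Area', 'LINK' ]
```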

If you run this example, you’ll find that you have many unintended matches: the word link (inside the <a> tag), and instances of the letter a that are not an HTML tag, just a regular part of English. One way to solve this would be to change the regex to /<area|<a|<link|<script|<source/ig (angle brackets are not regex metacharacters), but we’re going to get even more sophisticated still.
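To see the unintended matches concretely, the sketch below runs both regexes against the same sample string used earlier. The variable names firstAttempt and withBrackets are illustrative, and the second regex is assumed to keep the same ig flags as the first.

```javascript
const html = 'HTML with <a href="/one">one link</a>, and some JavaScript.' +
    '<script src="stuff.js"></script>';

// First attempt: matches any occurrence of the words, tag or not.
const firstAttempt = html.match(/area|a|link|script|source/ig);
console.log(firstAttempt);
// [ 'a', 'link', 'a', 'a', 'a', 'a', 'Script', 'script', 'script' ]

// Requiring a leading angle bracket limits the matches to opening tags.
// Closing tags such as </a> are skipped because '/' follows the '<'.
const withBrackets = html.match(/<area|<a|<link|<script|<source/ig);
console.log(withBrackets); // [ '<a', '<script' ]
```

Note that the first attempt also picks up the closing tags and the "Script" in "JavaScript", in addition to the stray letters and the word link.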

Matching HTML

In the previous example, we perform a very common task with regexes: matching HTML. Even though this is a common task, I must warn you that, while you can generally do useful things with HTML using regexes, you cannot parse HTML with regexes. Parsing means to completely break something down into its component

242 | Chapter 17: Regular Expressions
