LEARNING JAVASCRIPT

• If there is a match, the regex consumes all the characters in the match at once; matching continues with the next character (if the regex is global, which we’ll talk about later).

This is the general algorithm, and it probably won’t surprise you that the details are much more complicated. In particular, the algorithm can be aborted early if the regex can determine that there won’t be a match.

As we move through the specifics of the regex metalanguage, try to keep this algorithm in mind; imagine your strings being consumed from left to right, one character at a time, until there are matches, at which point whole matches are consumed at once.
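
The consume-on-match behavior can be seen in a small sketch. The input string and pattern here are made up for illustration, and the g flag (covered later) is used only so that matching continues after the first match.

```javascript
// Hypothetical input: four consecutive a's.
const input = "aaaa";

// /aa/g looks for two a's in a row, scanning left to right.
// The first match consumes characters 0 and 1, so the next attempt
// starts at character 2; the overlapping "aa" at index 1 is never reported.
const pairs = input.match(/aa/g);
console.log(pairs); // ["aa", "aa"]

// matchAll exposes the index at which each match started.
const starts = [...input.matchAll(/aa/g)].map(m => m.index);
console.log(starts); // [0, 2]
```

If matches were not consumed whole, you would expect three overlapping matches (at indexes 0, 1, and 2) instead of two.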

Alternation

Imagine you have an HTML page stored in a string, and you want to find all tags that can reference an external resource (<a>, <area>, <link>, <script>, <source>, and sometimes, <meta>). Furthermore, some of the tags may be mixed case (<Area>, <LINKS>, etc.). Regular expression alternations can be used to solve this problem:

    const html = 'HTML with <a href="/one">one link</a>, and some JavaScript.' +
        '<script src="stuff.js"></script>';
    const matches = html.match(/area|a|link|script|source/ig);  // first attempt

The vertical bar (|) is a regex metacharacter that signals alternation. The ig signifies to ignore case (i) and to search globally (g). Without the g, only the first match would be returned. This would be read as “find all instances of the text area, a, link, script, or source, ignoring case.” The astute reader might wonder why we put area before a; this is because regexes evaluate alternations from left to right. In other words, if a came first and the string had an area tag in it, the regex would match the a and then move on. The a is then consumed, and rea would not match anything. So you have to match area first, then a; otherwise, area will never match.
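
The effect of alternation order and of the g flag can be checked with a short sketch. The sample strings below are hypothetical and chosen only to keep the output small.

```javascript
// Hypothetical sample containing a single area tag.
const tag = "<area>";

// area listed first: the whole word is consumed as one match.
const areaFirst = tag.match(/area|a/ig);
console.log(areaFirst); // ["area"]

// a listed first: the leading a is consumed on its own, "re" matches
// nothing, and the final a is matched separately. area never matches.
const aFirst = tag.match(/a|area/ig);
console.log(aFirst); // ["a", "a"]

// Without g, match stops after the first match (found here thanks to i).
const firstOnly = "<Area> <LINK>".match(/area|a|link/i);
console.log(firstOnly[0]);     // "Area"
console.log(firstOnly.length); // 1
```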

If you run this example, you’ll find that you have many unintended matches: the word link (inside the <a> tag), and instances of the letter a that are not an HTML tag, just a regular part of English. One way to solve this would be to change the regex to /<area|<a|<link|<script|<source/ (angle brackets are not regex metacharacters), but we’re going to get even more sophisticated still.
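
As a quick check, here is what the two regexes return for the sample string from this section. The ig flags are added to the angle-bracket version on the assumption that you still want every match regardless of case.

```javascript
const html = 'HTML with <a href="/one">one link</a>, and some JavaScript.' +
    '<script src="stuff.js"></script>';

// First attempt: matches stray letters and words as well as tag names.
const firstAttempt = html.match(/area|a|link|script|source/ig);
console.log(firstAttempt);
// ["a", "link", "a", "a", "a", "a", "Script", "script", "script"]

// Requiring the opening angle bracket leaves only the start of each opening tag.
// Closing tags are skipped because "<" is followed by "/".
const withBrackets = html.match(/<area|<a|<link|<script|<source/ig);
console.log(withBrackets); // ["<a", "<script"]
```

Note that <a would still match the beginning of any other tag whose name starts with a (for example, <abbr>), which is one reason further refinement is needed.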

Matching HTML

In the previous example, we perform a very common task with regexes: matching HTML. Even though this is a common task, I must warn you that, while you can generally do useful things with HTML using regexes, you cannot parse HTML with regexes. Parsing means to completely break something down into its component

242 | Chapter 17: Regular Expressions
