LEARNING JAVASCRIPT - Page 267

parts. Regexes are capable of parsing regular languages only (hence the name). Regular languages are extremely simple, and most often you will be using regexes on more complex languages. Why the warning, then, if regexes can be put to good use on more complex languages? Because it’s important to understand the limitations of regexes, and to recognize when you need something more powerful. Even though we will be using regexes to do useful things with HTML, it’s possible to construct HTML that will defeat our regex. To have a solution that works in 100% of cases, you would have to employ a parser. Consider the following example:

    const html = '<br> <![CDATA[<br>]]>';
    const matches = html.match(/<br>/ig);

This regex will match twice; however, there is only one true <br> tag in this example; the other matching string is simply non-HTML character data (CDATA). Regexes are also extremely limited when it comes to matching hierarchical structures (such as an <a> tag within a <p> tag). The theoretical explanations for these limitations are beyond the scope of this book, but the takeaway is this: if you’re struggling to make a regex to match something very complicated (such as HTML), consider that a regex simply might not be the right tool.
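As a rough illustration of both limitations, the sketch below counts the matches in a CDATA sample and then tries a regex on nested and sibling tags. The strings and variable names (sample, nested, siblings) are hypothetical and chosen only for this illustration, and the CDATA section is assumed to use standard XML syntax.

```javascript
// Hypothetical sample: one real <br> tag, plus "<br>" as character data.
const sample = '<br> <![CDATA[<br>]]>';
const brMatches = sample.match(/<br>/ig);
console.log(brMatches.length); // 2, even though only the first is a real tag

// Hierarchical structure: a lazy match stops at the FIRST closing tag,
// so the outer <div> is cut off before its own closing tag.
const nested = '<div>outer <div>inner</div> tail</div>';
const lazy = nested.match(/<div>.*?<\/div>/)[0];
console.log(lazy); // <div>outer <div>inner</div>

// A greedy match has the opposite problem: it runs to the LAST closing
// tag, swallowing two sibling elements as if they were one.
const siblings = '<div>a</div><div>b</div>';
const greedySiblings = siblings.match(/<div>.*<\/div>/)[0];
console.log(greedySiblings === siblings); // true
```

Neither the lazy nor the greedy form tracks how deeply the tags are nested, which is the kind of bookkeeping a parser does.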

Character Sets

Character sets provide a compact way to represent alternation of a single character (we will combine it with repetition later, and see how we can extend this to multiple characters). Let’s say, for example, you wanted to find all the numbers in a string. You could use alternation:

    const beer99 = "99 bottles of beer on the wall " +
        "take 1 down and pass it around -- " +
        "98 bottles of beer on the wall.";
    const matches = beer99.match(/0|1|2|3|4|5|6|7|8|9/g);

How tedious! And what if we wanted to match not numbers but letters? Numbers and letters? Lastly, what if you wanted to match everything that’s not a number? That’s where character sets come in. At their simplest, they provide a more compact way of representing single-character alternation. Even better, they allow you to specify ranges. Here’s how we might rewrite the preceding:

    const m1 = beer99.match(/[0123456789]/g);    // okay
    const m2 = beer99.match(/[0-9]/g);           // better!
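To check that the three forms really are interchangeable, here is a small sketch that runs each one against the same string and compares the results. The names byAlternation, bySet, byRange, and nonDigits are illustrative only. The last lines also use a negated set (a leading ^ inside the brackets) as one possible answer to the "everything that’s not a number" question.

```javascript
const beer99 = "99 bottles of beer on the wall " +
    "take 1 down and pass it around -- " +
    "98 bottles of beer on the wall.";

// Three ways of asking for "any single digit".
const byAlternation = beer99.match(/0|1|2|3|4|5|6|7|8|9/g);
const bySet = beer99.match(/[0123456789]/g);
const byRange = beer99.match(/[0-9]/g);

console.log(byRange); // [ '9', '9', '1', '9', '8' ]

// All three produce the same list of matches.
const sameResult = byAlternation.join() === bySet.join() &&
    bySet.join() === byRange.join();
console.log(sameResult); // true

// A negated set matches any single character that is NOT a digit.
const nonDigits = beer99.match(/[^0-9]/g);
console.log(nonDigits.join('') === beer99.replace(/[0-9]/g, '')); // true
```

Note that each match is a single character: "99" is reported as two separate '9' matches, because a character set on its own matches exactly one character.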

You can even combine ranges. Here’s how we would match letters, numbers, and some miscellaneous punctuation (this will match everything in our original string except whitespace):

    const match = beer99.match(/[\-0-9a-z.]/ig);
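One way to confirm the "everything except whitespace" claim is to join the matches back together and compare them with the original string stripped of whitespace. This is only a sketch; found and withoutWhitespace are names made up for the check.

```javascript
const beer99 = "99 bottles of beer on the wall " +
    "take 1 down and pass it around -- " +
    "98 bottles of beer on the wall.";

// The escaped hyphen (\-) is a literal hyphen rather than a range
// operator, and a period inside a character set is a literal period.
const found = beer99.match(/[\-0-9a-z.]/ig);

// Remove all whitespace from the original for comparison.
const withoutWhitespace = beer99.replace(/\s/g, '');

console.log(found.join('') === withoutWhitespace); // true
console.log(found.includes('-')); // true: the hyphens were matched
console.log(found.includes(' ')); // false: spaces were skipped
```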
