Parsing HTML with regular expressions

I’m having difficulty parsing (understanding) this sentence:

“Parsing HTML with regular expressions should be avoided, but pattern matching an HTML string with regular expressions is completely fine.”

Excerpted from the Challenge: Regular Expressions: Find Characters with Lazy Matching


We were actually talking about this and came up with the following:

Pass says:
They mean we should use regexes on an HTML string only to match pieces of text in it.
But we should not try to match the actual HTML code (the markup structure) with regexes,

like <h1> My Name is John </h1>

I said:
I understand it quite differently: taking in HTML code and extracting relevant information (the title of the page, paragraphs, headings, links, bold text, etc.) with regular expressions should be avoided, with the exception of simple pattern matching on an HTML string with a regular expression.
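
To make the difference concrete, here is a rough sketch of how I read the sentence. It is in Python because that is what I had at hand, not because the challenge uses it, and the names `HeadingCollector` and `extract_h1` are made up for this example. The first part is plain pattern matching on a string that happens to contain HTML. The second part is actual parsing, where the structure of the tags matters, and there a real parser (here the standard library's `html.parser`) is the safer tool.

```python
import re
from html.parser import HTMLParser

html = "<h1> My Name is John </h1>"

# Pattern matching on an HTML string: we only look for a run of characters.
# The lazy quantifier *? stops at the first ">" it can.
first_tag = re.search(r"<.*?>", html).group()   # '<h1>'
# The greedy version runs on to the last ">" in the string.
greedy = re.search(r"<.*>", html).group()       # '<h1> My Name is John </h1>'

# Parsing HTML: we care about structure (which text belongs to which element).
# A regex starts to struggle as soon as tags are nested.
nested = "<h1> My Name is <b>John</b> </h1>"
naive = re.findall(r"<h1>(.*?)</h1>", nested)   # inner markup leaks into the result


class HeadingCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <h1> element."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        # Text from nested tags such as <b> still arrives here as plain data.
        if self.in_h1:
            self.headings[-1] += data


def extract_h1(document):
    """Return the stripped text of each <h1> in the document."""
    collector = HeadingCollector()
    collector.feed(document)
    collector.close()
    return [text.strip() for text in collector.headings]


print(first_tag)
print(naive)
print(extract_h1(nested))
```

So both of us may be partly right: finding "something that looks like a tag" in a string is fine for a regex, while pulling headings, links, or paragraphs out of a real page is the parsing job the sentence warns about.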