What the challenge wants
Your current regular expression is
/h[1-10]/?gi
Note that the challenge is not asking you to match only h1 tags (though it sure seems like it is). Instead, it's asking you to build a general regular expression that matches any HTML tag.
So it's asking you to be able to match <a>, and <p>, and <strong>, and so on.
The naive way
You might think there are three components to a tag in this challenge:
- The start, <
- The middle, a series of any characters
- The end, >
So you might quickly jump in and write a regular expression.
/<.*>/gi
 '^^'
I've marked the start and the end with ' marks, and the middle with ^. Now, let's try this regex out on a seemingly innocent string:
<h1>What is your name</h1>
Now, let's try and fill in our start, middle, and end.
- The start: <
- The middle: h1>What is your name</h1
- The end: >
Of course! . will match any character, and > counts as a character, so the middle swallowed the first > along with everything else. The regular expression engine will try to make our middle as long as possible.
It's being greedy, capturing as many characters as possible while still fulfilling our expression. But which character is so rude as to be greedy? It's *, which means "match 0 or more". We still want it to "match 0 or more", but we want it to tone it down and be as short as possible. We want it to be lazy, so that as soon as it can stop matching characters, it will.
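To see the greedy behaviour described above for yourself, here is a small sketch you can paste into a browser console or Node. The variable names are just for illustration; the string and the regex are the ones from this post.

```javascript
// Greedy matching: .* takes as many characters as it can
// while still leaving a final > for the end of the pattern.
const greedyInput = "<h1>What is your name</h1>";
const greedyMatches = greedyInput.match(/<.*>/gi);

// With the g flag, match() returns an array of every match found.
console.log(greedyMatches.length); // 1
console.log(greedyMatches[0]);     // <h1>What is your name</h1>
```

There is only one match, and it is the whole string, because the middle ran all the way to the last >.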
How to make things lazy
So at the moment, our middle section is .* (any character, 0 or more times). To tell the * to be lazy, we add a ? just after it, making our middle section .*?
You placed the ? in the bit with the "flags". These tell the entire regular expression how to act. In another universe, I could actually imagine there being a ? flag, which would automatically treat every quantifier as lazy rather than greedy. Sadly, it doesn't exist yet.
So, you need to put it after every single quantifier you want it to act upon.
That makes our final regular expression: /<.*?>/gi
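As a quick check, here is a sketch of the lazy version run against the same string, plus a look at what happens when ? is passed as a flag. The variable names are made up for the example, and the RegExp constructor is used for the flag test only so that the error can be caught at runtime.

```javascript
// Lazy matching: .*? stops at the first > it can reach.
const lazyInput = "<h1>What is your name</h1>";
const lazyMatches = lazyInput.match(/<.*?>/gi);

console.log(lazyMatches.length); // 2
console.log(lazyMatches[0]);     // <h1>
console.log(lazyMatches[1]);     // </h1>

// A ? in the flags position is not a valid flag, so the engine rejects it.
let flagError = null;
try {
  new RegExp("<.*>", "?gi");
} catch (err) {
  flagError = err;
}
console.log(flagError instanceof SyntaxError); // true
```

Each tag is now matched on its own, which is what the challenge is after.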
Alternatives
Of course, just because this challenge is testing you on using the ?, it doesn't mean that in real life you're obligated to use it. Another way to match just one tag is to specify which characters are allowed in our middle section, rather than asking the regular expression engine to just make it as short as possible. Our conditions would then be:
- The start, <
- Any character EXCEPT FOR >, 0 or more times
- The end, >
And how can we code those in?
<
[^>]*
>
That's because [^a] means any character EXCEPT a, so [^>] means any character except a closing angle bracket.
So the final alternative solution to this problem would be /<[^>]*>/gi. But for this challenge, do not use this alternative solution. This particular challenge is trying to teach you about ?, so in your solution, you should use ?. I just thought it would be nice to show that in this case you don't need it, though it might make it a little easier to write regular expressions.
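If you want to try the alternative outside of the challenge, here is a minimal sketch. The variable names and the second, multi-line test string are my own additions for illustration. The second part shows one small way the two patterns differ: without the s flag, . does not match a line break, while [^>] does.

```javascript
// Negated character class: the middle may contain anything except >.
const classInput = "<h1>What is your name</h1>";
const classMatches = classInput.match(/<[^>]*>/gi);

console.log(classMatches.length); // 2
console.log(classMatches[0]);     // <h1>
console.log(classMatches[1]);     // </h1>

// Hypothetical tag that has a line break inside it.
const multilineTag = "<a\nhref='#'>";

// . cannot cross the line break, so the lazy pattern finds nothing.
const lazyOnMultiline = multilineTag.match(/<.*?>/gi);
console.log(lazyOnMultiline); // null

// [^>] happily matches the line break, so the whole tag is found.
const classOnMultiline = multilineTag.match(/<[^>]*>/gi);
console.log(classOnMultiline.length); // 1
```

On the string from this post both approaches give the same two matches, so the difference only shows up on inputs like the second one.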
Closing
I wish you luck in solving this challenge, and feel free to ask if you have any questions!