Find Characters with Lazy Matching help

please help me out with what to do

Your code so far


let text = "<h1>Winter is coming</h1>";
let myRegex = /h[1-10]/?gi;// Change this line
let result = text.match(myRegex);

Your browser information:

User Agent is: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36.

Link to the challenge:
https://learn.freecodecamp.org/javascript-algorithms-and-data-structures/regular-expressions/find-characters-with-lazy-matching

‘?’ shouldn’t be outside the / /. Only flags are allowed there, i.e. g for global, i to ignore case, and m for multiline.
? means the preceding character is optional.
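If you want to see the flag rule for yourself, here is a rough sketch using the RegExp constructor, which takes the flags as a second string argument. The variable names validRegex and flagError are made up for this example.

```javascript
// Flags go after the closing slash, or in the second constructor argument.
const validRegex = new RegExp("h1", "gi"); // same as /h1/gi

let flagError = null;
try {
  // "?" is not a flag, so the constructor throws instead of building a regex.
  new RegExp("h1", "?gi");
} catch (err) {
  flagError = err;
}

console.log(validRegex.flags);                 // "gi"
console.log(flagError instanceof SyntaxError); // true
```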

http://regextester.com/

visit this site to experiment with regEx
Tell me more about what you want to do with this regex so I can help you more.

i want to match h1 tags

You can simply write this

/h1/
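As a quick sketch of what that gives you on the string from the question (the variable names h1Text and h1Matches are just for illustration, and I have added the g flag so every occurrence is returned):

```javascript
const h1Text = "<h1>Winter is coming</h1>";

// The g flag asks for every match, not just the first one.
const h1Matches = h1Text.match(/h1/g);

console.log(h1Matches); // [ 'h1', 'h1' ]
```

Note that this matches the letters h1 inside the tags, not the tags themselves.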

What the challenge wants

Your current regular expression is

/h[1-10]/?gi

Note that the challenge is not asking you to match only h1 tags (though it sure seems like it). Instead, it’s asking you to build a general regular expression for parsing an HTML tag.

So, it’s asking you to be able to match <a>, and <p>, and <strong>, and so on.

The naive way

You might think, there are three components to a tag in this challenge

  1. The start, <
  2. The middle, a series of any characters
  3. The end, >

So you might quickly jump in and write a regular expression.

/<.*>/gi
 ↑^^↑

I’ve marked the start and the end out with ↑s. I’ve marked the middle with ^. Now, let’s try this regex out on a seemingly innocent string,

<h1>What is your name</h1>

Now, let’s try and fill in our beginning, middle, and end.

  1. <
  2. h1>What is your name</h1
  3. >

Of course! . will match any character, and > counts as a character. The regular expression engine will try to make our middle as long as possible.

It’s being greedy, capturing as many characters as possible while still fulfilling our expression. But which character is so rude as to be greedy? It’s *, which is “Match 0 or more”. We still want it to “Match 0 or more”, but we want it to tone down, be as short as possible. We want it to be lazy, so that as soon as it can stop matching characters, it will.
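Here is a minimal sketch of that greedy behaviour, run against the same string. The names greedyText and greedyResult are only for this example.

```javascript
const greedyText = "<h1>What is your name</h1>";

// .* is greedy: it runs to the LAST > it can find.
const greedyResult = greedyText.match(/<.*>/gi);

console.log(greedyResult); // [ '<h1>What is your name</h1>' ]
```

One match, and it swallows the whole string rather than stopping at the first >.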

How to make things lazy

So at the moment, our middle section looks like .*. To tell the * to be lazy, we add a ? just after it, making our middle section .*?

You placed the ? into the bit with the “flags”, which tell the entire regular expression how to act. In another universe, I could actually imagine there being a ? flag, which would automatically treat every quantifier as lazy rather than greedy. Sadly, it doesn’t exist yet.

So, you need to put it after every single quantifier you want it to act upon.

That makes our final regular expression: /<.*?>/gi
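A small sketch of the lazy version on the challenge string, in case it helps to see the output. The variable names are mine, not the challenge's.

```javascript
const lazyText = "<h1>Winter is coming</h1>";

// .*? is lazy: it stops at the FIRST > it can reach.
const lazyResult = lazyText.match(/<.*?>/gi);
console.log(lazyResult); // [ '<h1>', '</h1>' ]

// Without the g flag, match() reports only the first match.
const firstTag = lazyText.match(/<.*?>/)[0];
console.log(firstTag); // <h1>
```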

Alternatives

Of course, just because this challenge is testing you on using the ?, it doesn’t mean in real life you’re obligated to use it. Another way to match just one tag is to specify which characters are allowed in our middle section, rather than asking the regular expression engine to just make it as short as possible. Our conditions would then be

  1. Start <
  2. Any character EXCEPT FOR >, 0 or more times
  3. >

And how can we code those in?

  1. <
  2. [^>]*
  3. >

That’s because [^a] means any character EXCEPT a, so [^>] means any character except for a closing angle bracket.

So the final alternative solution to this problem would be /<[^>]*>/gi. But for this challenge, do not use this alternative solution. This particular challenge is trying to teach you about ?, so in your solution, you should use ?. I just thought that it would be nice to show that in this case, you don’t need it – though it might make it a little easier to write regular expressions.
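To compare the two approaches side by side, here is a sketch on a made-up string with a few nested tags (tagText and the other names are hypothetical, not from the challenge):

```javascript
const tagText = "<p>Some <strong>bold</strong> text</p>";

// Lazy quantifier: stop at the first > you can.
const lazyTags = tagText.match(/<.*?>/g);

// Negated character class: never step over a > in the first place.
const negatedTags = tagText.match(/<[^>]*>/g);

console.log(lazyTags);    // [ '<p>', '<strong>', '</strong>', '</p>' ]
console.log(negatedTags); // [ '<p>', '<strong>', '</strong>', '</p>' ]
```

On this input both give the same list of tags.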

Closing

I wish you luck in solving this challenge, and feel free to ask if you have any questions!

Can you tell me the difference between these symbols in regex?

  • ? + .

Sure!

Plus (+)

This means “one or more”. Let’s say you have the regular expression

/set+/

The bit we’re interested in is t+, which will match one or more t characters.

:white_check_mark: Matches

  • set
  • sett
  • settt
  • setttt

:negative_squared_cross_mark: Does not match

  • se
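You can check the lists above with a quick sketch like this (plusRegex and plusResults are just names I picked):

```javascript
const plusRegex = /set+/;

// test() returns true if the regex finds a match anywhere in the string.
const plusResults = ["set", "sett", "settt", "se"].map((word) => plusRegex.test(word));

console.log(plusResults); // [ true, true, true, false ]
```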

Dot (.)

This character will match any single character (except for a newline). Let’s say we have the below regular expression.

/.at/

We can substitute any character in for the dot. Even an actual dot!

:white_check_mark: Matches

  • bat
  • cat
  • 2at
  • _at
  • .at
  • rat
  • Fat
  • {at

:negative_squared_cross_mark: Does not match

  • at
  • (newline character)at

When I say “newline character”, I mean the character that is entered when you hit the Enter key, \n.

However, if you use the dot in a character class (e.g. /[.]/) then it will match as an actual dot (the punctuation mark), rather than any character.
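Here is a short sketch of both behaviours: the dot as a wildcard, and the dot inside a character class. I have written the class example as /[.]at/ so it lines up with the wildcard one. The variable names are only illustrative.

```javascript
const dotRegex = /.at/;

console.log(dotRegex.test("cat"));  // true
console.log(dotRegex.test(".at"));  // true, the wildcard also accepts a real dot
console.log(dotRegex.test("at"));   // false, nothing there for the dot to match
console.log(dotRegex.test("\nat")); // false, the dot skips newlines

// Inside a character class the dot is just a dot.
const literalDotRegex = /[.]at/;

console.log(literalDotRegex.test(".at")); // true
console.log(literalDotRegex.test("cat")); // false
```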

Question mark (?)

This character has a few different meanings.

After something to match

This character means “0 or 1 times”. You use it just like the plus sign.

set?

will match either a t, or no t.

:white_check_mark: Matches

  • se
  • set

:negative_squared_cross_mark: Does not match as a whole (the regex still finds the leading set inside these)

  • sett
  • settt
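A sketch to try this out. One assumption to flag loudly: I have added ^ and $ around the pattern so the whole word has to match, which is how the lists above read. Without those anchors, the regex happily finds set inside sett.

```javascript
// ^ and $ are my addition: they force the WHOLE string to match.
const optionalRegex = /^set?$/;

const optionalResults = ["se", "set", "sett", "settt"].map((word) => optionalRegex.test(word));
console.log(optionalResults); // [ true, true, false, false ]

// Without anchors, "sett" still contains a match: the leading "set".
const partialMatch = "sett".match(/set?/)[0];
console.log(partialMatch); // set
```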

After a quantifier

If you use the ? just after something that says how many times a character should appear, it makes the match lazy.

Let’s take a look at an example.

u.+p

This matches for

  • The letter u
  • 1 or more characters
  • Then the letter p.

So, let’s try it against the below string:

“subpopular”

Here is how it would match:

subpopular (ubpop)

But if we added a ?, here is what the regular expression would look like:

/u.+?p/

And here is what the result would look like

subpopular (ubp)

Note that they both fit the rule we described, but when we put a ? it makes the resulting matched text as short as possible.
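The same comparison as a runnable sketch (the variable names are mine):

```javascript
const word = "subpopular";

// Greedy: .+ runs as far as it can, so the match ends at the LAST p.
const greedyMatch = word.match(/u.+p/)[0];

// Lazy: .+? takes as little as it can, so the match ends at the FIRST p.
const lazyMatch = word.match(/u.+?p/)[0];

console.log(greedyMatch); // ubpop
console.log(lazyMatch);   // ubp
```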

Other meanings

The question mark has other meanings too, especially when it’s directly after an opening round bracket, (. For example, (?:...) creates a non-capturing group.

I passed. No idea what the instructions were for.
