Regex Lookahead: why use \D*?

The challenge asks us to match passwords that are greater than 5 characters long and have two consecutive digits (https://learn.freecodecamp.org/javascript-algorithms-and-data-structures/regular-expressions/positive-and-negative-lookahead/).

The answer should be /(?=\w{5,})(?=\D*\d{2})/ and I understand all of it except why we need the \D* part (it matches any character that is not a digit character (0-9)). As I understand it, with the second lookahead we’re checking for 2 consecutive digits, so why check for non-digit characters?

Thanks!

\D*\d{2} is “zero or more non-digits, followed by two consecutive digits”
The \D* is there because you want to allow non-digits to come before the two digits.
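
To make that concrete, here is a small sketch you can paste into a browser console or Node. The ^ anchor and the sample strings are my own additions for illustration, not part of the challenge; the anchor just pins the pattern to the start of the string so each piece is easy to see.

```javascript
// Assumption: ^ is added only to pin the match to the start of the string.
const nonDigitsThenTwoDigits = /^\D*\d{2}/;

// Made-up sample strings, not taken from the challenge.
const samples = ["abc12", "12abc", "a1b2", "abc"];
const outcome = {};
for (const s of samples) {
  outcome[s] = nonDigitsThenTwoDigits.test(s);
}
console.log(outcome);
// "abc12" -> true  (\D* takes "abc", \d{2} takes "12")
// "12abc" -> true  (\D* takes nothing, \d{2} takes "12")
// "a1b2"  -> false (the digits are never two in a row)
// "abc"   -> false (no digits at all)
```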

Thanks, @ArielLeslie, but I thought we were allowing non-digits with (?=\w{5,}) already?

\w is all “word characters”: letters, numbers, and underscore.

Ok, so if the task asks for “greater than 5 characters long (and two consecutive digits)“, this means any character, right?
Then, logically, shouldn’t we use \. in (?=\.{5,}) in the first lookahead, and not (?=\w{5,})? I mean, \w is only letters, numbers, and underscore.

The way I see it, this should then cover the “greater than 5 characters long" part of the condition and we shouldn’t even need \D* in the second lookahead.

So then my solution would be: /(?=\.{5,})(?=\d{2})/ But I’m sure I’m wrong, somewhere.

Thanks a million!

You can test it yourself here and see that it doesn’t work.
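
If the link is not handy, a quick console check along these lines shows the same thing. The sample strings are made up for illustration. Note that \. matches a literal dot, not "any character", so the proposed regex asks for a position that starts with five dots and with two digits at the same time, which can never happen.

```javascript
// The regex proposed above: \. is a literal dot, not "any character".
const proposed = /(?=\.{5,})(?=\d{2})/;
// The challenge answer quoted at the top of the thread.
const challenge = /(?=\w{5,})(?=\D*\d{2})/;

// Made-up sample strings for illustration.
const samples = ["astronaut22", "bana12", "12345", "....."];
const results = samples.map((s) => ({
  s,
  proposed: proposed.test(s),
  challenge: challenge.test(s),
}));
console.log(results);
// proposed is false for every sample;
// challenge is true for the first three and false for ".....".
```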

There are multiple ways to skin this cat.

I’m not very good at regexes either, but I’ll try to explain as best as I can from what I understand.

I think I get your question about the \D* in the second lookahead: if we’re already checking for \w in the first lookahead, why are we checking again in the second one (in a different way)?

That was quite confusing for me too.

The way I think of it, each lookahead is like a separate regex that must match the string. But in addition to that, ALL lookaheads in a regex must match in combination with one another too (starting from the same position), not just individually.

So from this perspective, we can explain the questioned regex as follows:

(?=\w{5,}) => Only match if the given string has at least 5 word characters in a row. (That’s fairly straightforward, no problem here.)

(?=\D*\d{2}) => Only match if the given string has 0 or more non-digit characters followed immediately by 2 consecutive digits.

So if we were to use only (?=\d{2}), this would translate into this:
Only match if the string has 2 consecutive digits.

That seems fine, it should work? But no, as I said earlier, it passes as an individual regex (if you think about or test it separately) but it fails in conjunction with our first lookahead, which says the string should have at least 5 characters. Both lookaheads are checked from the same position, so (?=\d{2}) alone would need the two digits to sit right at the start of that run of 5 or more characters.

So if we use this (?=\D*\d{2}), it would translate into:
Only match if the given string has zero or more non-digit characters followed by 2 consecutive digits. (By adding the zero-or-more part, we let the second lookahead skip over any leading non-digits from the same position where the first lookahead is checked.)

Now this passes as an individual regex and also in conjunction with our first lookahead.
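
Here is a rough sketch of that difference, with made-up test strings. Both regexes keep the first lookahead; only the second one changes.

```javascript
// Second lookahead without \D*: the two digits must sit exactly
// where the run of 5+ word characters starts.
const withoutSkip = /(?=\w{5,})(?=\d{2})/;
// Second lookahead with \D*: leading non-digits may come first.
const withSkip = /(?=\w{5,})(?=\D*\d{2})/;

function check(s) {
  return { withoutSkip: withoutSkip.test(s), withSkip: withSkip.test(s) };
}

console.log(check("bana12"));  // { withoutSkip: false, withSkip: true }
console.log(check("abc123"));  // { withoutSkip: false, withSkip: true }
console.log(check("12abcde")); // { withoutSkip: true, withSkip: true }
```

The last string only passes without \D* because its digits happen to be at the very start.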

Another variation which clears the fcc test can be /(?=.{5,})(?=.*\d{2})/.
The first lookahead is 5 or more of ANY character. And the second lookahead is zero or more of any character, followed by 2 consecutive digits.
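
One thing worth noticing is that the two versions are not exactly equivalent. As a sketch (the sample strings are my own, not from the fcc tests): \D* cannot step over a lone digit, while .* can, so a string with a single digit before the pair behaves differently.

```javascript
// The challenge answer from the top of the thread.
const challengeRegex = /(?=\w{5,})(?=\D*\d{2})/;
// The variation mentioned above.
const dotVariation = /(?=.{5,})(?=.*\d{2})/;

function both(s) {
  return { challenge: challengeRegex.test(s), variation: dotVariation.test(s) };
}

console.log(both("bana12")); // { challenge: true, variation: true }
// "a1bc22": \D* stops at the lone "1", so \d{2} never lines up with "22"
// from any position that still has 5 word characters ahead of it.
console.log(both("a1bc22")); // { challenge: false, variation: true }
```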

And lastly, yes, there might be a lot of different and weird ways to write this regex so that it passes all the given conditions.

Aɢʜʜ... I ᴅᴏɴ'ᴛ ʟɪᴋᴇ ʀᴇɢᴇxᴇs. I ʜᴏᴘᴇ ɪᴛ ᴍᴀᴅᴇ sᴏᴍᴇ sᴇɴsᴇ ᴛʜᴏᴜɢʜ.


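(Edit: when .*, \D* etc. are followed by something else, in a lookahead or not, the engine relies on something called backtracking: the quantifier grabs as much as it can, then gives characters back until the rest of the pattern matches. You may google it for more.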

This might help a bit:

I am not able to post the link, getting 502 error.
Google this and see the stackoverflow answer:
Chaining multiple positive lookaheads in JavaScript regex

And also Google this: rexegg Regex Quantifier and see the first link, it might help too.
)

Thanks, @husseyexplores, for your very detailed explanation!

I’m still interested, though, in why we can use \w, which is only letters, numbers, and underscore, when what we need is any character (including things like @#$%), and \w doesn’t allow that…

I’m not sure, as I didn’t read the test requirements.
If the test requirements clearly say ‘any character’, then \w wouldn’t be the right answer here.
And if it is working anyway (i.e. passing all the tests) even with the condition of ‘any’, then
I would assume that the FCC tests for the challenge are incomplete-ish, and don’t check for all the edge cases.
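
For what it’s worth, here is a quick sketch of where the two versions disagree once symbols are involved. I haven’t checked what the FCC tests actually contain; the strings below are made up.

```javascript
// \w version: needs 5+ letters, digits, or underscores in a row.
const wordOnly = /(?=\w{5,})(?=\D*\d{2})/;
// "any character" version (. does not match line breaks by default).
const anyChar = /(?=.{5,})(?=.*\d{2})/;

function compare(s) {
  return { wordOnly: wordOnly.test(s), anyChar: anyChar.test(s) };
}

console.log(compare("pass_12")); // { wordOnly: true, anyChar: true }
// The symbols break the string into short runs of word characters,
// so \w{5,} never finds 5 in a row.
console.log(compare("p@ss#12")); // { wordOnly: false, anyChar: true }
```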
