So anytime you see a backslash ‘\’ in a regex, it means one of two things: either the character that follows gains a special meaning (e.g. \w means match [A-Za-z0-9_]), or the character that follows should be interpreted literally (e.g. \. means match the character ‘.’ literally, instead of matching any character).
Here \s means match a whitespace character, such as a space ‘ ’ (tabs and newlines count too).
And \1 means match the same text that the first capture group matched.
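To see the two roles of the backslash side by side, here is a small sketch you can paste into a browser console or Node. The variable names (anyChar, literalDot) are just made up for illustration.

```javascript
// An unescaped dot matches any character except a line terminator.
const anyChar = /a.c/;
// An escaped dot matches only a literal '.'.
const literalDot = /a\.c/;

console.log(anyChar.test("abc"));    // true
console.log(anyChar.test("a.c"));    // true
console.log(literalDot.test("abc")); // false
console.log(literalDot.test("a.c")); // true

// Here the backslash gives the next character a special meaning instead.
console.log(/\s/.test("a\tb")); // true, a tab is whitespace
console.log(/\w/.test("_"));    // true, underscore is in [A-Za-z0-9_]
console.log(/\w/.test("-"));    // false, hyphen is not
```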
Now let’s talk about what capture groups are.
Let’s say you want to match some sort of repetitive pattern in a string, e.g. “123 123 123”. You don’t want to match the generic pattern of 3 digits, space, 3 digits, space, 3 digits, because then you’d also match random things like “234 124 333”.
You want to somehow make sure that all 3 groups of numbers are exactly the same. You can use capture groups for this. The syntax is simple: write the regex that would match the part you want to capture, then place parentheses around it.
They behave much like variables in JS: you can store a matched string in a capture group, then use that stored value to compare against characters that come up later in the string.
So let’s say you want to match 2 consecutive, duplicate letters, e.g. “ww”. If you use /[A-Z]{2}/i, that would also match things like “ow” and “aw”, because it accepts any two letters. Instead, you can capture the first letter, then use it to compare against the second letter.
So per the syntax explained above, you just write how you would match the first character. That’s simple, right? To match a letter, it’s /[A-Za-z]/. Now wrap parentheses around it and you get /([A-Za-z])/, which captures that letter in addition to matching it.
Well, how do you tell JS when to use that capture group for comparison? It’s the \1 from earlier. Wherever you place \1 in the regex is where JS will attempt to find an exact match of whatever your first capture group captured.
Since your desired match is 2 consecutive duplicate letters, the comparison should take place immediately after you get your capture group. So your regex becomes /([a-zA-Z])\1/
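Here is a quick sketch comparing the two approaches. The names anyTwoLetters and doubledLetter are my own, picked just for this example.

```javascript
// Any two letters in a row, duplicates or not.
const anyTwoLetters = /[A-Z]{2}/i;
// One letter, captured, followed by that exact same letter again.
const doubledLetter = /([a-zA-Z])\1/;

console.log(anyTwoLetters.test("ow")); // true, which is not what we want
console.log(doubledLetter.test("ow")); // false
console.log(doubledLetter.test("ww")); // true

// match() returns the whole match at index 0 and capture group 1 at index 1.
const found = "aww".match(doubledLetter);
console.log(found[0]); // "ww"
console.log(found[1]); // "w"

// Without the i flag the comparison is case-sensitive.
console.log(doubledLetter.test("wW")); // false
```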
But what if we have multiple repetitions we want to match? That’s why the syntax is \1 for the first capture group. You can have as many capture groups as you want, and the regex engine numbers them from left to right, in the order their opening parentheses appear. E.g. in the regex /(\w)(\s)/, the first capture group, referenced by \1, is (\w), and the second capture group, referenced by \2, is (\s).
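If you want to check the numbering yourself, the array returned by match() lists the groups in the same left-to-right order. A small sketch (twoGroups and bothRepeated are made-up names, and the second regex is just my own extension of the example to show \1 and \2 together):

```javascript
const twoGroups = /(\w)(\s)/;
const result = "a b".match(twoGroups);

console.log(result[0]); // "a ", the whole match
console.log(result[1]); // "a", capture group 1, the (\w)
console.log(result[2]); // " ", capture group 2, the (\s)

// Both backreferences in one regex: a word character, a whitespace
// character, then the same word character and the same whitespace again.
const bothRepeated = /(\w)(\s)\1\2/;
console.log(bothRepeated.test("a a ")); // true
console.log(bothRepeated.test("a b ")); // false
```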
As for the example regex given by FCC, now we can break it down and see what it is actually doing.
/(\w+)\s\1/:
(\w+): the first capture group, captures a run of one or more characters from [a-zA-Z0-9_]
\s: matches a whitespace character
\1: matches the same text that the first capture group captured, again.
So this would test true for any string that contains two duplicate words separated by a space, such as “regex regex”.
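You can try the FCC regex like this. One thing to watch for: the pattern is not anchored, so it can also match in the middle of longer words. The anchored version at the end (wholeString) is my own variation and not part of the FCC example, and both variable names are made up.

```javascript
const repeatRegex = /(\w+)\s\1/;

console.log(repeatRegex.test("regex regex")); // true
console.log(repeatRegex.test("hello world")); // false

// Unanchored, so a partial overlap also counts as a match.
console.log(repeatRegex.test("cat attack"));     // true
console.log("cat attack".match(repeatRegex)[0]); // "at at"

// Assumption: if you want the whole string to be exactly two identical
// words, anchoring with ^ and $ is one way to do it.
const wholeString = /^(\w+)\s\1$/;
console.log(wholeString.test("regex regex")); // true
console.log(wholeString.test("cat attack"));  // false
```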
Play around with regex at https://regex101.com/. Regex is a fairly complex topic that can take a while to learn, but practice can help speed that process up.