How do lookaheads work with other code?

This is indeed a little confusing and I’ll try my best to explain it.

/(?=\w{6})/ by itself will match any string that has six word characters (letters, digits, or underscores) in a row somewhere in it

and

/(?=\d{2})/ by itself will match any string that has two consecutive digits, regardless of where those two digits occur
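
To see the two lookaheads on their own, here is a small sketch you could paste into a browser console or Node. The variable names and test strings are just made up for illustration.

```javascript
// Each lookahead on its own: test() returns true if the lookahead
// succeeds at any position in the string.
const sixWordChars = /(?=\w{6})/;
const twoDigits = /(?=\d{2})/;

console.log(sixWordChars.test("abcdef")); // true
console.log(sixWordChars.test("abc"));    // false
console.log(twoDigits.test("ab11cd"));    // true
console.log(twoDigits.test("a1b2c3"));    // false (digits are never side by side)

// A lookahead consumes nothing, so the match itself is an empty string.
// Only the index tells you where the lookahead succeeded.
const firstHit = "ab11cd".match(twoDigits);
console.log(firstHit[0] === "", firstHit.index); // true 2
```

Note that the match is empty and sits at index 2, which is where the “11” begins. That position is what matters once the lookaheads are combined.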

So why do you need to add \w* to the beginning of the latter when you combine these in order for them to work together properly?

With multiple lookaheads like this, both of them must be true at the same position in the string for the regex to find a match. A lookahead doesn’t consume any characters, so after the first one succeeds, the second one is checked from exactly the same spot. The key here is that \d{2} can only succeed at the point where two consecutive digits begin, so it is at this point (where the first of the two digits is found) that the first lookahead must also be true (i.e. \w{6} is applied to the string starting at the first digit).

Let’s take a look at an example without the \w* in the second lookahead.

/(?=\w{6})(?=\d{2})/

The string “11abcd” will be a match because at the point where the two consecutive digits begin there are also six word characters in a row. So both of the lookaheads are satisfied at the same position.

The string “ab11cd” will not match, however, because at the point where the two consecutive digits begin there are only four word characters left in the string, so the first lookahead is not fulfilled. Side note: If we add two more word characters to the end of the string (e.g. “ab11cdef”) then it will match again because there are now six word characters beginning at the first “1”.
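
If you want to check those three strings yourself, something like this should do it (the name withoutPrefix is just mine for this example):

```javascript
// Both lookaheads have to succeed at the same index.
const withoutPrefix = /(?=\w{6})(?=\d{2})/;

for (const s of ["11abcd", "ab11cd", "ab11cdef"]) {
  const m = withoutPrefix.exec(s);
  console.log(s, m ? `matches at index ${m.index}` : "no match");
}
// 11abcd matches at index 0
// ab11cd no match
// ab11cdef matches at index 2
```

The index in the last line is 2, the position of the first “1”, which is the only place in “ab11cdef” where both lookaheads are true together.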

Now let’s add the \w* to the second lookahead and use it on the string “ab11cd”:

/(?=\w{6})(?=\w*\d{2})/

The second lookahead still has to find “11”, as in the examples above. But the \w* means “allow zero or more word characters before these two digits”, so the second lookahead can now skip over the “ab” that precedes “11” and still succeed. In effect, the second lookahead can now be satisfied starting all the way back at the “a”, and since there are six word characters beginning at “a”, the first lookahead is satisfied there too.
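
Here is a quick sketch of the same thing in code. The names and the extra test strings are only for illustration.

```javascript
// Same pattern, but the second lookahead may skip word characters first.
const withPrefix = /(?=\w{6})(?=\w*\d{2})/;

const hit = withPrefix.exec("ab11cd");
console.log(hit.index); // 0, i.e. both lookaheads succeed starting at "a"

// What the second lookahead "sees" from index 0: \w* covers "ab",
// then \d{2} covers "11".
console.log("ab11cd".match(/^\w*\d{2}/)[0]); // ab11

// The requirements themselves have not changed:
console.log(withPrefix.test("ab1cde")); // false (no two digits side by side)
console.log(withPrefix.test("a11"));    // false (fewer than six word characters)
```

So the \w* doesn’t loosen what the string has to contain. It only lets the two-digit check succeed from an earlier position.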

Basically, all of the lookaheads have to succeed from the same starting point, so making one of them more flexible (here with \w*) changes which starting points can work for the others as well.
