Regular Expression: matching capture groups

Any regex gurus who can lend a hand?

const regex = /^(?:\/)(first1(?:\/middle)|first2)(?:\/)(second1|second2)/
const path = '/first1/middle/second2/abc/xyz';
let matches = path.match(regex);

I am struggling with the above regex, because I don’t practice them enough. My goal is to have matches be an array which looks like:

[ 'first1', 'second2' ]

So far, I have had no luck and I have tried at least 10 other regular expressions before this one.

You may try this way:

const regex = /^(?:\/)(first[1|2])(?:\/middle\/)(second[1|2])/;
const path = '/first1/middle/second2/abc/xyz';
let matches = regex.exec(path);

console.log(matches);      // ["/first1/middle/second2", "first1", "second2"]
console.log(matches[0]);   // /first1/middle/second2 
console.log(matches[1]);   // first1
console.log(matches[2]);   // second2
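`exec` puts the whole match at index 0 and the capture groups after it. To get exactly the array asked for in the question, you can drop index 0 with `slice(1)`. This is a minimal sketch that spells the alternatives out literally as (first1|first2) and (second1|second2); it is not taken from the reply above.

```javascript
// Sketch: match as above, but keep only the capture groups.
const regex = /^\/(first1|first2)\/middle\/(second1|second2)/;
const path = '/first1/middle/second2/abc/xyz';

const match = regex.exec(path);
// match[0] is the whole matched text; the groups start at index 1.
const groups = match ? match.slice(1) : null;

console.log(groups); // [ 'first1', 'second2' ]
```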
/^\/(first\d)\/\w+?\/(second\d)/

You’re using a non-capturing group with “middle” inside a capturing group, so it has been captured in \1.
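Running the regex from the question makes this visible. A non-capturing group only avoids creating a group of its own, so its text still ends up in the capturing group that surrounds it:

```javascript
// The regex from the question: (?:\/middle) is nested inside capture group 1.
const regex = /^(?:\/)(first1(?:\/middle)|first2)(?:\/)(second1|second2)/;
const path = '/first1/middle/second2/abc/xyz';

const matches = path.match(regex);
// The nested non-capturing group's text is part of group 1.
console.log(matches[1]); // first1/middle
console.log(matches[2]); // second2
```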

Also, [1|2] doesn’t make sense, because it is a character set, which means it will match 1, | or 2. That’s a good place for a non-capturing group, like (?:1|2), or even better [12].

I just simplified first1|first2 to first[1|2] for literal matching.

But that also matches “first|”; there’s no “or” inside a character set.
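A quick check of the character-set point: inside [...] the | is an ordinary character, so [1|2] accepts a literal pipe while [12] does not.

```javascript
// Inside [...] the | is just a literal character, not "or".
const withPipe = /^first[1|2]$/;
const plain = /^first[12]$/; // equivalent to /^first(?:1|2)$/

console.log(withPipe.test('first1')); // true
console.log(withPipe.test('first|')); // true (probably not intended)
console.log(plain.test('first|'));    // false
```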

Thanks @ghukahr and @picklu for attempting this regex. I think I need to explain further what first1 and first2 are. I did not really mean for the 1 and 2 to be digits per se.

The words before the second / can be red, blue, or green and the words after /middle/ can be one, two, three, four, or five.

The word middle needs to be literally middle. See below for a better example of what I am trying to achieve:

const regex = ???
const path1 = '/red/middle/two/abc/xyz';
const matches1 = regex.exec(path1);
console.log(matches1) // should return [ 'red', 'two' ]

const path2 = '/blue/middle/three/mnso/jgsl';
const matches2 = regex.exec(path2);
console.log(matches2) // should return [ 'blue', 'three' ]

Also, there is one additional requirement I did not mention the first time, which I should have. If the word before the second / is ‘yellow’, then I need the result to be only [ 'yellow' ], as in the path below:

const path3 = '/yellow/blah/mnso/jgsl';
const matches3 = regex.exec(path3);
console.log(matches3) // should return [ 'yellow' ]

You have to replace first and second with \w+; since middle needs to be middle, you can match it literally.

/^\/(\w+)\/middle\/(\w+)\//
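A quick check of this generic version shows why it may be too loose for the requirements: any word is accepted before and after /middle/. The second path below is a made-up example, not from the thread.

```javascript
// The generic pattern from the reply above.
const regex = /^\/(\w+)\/middle\/(\w+)\//;

// It matches the intended input...
console.log(regex.exec('/red/middle/two/abc/xyz').slice(1)); // [ 'red', 'two' ]
// ...but also any other word in either position.
console.log(regex.exec('/purple/middle/ten/abc').slice(1));  // [ 'purple', 'ten' ]
```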

It cannot be just any word (\w+); it has to be one of the four colors (red, blue, green, or yellow) for the word before the second /.

Hm, then it won’t look pretty, but if you need to be that specific, you can use as many alternatives (|) as you want.

/^\/(red|blue|green|yellow)\/middle\/(one|two|three|four|five)\//
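Here is a sketch of how this pattern behaves on a few paths; `groupsOf` is a hypothetical helper that returns only the capture groups. Note that the pattern needs a / after the number word and, as pointed out next, does not cover the yellow case yet.

```javascript
const regex = /^\/(red|blue|green|yellow)\/middle\/(one|two|three|four|five)\//;

// Hypothetical helper: returns only the capture groups, or null if no match.
function groupsOf(path) {
  const match = regex.exec(path);
  return match ? match.slice(1) : null;
}

console.log(groupsOf('/red/middle/two/abc/xyz'));      // [ 'red', 'two' ]
console.log(groupsOf('/blue/middle/three/mnso/jgsl')); // [ 'blue', 'three' ]
// Not handled: nothing after the number word, and the yellow path.
console.log(groupsOf('/red/middle/two'));              // null
console.log(groupsOf('/yellow/blah/mnso/jgsl'));       // null
```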

I only just noticed the yellow requirement…

@RandellDawson, if I understand correctly, you can split the path on '/middle/' and then match each part with one of two regexes:

  1. /(color options)$/
  2. /^(number options)/

…and before that, check whether the path contains ‘yellow’.
Doing it in one regex is possible, but it would be messy and probably slower, man; definitely not worth it.
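A minimal sketch of the split-then-match idea, under a few assumptions of mine: `parsePath` is a hypothetical name, yellow is assumed to always be the first path segment, and the color regex is anchored at both ends (^\/...$) rather than only at the end so that something like '/bred' is not accepted.

```javascript
// Hypothetical helper sketching the split-then-match approach.
// Assumes 'yellow', when present, is always the first path segment.
function parsePath(path) {
  if (path.startsWith('/yellow/')) return ['yellow'];

  // '/red/middle/two/abc/xyz' -> ['/red', 'two/abc/xyz']
  const parts = path.split('/middle/');
  if (parts.length < 2) return null;

  const color = parts[0].match(/^\/(red|blue|green)$/);        // color options
  const number = parts[1].match(/^(one|two|three|four|five)/); // number options
  return color && number ? [color[1], number[1]] : null;
}

console.log(parsePath('/red/middle/two/abc/xyz'));      // [ 'red', 'two' ]
console.log(parsePath('/blue/middle/three/mnso/jgsl')); // [ 'blue', 'three' ]
console.log(parsePath('/yellow/blah/mnso/jgsl'));       // [ 'yellow' ]
```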

I also think a non-regex approach would be better here, since it would be clearer.
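A non-regex version could look roughly like this. It is a sketch, not code from the thread: `parsePathNoRegex`, `COLORS`, and `NUMBERS` are hypothetical names, and it compares whole path segments, so the number word must be a complete segment.

```javascript
// Hypothetical non-regex version: split on '/' and compare whole segments.
const COLORS = ['red', 'blue', 'green'];
const NUMBERS = ['one', 'two', 'three', 'four', 'five'];

function parsePathNoRegex(path) {
  // '/red/middle/two/abc' -> ['', 'red', 'middle', 'two', 'abc']
  const [, first, second, third] = path.split('/');

  if (first === 'yellow') return ['yellow'];
  if (COLORS.includes(first) && second === 'middle' && NUMBERS.includes(third)) {
    return [first, third];
  }
  return null; // anything else is not a path we care about
}

console.log(parsePathNoRegex('/red/middle/two/abc/xyz')); // [ 'red', 'two' ]
console.log(parsePathNoRegex('/yellow/blah/mnso/jgsl'));  // [ 'yellow' ]
```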

In any case:

^\/(red|blue|green|yellow)(?:(?:(?<=yellow)\/\w+\/)|\/middle\/(one|two|three|four|five))

As @snigo suggested, you may be better off splitting the path, even with a regex.
If the first word is one of four colors (red, blue, green, and yellow) and the second is one of [one, two, three, four, and five], then you may try to split the path using a regex like this:

const regex = /^\/(red|green|blue|yellow)\/middle\/(one|two|three|four|five)/
let path1 = '/red/middle/two/abc/xyz';
let splittedPath = path1.split(regex);
console.log(splittedPath); // ["", "red", "two", "/abc/xyz"]

@RandellDawson I just want to add that, from the regex in your first post, I can see what you were trying to do, and I think it was pretty close. But since your requirements are basically two different regexes, you should have separated them completely instead of only the first group.

Here is your first regex, modified:

^\/(?:(red|blue|green)\/middle\/(one|two|three|four|five))|(?:(yellow)\/.*)

The catch is that whichever branch matches, the groups of the other branch come back undefined; for example, when yellow is found, the first two groups are undefined:

  • '/red/middle/two/abc/xyz' -> ["/red/middle/two", "red", "two", undefined]
  • '/blue/middle/three/mnso/jgsl' -> ["/blue/middle/three", "blue", "three", undefined]
  • '/yellow/blah/mnso/jgsl' -> ["yellow/blah/mnso/jgsl", undefined, undefined, "yellow"]
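One detail worth knowing about this pattern: | has the lowest precedence, so the leading ^ only anchors the first alternative, and the (yellow) branch can match anywhere in the string. That is also why the yellow match above starts at "yellow" rather than "/yellow". Below is a sketch of a variant that anchors both branches by moving the alternation inside one group; the variant and the '/purple/yellow/abc' path are my own illustration, not part of the reply.

```javascript
// The two-branch pattern from above: ^ only anchors the first branch.
const original = /^\/(?:(red|blue|green)\/middle\/(one|two|three|four|five))|(?:(yellow)\/.*)/;
// Variant: both branches sit inside the group that follows ^\/.
const anchored = /^\/(?:(red|blue|green)\/middle\/(one|two|three|four|five)|(yellow)\/)/;

const stray = '/purple/yellow/abc'; // made-up path with 'yellow' in the middle
console.log(original.exec(stray) !== null); // true (matches 'yellow/abc')
console.log(anchored.exec(stray));          // null

console.log(anchored.exec('/yellow/blah/mnso/jgsl').slice(1)); // [ undefined, undefined, 'yellow' ]
```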

The solution I posted above using positive lookbehind returns undefined only for the last item.

/^\/(red|blue|green|yellow)(?:(?:(?<=yellow)\/\w+\/)|\/middle\/(one|two|three|four|five))/

Example:

  • '/red/middle/two/abc/xyz' -> ["/red/middle/two", "red", "two"]
  • '/blue/middle/three/mnso/jgsl' -> ["/blue/middle/three", "blue", "three"]
  • '/yellow/blah/mnso/jgsl' -> ["/yellow/blah/", "yellow", undefined]

Of course, you can easily get rid of the undefined entries by filtering the result.
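A sketch of that filtering step with the lookbehind pattern above; `capture` is a hypothetical helper name, and lookbehind requires an engine with ES2018 regex support (current Node.js and browsers have it).

```javascript
// The lookbehind pattern from above (needs ES2018+ lookbehind support).
const regex = /^\/(red|blue|green|yellow)(?:(?:(?<=yellow)\/\w+\/)|\/middle\/(one|two|three|four|five))/;

// Hypothetical helper: keep only the groups that actually captured something.
function capture(path) {
  const match = regex.exec(path);
  return match ? match.slice(1).filter((group) => group !== undefined) : null;
}

console.log(capture('/red/middle/two/abc/xyz'));      // [ 'red', 'two' ]
console.log(capture('/blue/middle/three/mnso/jgsl')); // [ 'blue', 'three' ]
console.log(capture('/yellow/blah/mnso/jgsl'));       // [ 'yellow' ]
```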

Actually, a regex solution is absolutely fine as well:
/^(?:.*\/(yellow)|.*\/(red|green|blue)\/middle\/(one|two|three|four|five))/
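To get the exact arrays from the question out of this pattern, you can check which branch matched. This is a sketch with a hypothetical `parse` helper. Because of the leading .*, this pattern does not require the color to be the first path segment.

```javascript
// The pattern from the reply above.
const regex = /^(?:.*\/(yellow)|.*\/(red|green|blue)\/middle\/(one|two|three|four|five))/;

// Hypothetical helper: pick the result shape based on which branch matched.
function parse(path) {
  const match = regex.exec(path);
  if (!match) return null;
  const [, yellow, color, number] = match;
  return yellow ? [yellow] : [color, number];
}

console.log(parse('/red/middle/two/abc/xyz')); // [ 'red', 'two' ]
console.log(parse('/yellow/blah/mnso/jgsl'));  // [ 'yellow' ]
```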

Thanks everyone. Based on everyone’s input, I was able to figure out a solution.