Regex capture group help

I’m trying to understand regular expression capture groups.

If I match “2009009100900910090091” against /(\d+)\1+/,

the capture group (group 1) matches “0090091”, but not “0”, “009”, “090”, “900”, “0900910”, “9009100”, “0091009”, “0910090”, “9100900”, or “1009009”, all of which are substrings that I would expect to fit the regex’s capture group. Why does it capture that one and not the others?

I really like regular expressions, for some reason they never behave the way you think they should :sweat_smile:

I’m not an expert, but I think it’s just that the “+” in “\d+” is greedy: it won’t stop until it encounters something that doesn’t match “\d” (“\d+” on its own would match the entire string), and a regex always returns the first possible match, scanning from the left. So, as I understand it, it’s just matching the first, longest series of digits that is immediately followed by at least one copy of itself.
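For anyone who wants to check this reading, here is a minimal Python sketch. Using Python's re module is an assumption on my part, since the thread does not name a regex flavor. It runs the pattern from the question and prints what the engine returns.

```python
import re

s = "2009009100900910090091"

# \d+ on its own is greedy and swallows the whole digit string.
whole = re.search(r"\d+", s)

# With the backreference, the engine backs off until group 1 is
# immediately followed by at least one copy of itself.
m = re.search(r"(\d+)\1+", s)

print(whole.group(0) == s)  # True
print(m.group(1))           # 0090091
print(m.span())             # (1, 22)
```

The leading 2 is skipped because no prefix starting at index 0 is immediately repeated, so the search moves on to index 1.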

I agree with this analysis. To match less, I’d try putting a ? after the + to make it lazy, or limiting the number of digits to match with {n} or {n,m} after the \d. It depends on what you’re trying to achieve.
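A rough Python sketch of both suggestions, run against the string from the question. The specific bounds (7, and 3 to 6) are values I picked for illustration, not something from the thread.

```python
import re

s = "2009009100900910090091"

# Lazy quantifier: group 1 grows one digit at a time, so the first
# adjacent repeat found is the single "0" at index 1.
lazy = re.search(r"(\d+?)\1+", s)

# Exact length: only 7-digit blocks are considered.
bounded = re.search(r"(\d{7})\1+", s)

# Length range: still greedy, but capped at 6 digits.
ranged = re.search(r"(\d{3,6})\1+", s)

print(lazy.group(1), lazy.group(0))  # 0 00
print(bounded.group(1))              # 0090091
print(ranged.group(1))               # 009
```

So the lazy version overshoots in the other direction, and the bounded versions only help if you already know roughly how long the repeating block is.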

Thanks for the answers. They are helpful in understanding regex, though I wish I could feel like that understanding brought me closer to knowing how to do one of the things I was trying to figure out:

Given a string of unknown length that includes (but is not necessarily limited to) repeating characters, I’d like to find (a) the earliest, largest, consecutively repeating subset of characters (of unknown length) that is (b) not also a superset of one of the larger string’s repeating subsets

As far as I can tell, my regex, /(\d+)\1+/, meets condition (a), but not condition (b).

Example:
/(\d+)\1+/ captures “0090091” in “2009009100900910090091” (good)
but it captures “00900910090091” in “20090091009009100900910090091” (bad: fails (b))
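One possible workaround, sketched in Python under an assumption about what condition (b) means: I read it as "the captured unit must not itself be a smaller block repeated end to end". The helper name smallest_repeating_unit is made up for illustration. It uses two regex steps plus a little code, not a single pattern.

```python
import re

REPEATED = re.compile(r"(\d+)\1+")  # the pattern from the question
TILED = re.compile(r"(.+?)\1+")     # lazy: tries the shortest block first


def smallest_repeating_unit(text):
    """Hypothetical helper: greedy match first, then shrink the capture."""
    m = REPEATED.search(text)
    if m is None:
        return None
    unit = m.group(1)
    # If the capture is itself one block repeated end to end, keep only
    # that block; fullmatch forces the repeats to cover the whole capture.
    inner = TILED.fullmatch(unit)
    return inner.group(1) if inner else unit


print(smallest_repeating_unit("2" + "0090091" * 3))  # 0090091
print(smallest_repeating_unit("2" + "0090091" * 4))  # 0090091
```

Note that this still starts from the leftmost greedy match, so it inherits whatever /(\d+)\1+/ picks as "earliest".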

This is for a problem I’ve sufficiently solved another way (after many hours of head-banging), but it would be really helpful to know if it were possible to do with regex. As far as I’m aware, regex has logical INCLUSIVE OR operators (pipe), but no other logical operators such as an AND that would require something to match both of two conditions. Is that correct?

Could it be the case that it is impossible to do this match with only regex?
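On the AND question: as far as I know, the common backtracking engines (PCRE, JavaScript, Python) support lookahead assertions, and stacking several of them at the same position behaves like a logical AND, because every assertion has to succeed there. A small Python sketch with made-up test strings follows. It only illustrates the AND idea and does not by itself solve condition (b).

```python
import re

# Both lookaheads are checked at the start of the string, so the input
# must contain a digit AND a lowercase letter somewhere.
both = re.compile(r"(?=.*\d)(?=.*[a-z]).+")

print(bool(both.fullmatch("abc1")))  # True
print(bool(both.fullmatch("abc")))   # False
print(bool(both.fullmatch("123")))   # False
```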

Yo. I just did a RegExp course through CodeSchool.com, which I would recommend if you want to practice them a bit, although there appears to be a total lack of other practice sites to keep the syntax fresh in your mind with real world use cases.

Anyway, you need to be more specific about what you’re searching for or validating, and why. It’s easier to solve these things when the pattern you’re searching for is known.

This captures the 3 groups you’re interested in: (\d+)(\1)+(\1)+
I believe it doesn’t capture the leading 2 because you’re specifically searching for repeated number groups, so it finds the first repeated number group. It also looks like you need to write out each capture group as I have above; (\d+)(\1)+ only yields 2 capture groups beyond the full match.

This also captures the 3 groups you’re interested in by setting the min/max character count: /([091]{7})/ or /(0\d{6})/
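A quick way to try the fixed-length idea outside regex101 is Python's re.findall. In this sketch the patterns are written as [091]{7} and 0\d{6}, which is my assumption about what was intended (no commas inside the character class, a single count in the braces). Both rely on the length 7 and the digits being known in advance.

```python
import re

s = "2009009100900910090091"

# Seven characters drawn from the digits 0, 9 and 1.
by_class = re.findall(r"[091]{7}", s)

# A 0 followed by any six digits.
by_prefix = re.findall(r"0\d{6}", s)

print(by_class)   # ['0090091', '0090091', '0090091']
print(by_prefix)  # ['0090091', '0090091', '0090091']
```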

Thanks for the response

I’m not sure what you mean by this.

I don’t understand why the capture groups around the backreferences are useful here… as far as I can tell, all of them will always return the same thing.

Also, this regexp runs into the same problem I described in the comment above… it fails condition (b): when the string contains the repeating substring 6 or more times, the group captures a superset of the string I’m looking for.
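Both observations are easy to reproduce. A minimal Python sketch, building the test strings from the repeating block so the repeat count is explicit (the variable names are mine):

```python
import re

unit = "0090091"
pattern = re.compile(r"(\d+)(\1)+(\1)+")

three = pattern.search("2" + unit * 3)
six = pattern.search("2" + unit * 6)

# Groups 2 and 3 wrap backreferences to group 1, so all three
# groups hold the same text.
print(three.groups())  # ('0090091', '0090091', '0090091')

# With six repeats, a doubled block fits three times, and the greedy
# \d+ prefers it.
print(six.group(1))    # 00900910090091
```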

This will not work, as I do not know the characters that are in the input string or repeating subset, nor the length of the string or repeating subset.

Go here: https://regex101.com/

Test (\d+)(\1)+(\1)+ or ([091]{7}) or (0\d{6}) against 2009009100900910090091 and you’ll see what I mean.

Be more specific about what you’re trying to solve. You need a problem to solve before you can find a solution, e.g. you need a solution that finds repeating numbers and then captures every instance of those repeating numbers in a number of any size…
