[regex] Why don't these 2 expressions do the same thing?


The second statement above selects all leading and trailing whitespace in a string; however, the first statement only selects all leading whitespace in a string.
It seems to me that the first statement should do the same thing, since the \1 is simply backreferencing the (\s+) group.

I have figured out why, so for those curious here is the reason:

Let's use the match function to demonstrate.
The sample string is "   Hello World!   ", which has 3 spaces at the beginning and 3 at the end.
With the second regex that I provided (the one that works), match returns an array like this: [ '   ', '   ' ]. With the first regex, it returns [ '   ', '' ] instead. The reason for this lies in the /g flag.

The Operations that are Happening
The /g flag means that match searches the string repeatedly with the provided regex until it reaches the end of the string. The first search starts at the beginning of the string; the leading whitespace matches, so it goes into the first array slot. The search then continues from where that match ended, but every new match attempt starts with the “(\s+)” group reset, so the group is empty again. In JavaScript, a backreference to a group that has not captured anything matches the empty string, so when the search reaches the end, the \1 stands for an empty string that has to occur at the end because of the $ specifier. It therefore matches “nothing”: the empty string between the final whitespace character and the end of the string. That empty string is returned as the second match.
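The behavior described above can be reproduced with a short sketch. The two patterns below are assumptions reconstructed from the description (a backreferencing version and a plain `\s+` version), not necessarily the exact expressions from the question, and the variable names are made up for illustration.

```javascript
// Assumed patterns: reconstructed from the description in this post.
const withBackreference = /^(\s+)|\1$/g;    // the "first" statement
const withoutBackreference = /^\s+|\s+$/g;  // the "second" statement

// Sample string with 3 spaces at the beginning and 3 at the end.
const sample = "   Hello World!   ";

const first = sample.match(withBackreference);
const second = sample.match(withoutBackreference);

// The group is empty during the second match attempt,
// so \1$ matches the empty string at the end of the input.
console.log(first);  // [ '   ', '' ]

// \s+$ matches the trailing whitespace directly.
console.log(second); // [ '   ', '   ' ]
```

Replacing with either pattern shows the practical difference: only the second one actually removes the trailing whitespace.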

Additional Insight to Backreferencing
When you create a group with () and then backreference it, the backreference does not stand for the pattern “\s+”, but for the exact whitespace that the “\s+” matched. So even if the (\s+) group were not reset, the \1 could only match the same whitespace at the end that was found at the beginning. For example, if there were 3 spaces at the beginning and 4 at the end, it would match only the last 3 spaces. If there were 3 spaces at the beginning and 2 at the end, it would not match anything, since there aren’t at least 3 spaces between the last character and the end of the string.
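To see this in isolation, here is a minimal sketch that uses a single, non-global pattern so the group is not reset between matches. The pattern and the variable names are hypothetical and chosen only for illustration; the `\S` after the group forces it to capture all of the leading whitespace.

```javascript
// Hypothetical pattern: group 1 captures the leading whitespace,
// and \1 must repeat that exact text right before the end of the string.
const sameWhitespaceAtBothEnds = /^(\s+)\S.*\1$/;

const threeAndFour = "   Hello World!    "; // 3 leading, 4 trailing spaces
const threeAndTwo = "   Hello World!  ";    // 3 leading, 2 trailing spaces

const matchThreeAndFour = threeAndFour.match(sameWhitespaceAtBothEnds);
const matchThreeAndTwo = threeAndTwo.match(sameWhitespaceAtBothEnds);

// \1 stands for exactly the 3 captured spaces, so only the last 3 of the
// 4 trailing spaces are matched by the backreference.
console.log(matchThreeAndFour[1].length); // 3

// There are not 3 spaces before the end, so the whole pattern fails.
console.log(matchThreeAndTwo); // null
```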
