Difference between + and * in regular expressions

Hello everyone, I’m finding it a bit hard to understand the difference between + and * in regular expressions.
I would be very happy if somebody could make it a bit clearer for me.
Thanks in advance.


Do you have a specific example of an expression using + and/or * whose results you do not understand?

I don’t really have an example, but I know + means one or more and * means zero or more. That is the part I don’t get.
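A quick way to see that difference is to anchor each pattern with ^ and $, so the whole string has to match. This is a small sketch using the letter a and a handful of made-up test strings purely for illustration: the only input where the two patterns disagree is the empty string, because * accepts zero occurrences and + does not. (Put another way, a+ behaves like aa*: one a, then zero or more.)

```javascript
// ^ and $ anchor the pattern, so the whole string must consist of "a" characters.
const zeroOrMore = /^a*$/; // "*": zero or more "a"
const oneOrMore = /^a+$/;  // "+": one or more "a"

for (const s of ["", "a", "aaa", "b"]) {
  // Prints the input, then whether each pattern matches it.
  console.log(JSON.stringify(s), zeroOrMore.test(s), oneOrMore.test(s));
}
// ""    true  false   <- zero "a" characters: only * accepts this
// "a"   true  true
// "aaa" true  true
// "b"   false false   <- "b" is not an "a", so neither pattern matches
```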


Let’s say we have the following string.

var str = "abc";

If I want to test whether str has zero or more digits (any digit 0 through 9), then I would write:

/[0-9]*/.test(str)

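The above would evaluate to true. There are no digits in str at all, but * also allows zero occurrences, so /[0-9]*/ matches the empty string at the very start of str. In fact, this pattern matches any string, even an empty one.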
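To see what /[0-9]*/ actually matched, you can use exec(), which reports the matched text and its index. This sketch reuses str = "abc" from above; the other test strings are just illustrative.

```javascript
const str = "abc";

// exec() reports what was matched and where.
const result = /[0-9]*/.exec(str);
console.log(JSON.stringify(result[0]), result.index); // "" 0

// Because zero digits is allowed, the pattern matches every string.
for (const s of ["abc", "", "123"]) {
  console.log(JSON.stringify(s), /[0-9]*/.test(s)); // always true
}
```

The match is an empty string at index 0: the engine tried the first position, found that "zero digits" already satisfies [0-9]*, and stopped there.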

If I want to test if str has 1 or more digits, then I would write:

/[0-9]+/.test(str)

The above would evaluate to false, because + requires at least one digit and str contains none.
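Here is a sketch contrasting the two quantifiers on a string that does contain digits. The string "ab12c" is a hypothetical example, not from the thread.

```javascript
const str = "abc";
const digits = /[0-9]+/;

console.log(digits.test(str));     // false: "abc" contains no digit
console.log(digits.test("ab12c")); // true: "12" is one or more digits

// match() returns the whole run, because + is greedy.
console.log("ab12c".match(digits)[0]); // 12

// For comparison, * stops at index 0 with an empty match,
// since zero digits already satisfies it there.
console.log(JSON.stringify("ab12c".match(/[0-9]*/)[0])); // ""
```

Note that the * version never reaches "12": it succeeds at the first position it tries, which is the same effect that comes up with /C*/ below.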


Thanks a lot. Can you use this example to make it clearer?

// example crowd gathering
let crowd = 'P1P2P3P4P5P6CCCP7P8P9';
let reCriminals = /C+/; // Change this line
let matchedCriminals = crowd.match(reCriminals);
console.log(matchedCriminals);

Why is matchedCriminals = ["CCC"] when /C+/ is used,
but matchedCriminals = [ "" ] when /C*/ is used?

It might be easier to understand what is happening if we add the global flag 'g' to the end of the regular expression. I am going to shorten the string to make it easier to show what I mean. When used with match, the global flag finds all matches instead of just the first one.

let crowd = 'P1P2CCCP3P4';
let reCriminals = /C+/g; // Change this line
let matchedCriminals = crowd.match(reCriminals);
console.log(matchedCriminals);

The above displays ['CCC'], because it is the only run of one or more consecutive C characters.
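With more than one group, /C+/g returns each run of consecutive C characters as its own match, and returns null when there is no C at all. The crowd strings below are hypothetical, chosen only to show those two cases.

```javascript
// Hypothetical crowd with two separate groups of criminals.
const crowd = "P1CCP2CP3";

// With "g", match() returns every run of one or more consecutive "C".
console.log(crowd.match(/C+/g)); // [ 'CC', 'C' ]

// No "C" anywhere: + has nothing to match, so match() returns null.
console.log("P1P2".match(/C+/g)); // null
```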

If you were to use /C*/g instead, it would return [ '', '', '', '', 'CCC', '', '', '', '', '' ].

Why? As the expression is applied across the entire string, it finds 10 matches of zero or more C characters. The first match is the empty string at index 0: the character there is 'P', which is not a C, so C* matches zero C characters. The same happens at the next 3 positions ('1', 'P', '2'). At index 4 the engine reaches 'CCC', and since * is greedy, it takes all three C characters as one match. After that there are 5 more empty matches: one at each of the remaining characters 'P', '3', 'P', '4', and one at the very end of the string.
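You can check where each of those 10 matches occurs with matchAll(), which yields every match together with its index. This sketch assumes an environment that supports String.prototype.matchAll (ES2020, e.g. any current browser or Node.js 12+); note that matchAll requires the g flag.

```javascript
const crowd = "P1P2CCCP3P4";

// matchAll() (requires the "g" flag) yields every match along with its index.
const matches = [...crowd.matchAll(/C*/g)];
for (const m of matches) {
  console.log(m.index, JSON.stringify(m[0]));
}
// Indexes 0-3 and 7-11 give "", index 4 gives "CCC".
// Index 11 is the end of the string, just past the last "4".
```

After an empty match the engine moves forward one character before trying again, which is why there is exactly one empty match per position that is not part of "CCC".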

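Now, if you drop the global flag 'g' as in your original example, we get [ '', index: 0, input: 'P1P2CCCP3P4' ]. Without 'g', match returns only the first match, and the engine tries index 0 first. There, C* already succeeds by matching zero C characters, so it stops with the empty string '' and never moves on to 'CCC'.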
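A short sketch comparing the first (non-global) match for /C*/, /C+/ and /CC*/ on the same shortened string. /CC* is just an illustration of the fact that "one C followed by zero or more C" is the same thing as C+.

```javascript
const crowd = "P1P2CCCP3P4";

// Without "g", match() returns only the first match.
const star = crowd.match(/C*/);
console.log(JSON.stringify(star[0]), star.index); // "" 0

const plus = crowd.match(/C+/);
console.log(plus[0], plus.index); // CCC 4

// CC* means "one C, then zero or more C", which is the same as C+.
console.log(crowd.match(/CC*/)[0]); // CCC
```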


Everything is much clearer now.
Thanks a lot for your time. 😅😅