Need help estimating the performance of this algorithm

The function below finds the minimum number of substrings a string (containing only lowercase letters; minimum length 1) can be split into such that no substring contains a repeated letter. I thought the function should have both space and time complexity of O(N) as S grows long, but somehow, after passing all the correctness tests, it failed a performance test (this was for a pre-interview coding test).

Just wanted to see if people agree with my estimation of the function’s space and time complexities?

Also, could there be an O(log N) solution to this challenge?

To further explain the challenge, for example, 'abab' can be split into

  • 4 substrings 'a' 'b' 'a' 'b' where each substring doesn’t contain repeated letters; or
  • 3 substrings 'ab' 'a' 'b' / 'a' 'ba' 'b' / 'a' 'b' 'ab' where each substring doesn’t contain repeated letters; or
  • 2 substrings 'ab' 'ab' where each substring also doesn’t contain repeated letters.

In this example case, the function should return 2.

function solution(S) {
  let m = {}; // m for memory. Property count shouldn't exceed 26.
  let a = 1; // a for answer.
  for(let i = 0; i < S.length; i++) { // Loop through every character of S.
    let c = S[i]; // c for character.
    if(m[c] === undefined) m[c] = true; // Remember a never-before-seen character.
    else { // Else the character has been seen before, so...
      a++; // ... start a new substring, and...
      m = {}; // ... clear memory, and...
      m[c] = true; // Remember the character being checked.
    }
  }
  return a;
}
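
For the 'abab' example above, it returns 2 as expected:

solution('abab');   // 2
solution('abcdef'); // 1 (no repeats, so no split needed)
solution('aaaa');   // 4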

Your solution is definitely linear time. Did they give you any details as to why it failed the performance test? I’m not seeing any way to make this better than O(n), but that could be because I’m just not that creative.

As for space usage, I'm thinking it is less than O(n). The maximum number of key/value pairs you will ever have at any time is 26, no matter how big the input is, so the space used does not depend on the input size. That sounds like constant space to me, but I could be wrong.

They didn’t and that’s why it frustrates me so much :sweat_smile:

Maybe “performance” covers a lot more than just runtime? Is this the exact code you turned in? If so, perhaps they didn’t like the variable names you chose, or the verbosity of your comments? I’m just guessing here.

let m = {}; // m for memory.
let a = 1; // a for answer.
let c = S[i]; // c for character.

These variables should probably be properly named to begin with.

Again, I’m just guessing here because I don’t see how the runtime could get any better.

Yeah I pretty much named everything the same way.

I doubt it's my variable names though. The performance testing is done programmatically, without human involvement. Also, the testing platform has its own openly available challenges that show how they test for performance, and they certainly didn't bother with naming conventions there. Plus, the function parameter S was pre-written by them and doesn't follow JS naming conventions at all.

Hmm… I’m considering writing an email to them in case it’s their server issue…

Only thing I can think of is to use a string instead of an object to remember seen characters (or even just remember indexes of substring in the original string).
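
Something like this, roughly (just a sketch of the string idea; includes() scans the current substring on each check, but that never exceeds 26 characters):

function solution(S) {
  let seen = ''; // letters of the current substring
  let count = 1;
  for (const ch of S) {
    if (seen.includes(ch)) { // repeated letter: start a new substring
      count++;
      seen = ch;
    } else {
      seen += ch;
    }
  }
  return count;
}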

Making it less than linear time seems unlikely (technically you could skip the check once m already holds all 26 letters, because the next letter would have to be a duplicate, but that seems like a very minor optimization).

For something this small, arrays will be much faster than objects. Indexing into a fixed length array is cheaper than hashing into an object. The exact breakpoint for when hashing is cheaper than indexing depends upon language and hardware, but it is much larger than you think.

Also, dumping and reallocating the m object will incur memory overhead.

function solution(S) {
  const seenInSubstring = Array(26).fill(false); // one flag per letter 'a'..'z'
  let numSubstrings = 1;
  for (let i = 0; i < S.length; i++) {
    const currCharIndex = S[i].charCodeAt(0) - 'a'.charCodeAt(0); // 0..25
    if (seenInSubstring[currCharIndex]) {
      numSubstrings++; // repeated letter: start a new substring
      for (let j in seenInSubstring) seenInSubstring[j] = false; // reset all flags
    }
    seenInSubstring[currCharIndex] = true;
  }
  return numSubstrings;
}

Also, single letter variable names are bad. Don’t do bad things :slight_smile:

Would forEach() or fill() perform better, worse, or about the same for resetting the array to all false?

I wouldn’t imagine there is a substantial difference.

I would say that performance is always about runtime and memory. But when we talk about performance, we tend to focus on time complexity instead of runtime. Sure, that makes sense.

But most tests don't test complexity; they test runtime. This is important to consider because while two O(n) algorithms have the same complexity, not all implementations will end up with the same absolute runtime. Depending on what operations are chosen, each iteration can take much longer. Jeremy points out a good one.

Speaking of performance, because I was a little bored and a little curious, I decided to test both of your algorithms against a variety of inputs. Long story short, @JeremyLT's algorithm was at least four times faster on inputs without many long runs of the same letter. But as the input gained longer runs of the same letter, @gaac510's algorithm started to catch up and eventually overtook @JeremyLT's. At the extreme, testing a string of one million a's, @gaac510's algorithm ran approximately twenty times faster. But as you break up those long runs, @JeremyLT's algorithm starts to pull ahead again. Testing strings of one million randomly placed letters, @JeremyLT's algorithm ran just over 4.5 times faster.

So unless you know ahead of time that the input is going to include very long runs of the same letter, I think it is safer to assume random placement, and thus that @JeremyLT's algorithm is faster.

We could probably squeeze this lemon a bit more by using bits instead of an array.
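
Something along these lines, perhaps (a rough sketch; all 26 flags fit comfortably in a single 32-bit integer):

function solution(S) {
  let seen = 0; // bit k is set when letter ('a' + k) is in the current substring
  let numSubstrings = 1;
  for (let i = 0; i < S.length; i++) {
    const bit = 1 << (S.charCodeAt(i) - 97); // 97 === 'a'.charCodeAt(0)
    if (seen & bit) { // repeated letter: start a new substring
      numSubstrings++;
      seen = 0; // the reset is a single assignment, no loop or fill() needed
    }
    seen |= bit;
  }
  return numSubstrings;
}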

Hmm, now I’m wondering if some engines make fill more efficient than my manual loop…

Looking around, I bet a raw for (let i = 0; i < 26; i++) loop would be fastest. Though JS isn't really the right language for speed.
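
For reference, the three reset strategies in question (the benchmarking harness around them is omitted):

seenInSubstring.fill(false);                             // built-in fill()
seenInSubstring.forEach((_, j, a) => { a[j] = false; }); // forEach()
for (let i = 0; i < 26; i++) seenInSubstring[i] = false; // raw indexed loop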

Thank you for this. I was just planning on doing some benchmarking myself :smile:

Based on my experience with their open challenges, I imagine they'd test all kinds of cases: strings as long as the server allows, big random strings, big repeating strings, etc. I wouldn't be surprised if they tested all of these and then some.

Come to think of it, I seem to remember the challenge did specifically mention somewhere that "you can assume all characters are ASCII compliant". I'm realising this could be a hint for using charCodeAt() indeed, but could it also be hinting at something more?

On my setup (Linux Fedora 32 / Node v12.22.1), switching the reset to fill() made a dramatic improvement. On a string with a million a's, the avg. execution time went from 890ms down to 75ms. @gaac510's algorithm had an average runtime of 45ms on this string, so the advantage for strings with long runs of the same letter is almost gone.

How’s it compare to the for i=0 variant?

Using bit manipulation instead of an array to track whether we've seen a letter in the current substring is even better. For the million a's string, as noted in my previous post, the avg runtime was around 75ms using @JeremyLT's algorithm with Array fill(). @JeremyLT's algorithm with bit manipulation is around 7ms, completely blowing all the others out of the water.

With a 26 character string, no repeating letters, executing the function one million times, @JeremyLT’s algorithm with Array fill has an avg. run time of 315ms while the bit manipulation version is 101ms.

For a million character string randomly filled with letters, @JeremyLT’s algorithm with Array fill ran an avg. of 17ms while the bit manipulation version ran an average of 5ms.

So there you go @gaac510, that’s probably why you didn’t pass the performance part, you didn’t use bit manipulation to track the letters you have seen in the string :smiley:

On a serious note, have you ever worked with bit manipulation before? If not, this might be a fun exercise to play with it. You already have a correct algorithm to compare against.

It’s worth at least learning how bit manipulation works. It comes in handy at the strangest of times and it’s a good mental exercise.

Thanks for the detailed comparisons @bbsmooth!

To be honest, it had been a while since I’ve done anything with bits, so it was a nice refresher. Also, I had never thought about the speed hit that occurred by hashing with an object vs. array indexing, so thanks for opening my eyes to that.

Thank you so much @bbsmooth @JeremyLT !

Not at all. I'll go learn about it now 🤓

PS: What's special about fill() that would result in the performance improvements?