Real job interview question, find all occurances of a string

kravmaguy · July 26, 2022, 5:51pm

I got lucky and had a technical interview with recruiting company that was recruiting for a local company hiring for PHP developer which I failed, I was lucky to make it to a technical interview normally I dont get them.

The problem sets in this [backend interview] (ive included my solutions)(Technical-interview/Backend interview at master · KravMaguy/Technical-interview · GitHub) were all easy except the last one, which I could not do:

for the example ignore the part about the padding with the digits… and the lines of the file (they are just individual haystacks)
what Ive done:

to get the number or occurances create hashmap of letter occurances and their indexes and multiply length of each letter array (wont work because the letters of the needle in the haystack need to be in order)

e.g. for a needle “ABCDE” A must come before B which much come before C which must come before D which must come before E etc…

I was told to use a queue, I did not invent his solution:

while (queue.length > 0) {
    const [val, idx] = queue.shift();
    if (idx === str.length - 1) {
      count++;
    } else {
      const arr = hashMap[str[idx + 1]];
      for (let i = 0; i < arr.length; i++) {
        if (val < arr[i]) {
          queue.push([arr[i], idx + 1]);
        }
      }
    }
  }

but anyways it didnt work with large inputs the queue is becoming massive for every single one removed from it. In some of the test cases in the input file there is an astronomical number of possibilites.

Here is what I think to do, I need your help to implement it:

consider inputs

hayStack: “xBxxAxxBxCxBxAxBxC”
needle: “ABC”

I will look for the first occurance of A followed by whatever occurance of B that comes after the A followed by whatever occurance of C that comes after the B. now that ive found the first occurance I will iterate through all the C’s that come after that C until no more C’s.

I will then work backwards so I will proceed to the next B than look for all the C’s until there are no more, then go to the next B repeat look for all the C’s until there are no more, advance to the next B and look for all the C’s until no more C’s, repeat this process until there are no more B’s

Now im back to the first A, so I will advance to the second A and do the same process I described above until no more A’s

for the sample inputs above we have keys for each letter and values being arrays of indeces e.g.:

[ 'A', [ 4, 13 ] ]
[ 'B', [ 1, 7, 11, 15 ] ]
[ 'C', [ 9, 17 ] ]

what im seeing is it needs to pop twice each time it gets to the end of a given lettersArray.
can you please describe this process, goes without saying it needs to work with any length of inputs.:

//[A, B, C]

[4 ,7, 9 ]  +1
[4, 7,   ] pop(9)
[4, 7, 17] +1 push(17)
[4, ,  ] pop(17) pop(7)
[4, 11,  ] push(11)
[4, 11, 17]  +1 //can not add  9 it is less than 11, push(17)
[4, ,  ]  pop(17) pop(11)
[4, 15, ] push(15)
[4, 15, 17] +1 //can not add 9, push(17)
[ , , ] pop(17) pop(15) pop(4)
[13, 15, 17 ] +1

//5 possibilities
//5 possibilities

I wrote some code but its not correct.

function findNumOccurances(sequence, str) {
  const stack = [];
  const hashMap = [...sequence].reduce((acc, letter, idx) => {
    acc.hasOwnProperty(letter) ? acc[letter].push(idx) : (acc[letter] = [idx]);
    return acc;
  }, {});
  for (let i = 0; i < str.length; i++) {
    const letter = str[i];
    const arr = hashMap[letter];
    for (let j = 0; j < arr.length; j++) {
      const val = arr[j];
      if (stack.length < 1) {
        stack.push([val, j, letter]);
        break;
      } else {
        if (val > stack[stack.length - 1][0]) {
          stack.push([val, j, letter]);
          break;
        }
      }
    }
  }

  const stackCombos = [];
  while (stack.length > 0) {
    if (stack.length === str.length) {
      stackCombos.push([...stack]);
    }
    const [val, index, key] = stack.pop();
    const lettersArray = hashMap[key];
    if(index<lettersArray.length-1){
    stack.push([lettersArray[index + 1], index + 1, key]);
    } else {
      const popped=stack.pop()
      if(!popped){continue}
      const [poppedVal, poppedIdx, poppedKey]= popped
      if(poppedIdx<hashMap[poppedKey].length-1){
        const pushedKey=[hashMap[poppedKey][poppedIdx+1],poppedIdx+1, poppedKey]
        stack.push(pushedKey)
        while(stack.length<str.length){
          for(let j=0;j<lettersArray.length;j++){
            if(lettersArray[j]>pushedKey[0]){
              stack.push([lettersArray[j], j, key])
              break
            }
          }
          break
        }
      }
    }
    console.log({stack})
  }
  console.log('-------------------------------------')
  console.log('possibilities \n',stackCombos)
}

const testCases = ["xBxxAxxBxCxBxAxBxC"];
//                ["012345678901234567890"]

const str = "ABC";

for (let [index, value] of testCases.entries()) {
  findNumOccurances(value, str);
}

edit: assume in the stack initial setup process ill add a check to make sure its equal to the needle length and return zero occurances immediately if not

edit #2: somethings about this algorithm I realized:
Each character in the needle is a loop, the last character in the needle (the top of the stack), every time its set or changes we can add one to total occurances.

each character loop advances once in its letters array for the completion of the entire letters array for the character that follows it and so on for each character in the needle.

admit8490 · July 27, 2022, 7:25pm

Hey
I am starting to solve this problem, have some idea, but will able to tell more when will write some code.
I’ll start like below.
For now , I will try to return all occurances in some form of data structure(it will be object, lets say), to store indexes of each occurance.
I have idea for algorithm, let me know if you interested in progress of this work.
To clarify: I just finished JS section of FCC currculum, I think I have skills to solve it. Main problem here - to invent good algo. As for tools - maybe even I know enough to solve it in JS.

/* TASK
to write a function which will take 2 parameters:
substring: string
sequence: string

function must return all occurances of substring(form of output is for the discussion)

occurance here - not necessarily adjacent characters,
for this task matters order of characters in the string

explanations in pseudocode will be presented, using example of input as following

substring = 'ABC'
sequence = 'QWBFASYRBYFVAREEQCCDJACEEWCCBWRWERAG' (note: maybe more appropriate example of sequence required)
*/

const occurancesOfSubstring = (substring, sequence) => {
    
}

UPDATE.
Still in the beginning of the whole thing, but.
First idea.
Before even starting process of finding occurances, I prefer to get rid of whole sequence. I really need only indexes of symbols. Helper function, explanation and example below. I think it’s reasonable thing to do, if we have test case with a giant ridiculous sequence and there not so many occurances of substring there.

const substring = 'ABC'
const sequence = 'QWBFASYRBYFVAREEQCCDJACEEWCCBWRWERAG'

/*
UNIT 2
lets try to work not with string but
with
every occurance of every char from substring in the sequence

input
substring: string
sequence: string

output: Object

Object structure
key >>> index of character in the sequence
value >>> character in the sequence
*/

const indexesOfSubstringCharacters = (substring, sequence) => {
  let result = {};
  for (let i = 0; i < substring.length; i++) {
    for (let j = 0; j < sequence.length; j++) {
      if (substring[i] === sequence[j]) {
        result[j] = substring[i];
      }
    }
  }
  return result;
}


//console.log(indexesOfSubstringCharacters(substring, sequence))
/*
Output:

{
  '2': 'B',
  '4': 'A',
  '8': 'B',
  '12': 'A',
  '17': 'C',
  '18': 'C',
  '21': 'A',
  '22': 'C',
  '26': 'C',
  '27': 'C',
  '28': 'B',
  '34': 'A'
}
*/

admit8490 · July 28, 2022, 1:04pm

So I did bunch of preliminary work:

wrote bunch of functions to get an array, from which I will try to get needed combos

wrote pseudocode where I described the process of finding and building combos

one more function need to be built.

I think knowledge about how to work with graphs will be useful here. My skills ‘how to work with graphs’ are crappy, I am always stuck with this stuff.

The below is heavy commented code. Though I tried to provide only necessary explanations, I think I was verbose more than I should be(maybe? need feedback about that).

/* TASK
to write a function which will take 2 parameters:
substring: string
sequence: string

function must return all occurances of substring(form of output is for the discussion)

occurance here - not necessarily adjacent characters,
for this task matters order of characters in the string

explanations in pseudocode will be presented, using example of input as following

substring = 'ABC'
sequence = 'QWBFASYRBYFVAREEQCCDJACEEWCCBWRWERAG' (note: maybe more appropriate example of sequence required)
*/


/*Preparation 1
to work with example, function will be used
for visualization, to show indexes of all characters of the sequence
*/

/*
const showIndexesOfChars = (str) => {
    
    return str.split('').reduce(
      (newarray, current, index) => [...newarray, [current, index]], []
      )
    
}

Output:

[
  [ 'Q', 0 ],  [ 'W', 1 ],  [ 'B', 2 ],
  [ 'F', 3 ],  [ 'A', 4 ],  [ 'S', 5 ],
  [ 'Y', 6 ],  [ 'R', 7 ],  [ 'B', 8 ],
  [ 'Y', 9 ],  [ 'F', 10 ], [ 'V', 11 ],
  [ 'A', 12 ], [ 'R', 13 ], [ 'E', 14 ],
  [ 'E', 15 ], [ 'Q', 16 ], [ 'C', 17 ],
  [ 'C', 18 ], [ 'D', 19 ], [ 'J', 20 ],
  [ 'A', 21 ], [ 'C', 22 ], [ 'E', 23 ],
  [ 'E', 24 ], [ 'W', 25 ], [ 'C', 26 ],
  [ 'C', 27 ], [ 'B', 28 ], [ 'W', 29 ],
  [ 'R', 30 ], [ 'W', 31 ], [ 'E', 32 ],
  [ 'R', 33 ], [ 'A', 34 ], [ 'G', 35 ]
]
*/


/*UNIT 1(optional, maybe will be good for optimization, maybe won't)
for this problem we really interested in part of sequence,
starting with first occurance of first symbol of substring:

'QWBFASYRBYFVAREEQCCDJACEEWCCBWRWERAG'

in the above, with sequence abc, i may not need 'QWB' part
I don't wanna work with it

I may slice it, for now, however, it will be commented out and not used
*/

/*
const cutTheStart = (substring, sequence) => {
  const arrSequence = sequence.split('');
  return arrSequence.slice(arrSequence.indexOf(substring[0])).join('')
}
*/

/*
UNIT 2
lets try to work not with string but
with
every occurance of every char from substring in the sequence

input
substring: string
sequence: string

output: Object

Object structure
key >>> index of character in the sequence
value >>> character in the sequence
*/

const indexesOfSubstringCharacters = (substring, sequence) => {
    let result = {};
    for (let i = 0; i < substring.length; i++) {
      for (let j = 0; j < sequence.length; j++) {
        if (substring[i] === sequence[j]) {
          result[j] = substring[i];
        }
      }
    }
    return result;
  }
  
  
  //console.log(indexesOfSubstringCharacters(substring, sequence))
  /*
  Output:
  
  {
    '2': 'B',
    '4': 'A',
    '8': 'B',
    '12': 'A',
    '17': 'C',
    '18': 'C',
    '21': 'A',
    '22': 'C',
    '26': 'C',
    '27': 'C',
    '28': 'B',
    '34': 'A'
  }
  */

/*
UNIT 3
statement
all occurences will be in range between(inclusively):
first occurance of first character of substring
last occurance of last character of substring

so for object above we can cut all props:
before '4': 'A' and after '27': 'C'

input: object, string
output: none, will mutate existing object
when called
*/

const cutTheTails = (obj, substring) => {
    const entriesFromStart = Object.entries(obj);
    let entry = 0;
    
    while (entriesFromStart[entry][1] !== substring[0]) {
      delete obj[entriesFromStart[entry][0]];
      entry++;
    }
    
    const entriesFromEnd = Object.entries(obj).reverse()
    entry = 0;
    while (entriesFromEnd[entry][1] !== substring[substring.length - 1]) {
      delete obj[entriesFromEnd[entry][0]];
      entry++;
    }
    
}

/*if testObj will be >>> as object above

then after executing:

cutTheTails();
console.log(testObj);

Output:

{
  '4': 'A',
  '8': 'B',
  '12': 'A',
  '17': 'C',
  '18': 'C',
  '21': 'A',
  '22': 'C',
  '26': 'C',
  '27': 'C'
}
*/

/*preparation 2 (maybe it will be actual UNIT)
edit: for now it's actual UNIT4
to better understand and/or explain algortihm
of finding occurances

I will use a function which

will take as input object as the below example
(something we have after running previous units)

and will represent occurances in specific order
look at example of output to get the idea

example of input:
const substring = "ABC";
const testObj = {
  '4': 'A',
  '8': 'B',
  '12': 'A',
  '17': 'C',
  '18': 'C',
  '21': 'A',
  '22': 'C',
  '26': 'C',
  '27': 'C'
};
*/

const sortBySubstringStructure = (obj, substring) => {
  const keys = Object.keys(obj);
  let resultArray = []
  for (let i = 0; i < substring.length; i++) {
    resultArray.push([]);
    for (let key of keys) {
      if (obj[key] === substring[i]) {
        resultArray[i].push([substring[i], key])
      }
    }
  }
  return resultArray;
}



/*
console.log(sortBySubstringStructure(testObj, substring));
Output:

[
  [ [ 'A', '4' ], [ 'A', '12' ], [ 'A', '21' ] ],
  [ [ 'B', '8' ] ],
  [
    [ 'C', '17' ],
    [ 'C', '18' ],
    [ 'C', '22' ],
    [ 'C', '26' ],
    [ 'C', '27' ]
  ]
]

*/

/*PSEUDOCODE for finding actual result
  as an example will be used 3D array which is above

To build the occurances we need, we should:

build the connections between subarrays of the example
step by step
after each step we should reduce the example

#first iteration step
need to build the connection between occurances of substring[0] and substring[1]
we can say for example: between A's and B's
 [ [ 'A', '4' ], [ 'A', '12' ], [ 'A', '21' ] ]
  [ [ 'B', '8' ] ]

  !!!IMPORTANT
  we need connections ONLY IF
  index of B bigger than index of A
  [ 'A', '4' ][ 'B', '8' ] >>> we need that stuff
  [ 'A', '12' ][ 'B', '8' ] >>> this is garbage, don't need that

At the end of first iteration we must have this:
 [
  [ [ 'A', '4' ], [ 'B', '8' ] ],
    [
    [ 'C', '17' ],
    [ 'C', '18' ],
    [ 'C', '22' ],
    [ 'C', '26' ],
    [ 'C', '27' ]
  ]
To generalize: now instead of two sub arrays for A and B respectively,
we have one subarray for all 'AB'-s

#second iteration step
using same logic: build connection between AB-s and C-s
or we can say between substring[0]+substring[1] and substring[2]

substring[2] is the last char of substring, so the whole process will stop after this step
As a result we should have the below. It appears in our example all C-s are fit into our AB combo.
[
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '17' ]],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '18' ]],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '22' ]],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '26' ]],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '27' ]]
]

*/

const occurancesOfSubstring = (substring, sequence) => {

  //process string into object using UNIT2 and UNIT 3
  let relevantCharacters = indexesOfSubstringCharacters(substring, sequence);
  cutTheTails(relevantCharacters, substring);

  //represent object as 3D array using UNIT4
  let arrayOfSortedCharacters = sortBySubstringStructure(relevantCharacters, substring);

  //TO DO
  //implement the last part of pseudocode
  //to get the result from array
  //consider research about
  //graph theor. models
  //combinatorics
  //statistics(?not sure about this one)
  
}

MAJOR UPDATE
first working function to generate all occurances
more testing needed tho

const allOccurances = (arr, str) => {
  let potentialResultCombos = [];
  for (let char of arr[0]) {
    potentialResultCombos.push([char]);
  }
  for (let arrIndex = 1; arrIndex < arr.length; arrIndex++) {
    newGen = [];
    for (char of arr[arrIndex]) {
      for (combo of potentialResultCombos) {
        //console.log(combo ,char, combo[combo.length - 1])
        if (Number(combo[combo.length - 1][1]) < Number(char[1])) {
          //console.log([...combo, char])
          newGen.push([...combo, char])
        }
      }
    }
    potentialResultCombos = newGen;
  }
  return potentialResultCombos;
}
  
/*UNIT TEST
const substring = "ABC"
const testArray = [
  [ [ 'A', '4' ], [ 'A', '12' ], [ 'A', '21' ] ],
  [ [ 'B', '8' ] ],
  [
    [ 'C', '17' ],
    [ 'C', '18' ],
    [ 'C', '22' ],
    [ 'C', '26' ],
    [ 'C', '27' ]
  ]
];
console.log(allOccurances(testArray, substring));

Output:

[
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '17' ] ],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '18' ] ],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '22' ] ],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '26' ] ],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '27' ] ]
]
*/

Whole solution, almost no comments here, just clean code:

//UNIT2
const indexesOfSubstringCharacters = (substring, sequence) => {
    let result = {};
    for (let i = 0; i < substring.length; i++) {
      for (let j = 0; j < sequence.length; j++) {
        if (substring[i] === sequence[j]) {
          result[j] = substring[i];
        }
      }
    }
    return result;
}

//UNIT3
const cutTheTails = (obj, substring) => {
    const entriesFromStart = Object.entries(obj);
    let entry = 0;
    
    while (entriesFromStart[entry][1] !== substring[0]) {
      delete obj[entriesFromStart[entry][0]];
      entry++;
    }
    
    const entriesFromEnd = Object.entries(obj).reverse()
    entry = 0;
    while (entriesFromEnd[entry][1] !== substring[substring.length - 1]) {
      delete obj[entriesFromEnd[entry][0]];
      entry++;
    }
    
}

//UNIT4
const sortBySubstringStructure = (obj, substring) => {
  const keys = Object.keys(obj);
  let resultArray = []
  for (let i = 0; i < substring.length; i++) {
    resultArray.push([]);
    for (let key of keys) {
      if (obj[key] === substring[i]) {
        resultArray[i].push([substring[i], key])
      }
    }
  }
  return resultArray;
}

//UNIT5
const allOccurances = (arr, str) => {
  let potentialResultCombos = [];
  for (let char of arr[0]) {
    potentialResultCombos.push([char]);
  }
  for (let arrIndex = 1; arrIndex < arr.length; arrIndex++) {
    newGen = [];
    for (char of arr[arrIndex]) {
      for (combo of potentialResultCombos) {
        //console.log(combo ,char, combo[combo.length - 1])
        if (Number(combo[combo.length - 1][1]) < Number(char[1])) {
          //console.log([...combo, char])
          newGen.push([...combo, char])
        }
      }
    }
    potentialResultCombos = newGen;
  }
  return potentialResultCombos;
}

//MAIN UNIT
const occurancesOfSubstring = (substring, sequence) => {

  //process string into object using UNIT2 and UNIT 3
  let relevantCharacters = indexesOfSubstringCharacters(substring, sequence);
  cutTheTails(relevantCharacters, substring);

  //represent object as 3D array using UNIT4
  let arrayOfSortedCharacters = sortBySubstringStructure(relevantCharacters, substring);
  
  //get result using UNIT5
  return allOccurances(arrayOfSortedCharacters, substring);
  
}

const testSubstring1 = 'ABC';
const testSequence1 = 'QWBFASYRBYFVAREEQCCDJACEEWCCBWRWERAG';

console.log(occurancesOfSubstring(testSubstring1, testSequence1));

/*
Output:

[
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '17' ] ],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '18' ] ],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '22' ] ],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '26' ] ],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '27' ] ]
]
*/

kravmaguy · July 29, 2022, 8:20pm

sorry I didnt have chance to go through your messages/your code. they were extensive and you were editing them.

I have to find a way to for each character in hashMap[str[idx]] it must complete a loop in hashMap[str[idx+1]] etc… until the end of str.

I can generate all combinations in a 3Digit padlock:

for (let i = 0; i < 10; i++) {
  for (let j = 0; j < 10; j++) {
    for (let k = 0; k < 10; k++) {
      // 1000
      console.log([i, j, k]);
    }
  }
}

in order to add them it must be greater than its previous. hashMap[str[idx+1]] > hashMap[str[idx]] we can do that by maintaining i>j>k

The problem with the above is I know the amount of loops I need and dont have to generate that dynamically.

I researched work others already did:

const padLock = (prefix, node) => {
  if (!node.next) {
    node.digits.map((n) => {
      console.log({ prefix });
    });
  } else {
    node.digits.map((n) => {
      padLock(prefix + n, node.next);
    });
  }
};

var digit1 = { digits: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], next: null };
var digit2 = { digits: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], next: null };
var digit3 = { digits: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], next: null };

digit1.next = digit2;
digit2.next = digit3;

though Id like to create my own iterative process. I didnt read/test your solutions but you can check if it works against the sequences file, use this utility I made

I am thinking forEach letter in hashMap[str[0]] we need to finish its remaining characters. continuing downward with the same process. and as long as its occurance is not complete try to add next character in needle at hashMap[str[idx+1]]
re:

needle: "abc"
haystack: "abcaabca"
//occurances: [[a, , ], [a, , ], etc..

That is the opposite of the recursive basecase where where it is the last letter and has no next

admit8490 · July 29, 2022, 8:30pm

That’s similar to my idea. I am not using hashmap, but I believe my algo is the same.

My idea how to find combos is below. I copied relevant part of my message. It’s pseudocode, I also have an implementation of this.
btw thanks for sharing it, this is definitely good practice!
Input for this will be 3D array like this, its indexes of every relevant character of needle in the haystack:

From this I can build all combos.

To build the occurances we need, we should:

build the connections between subarrays of the example
step by step
after each step we should reduce the example

#first iteration step
need to build the connection between occurances of substring[0] and substring[1]
we can say for example: between A's and B's
 [ [ 'A', '4' ], [ 'A', '12' ], [ 'A', '21' ] ]
  [ [ 'B', '8' ] ]

  !!!IMPORTANT
  we need connections ONLY IF
  index of B bigger than index of A
  [ 'A', '4' ][ 'B', '8' ] >>> we need that stuff
  [ 'A', '12' ][ 'B', '8' ] >>> this is garbage, don't need that

At the end of first iteration we must have this:
 [
  [ [ 'A', '4' ], [ 'B', '8' ] ],
    [
    [ 'C', '17' ],
    [ 'C', '18' ],
    [ 'C', '22' ],
    [ 'C', '26' ],
    [ 'C', '27' ]
  ]
To generalize: now instead of two sub arrays for A and B respectively,
we have one subarray for all 'AB'-s

#second iteration step
using same logic: build connection between AB-s and C-s
or we can say between substring[0]+substring[1] and substring[2]

substring[2] is the last char of substring, so the whole process will stop after this step
As a result we should have the below. It appears in our example all C-s are fit into our AB combo.
[
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '17' ]],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '18' ]],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '22' ]],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '26' ]],
  [ [ 'A', '4' ], [ 'B', '8' ], [ 'C', '27' ]]
]

*/

My whole solution works with small sequences, I will test it on some large ones.

admit8490 · July 29, 2022, 8:55pm

I need clarification about test case

For all sequences in this file: file from your repo

I should use one needle? The one below?

"join the nmi team"

Ok, for 5 smaller cases and the above needle I have this:

[
  'Haystack 1 from your file has 1 occurances',
  'Haystack 2 from your file has 256 occurances',
  'Haystack 3 from your file has 0 occurances',
  'Haystack 4 from your file has 336 occurances',
  'Haystack 5 from your file has 9 occurances'
]

And for the giant last one it’s freezing. So I guess both your and mine logic is ok, but we need to come up with better implementation.

I am wondering about this:
do we actually need to build combos, if we need to only know the number of them?
maybe there is some formula in maths, which will allow to get number of combos, if we know indexes of all chars from needle in the sequence.

I am not sure which maths is this. It must be something which is dealing with sets, subsets, this kind of stuff.

I tnink there is a strong reason behind the task itself? Why they are asking just number of occurances?

kravmaguy · August 1, 2022, 7:05pm

no

the count of all the combos is the count of all the first characters indeces which needs to have all the second characters indeces that come after it, which needs all the third characters indeces after it etc… until the length of the needle.

re if a needle is “ab” I need to know for each occurance of ‘a’ how many 'b’s appeared after it.

admit8490:

Ok, for 5 smaller cases and the above needle I have this:

[
  'Haystack 1 from your file has 1 occurances',
  'Haystack 2 from your file has 256 occurances',
  'Haystack 3 from your file has 0 occurances',
  'Haystack 4 from your file has 336 occurances',
  'Haystack 5 from your file has 9 occurances'
]

I didnt say your code is wrong if you getting the right answers only I didnt yet take the time to go over and try to understnd it. you wrote alot of code.

true. and its right one can use a que or a stack it doesnt matter. you can get the same results. maybe this line can be improved in the orginal implementation:

it goes through every single one beneath it doing that check. it doesnt need to do all that if arr[i] is going to be in order. maybe remove all values in arr[i] that are less than val or/and use binary search to find where to start

I dont know. need more time to think about this.

admit8490 · August 1, 2022, 7:15pm

I am planning to refactor my code for this problem in the near future. I have couple ideas.
I suggest: not to spent time on reviewing current version of my solution.

I will notify, when I will complete refactoring.

The code from your original post - did you make any refactoring after posting it?

kravmaguy · August 1, 2022, 8:51pm

No, I only suggested what could be done about maybe fixing that, re maybe binary below here to find where to start instead of starting from the beginning and checking each one, but I didnt try to implement it:

I wasted time answering my earlier question how to write an iterative function to generate all combinations of a padlock of N length so I went from this where amount of places is hard coded:

for (let i = 0; i < 10; i++) {
  for (let j = 0; j < 10; j++) {
    for (let k = 0; k < 10; k++) {
      // 1000
      console.log([i, j, k]);
    }
  }
}

to this, so for this padlock of length 5 you can pass in 5 to generate all possible combos (different from our case where we only want valid combos)

function genPadLock(keyLength, obj = {}) {
  const array = [...Array(10).keys()];
  const hashMap = Array(keyLength)
    .fill(0)
    .reduce((acc, _, currIdx) => {
      acc[currIdx] = array;
      return acc;
    }, obj);
  const queue = [];
  const paths = [];
  queue.push(...hashMap[0].map((val) => [val, 0, `${val}`]));
  while (queue.length > 0) {
    const [, idx, path] = queue.shift();
    if (idx === Object.keys(hashMap).length - 1) {
      paths.push(path);
    } else {
      const arr = hashMap[idx + 1];
      for (let i = 0; i < arr.length; i++) {
        queue.push([arr[i], idx + 1, `${path} -> ${arr[i]}`]);
      }
    }
  }
  return paths;
}

I can refactor it later. anyways the above doesnt help anything. you said we dont need to generate any combos.

//needle= "ABC"
//HayStack xxAAxBCC

x
xx
xxA //1st letter found
xxAA //2 occurances of 1st letter
xxAAx
xxAAxB // 2nd letter found 2 occurances of 1st and 2nd
xxAAxBC // final letter found count+2
xxAAxBCC // final letter found count+2

How to do it?

admit8490 · August 1, 2022, 9:00pm

I suggest for the start represent this haystack:

in the form of below object:

{
A: [2, 3],
B: [5],
C: [6, 7]
}

I hope you get the idea of the object structure. It could be done by looping through haystack.

And from that I think we can move on to the process of counting combos. I was able to generate combos from structure like this, but create combos dynamically - for the last case it’ too heavy. that’t why its freezing I guess.

But for now I am struggling to rewrite my code in order to count them, not build them

kravmaguy · August 1, 2022, 9:18pm

yeah thats what I did originally

admit8490 · August 1, 2022, 9:24pm

I need to make a correction.
I think object is not good here
If we have needle like

ABCA

object will be not good, because keys are unique, and we need array of indices for each of A from needle. Sorry for that, shpould use array here

I also suggest a little more complex haystack for the discussion

like

BxxAxCxBxxCxBxxAxCCxxA

We will need to handle cases when some C occurs before some B, so I think such haystack or something similar would be more appropriate.
What do you think?

admit8490 · August 1, 2022, 9:51pm

Suggestion
for now use your first example

needle = 'ABC'
haystack = 'xxAAxBCC'

Instead of this array

[
 [2, 3], //line for A
 [5], //line for B
 [6, 7] //line for C
]

I suggest to add weights to each occurance like that(it’s general idea, maybe weights for the start should be different)

[
 [[2, 1],  [3, 1]], //line for A
 [[5, 0]], //line for B
 [[6, 0], [7, 0]] //line for C
]

So, when looping through this structure we will not build combos, we will be adding weights. I need some time to express this thought more extensively.

After comparing all B-s and all A-s we will have this:

[
 [[2, 1],  [3, 1]], //line for A
 [[5, 2]], //line for B
 [[6, 0], [7, 0]] //line for C
]

After comparing all B-s and all C-s we will have

[
 [[2, 1],  [3, 1]], //line for A
 [[5, 2]], //line for B
 [[6, 2], [7, 2]] //line for C
]

Iterations finished
And now we need to sum all weights from last line(C) >> 2 + 2 ===4.

For this haystack answer is 4 I guess?

DanCouper · August 2, 2022, 12:31am

I think this is a dynamic programming question, but I know very little about methods of dynamic programming so

Because it’s just a count, it means it should be easy-ish to solve without much complexity, ie in a test situation. I assume there was some very specific real-life problem where this was actually useful, but otherwise it looks like a highly contrived leetcode thing.

Anyway @kravmaguy, I haven’t written anything here, and not optimised (which is where the dynamic programming part comes in and :shrug:), but I think you’re overcomplicating this drastically. Following should work afaics:

I want to iterate over both the sequence of characters (haystack) and the string (needle) at the same time.
Therefore I need two pointers (hPtr and nPtr) to the characters I’m currently looking at.
I would check right to left, because it means I can exit when pointers reach 0, which means less of “length - 1” and opportunities for easy off-by-one errors.
Therefore I would initialise the two pointers to the length of their respective strings.
These would be parameters of a function.
Then in the body:
IF both pointers are 0, there’s symmetry, got a match, return 1.
OTHERWISE if hPtr is 0, no symmetry, no match, return 0
OTHERWISE if the value at hPtr - 1 of haystack matches the value at nPtr - 1 of needle, that’s a match and need to move through the sequence. Can do that by getting the result of running the function again with those two pointers decremented and adding it to the result of running the function again with just the haystack pointer decremented. This is the bit that does the actual work, and also the thing that cause it to expand exponentially.
OTHERWISE just return the result of the function with only the haystack pointer decremented – not a match so what we want to do is move to the next character in the haystack.

The last four digits thing is actually important in this, because means can have an extremely compact solution (like four lines, basically) but bail when the count hits 9999 – I would just add another parameter to the function and initialise it as 0, that’s the count, return when it hits max number. The program will expand exponentially, so it’ll get v slow.

The dynamic programming part is what I assume will make it reasonably quick and would allow higher numbers in reasonable time. Dynamic programming hurts my head every time i read about it for some reason, and even working primarily in a language where recursion is common I don’t think I’ve ever seen it explicitly used.

Above is recursive, but can be converted to use a stack.

Edit: this feels like something that you memorise the solution from a book on algorithms. It has close to zero practical use afaics, and I would be extremely annoyed if I was asked it – the only reason I came up with above is because I did something kinda similar when I was doing some code problems for practice (had two arrays, and had to iterate over one and kinda peek forwards to check the other one). I assume I looked up the answer at the time. if this is for what I assume is a junior position, and you failed the interview just because of this question then that is bizarre and brutally unfair

Edit: yeah, this works. For large strings, , but:

function recCount(haystack, needle, hPtr = haystack.length, nPtr = needle.length) {
  switch (true) {
    case (hPtr === 0 && nPtr === 0): return 1;
    case (hPtr === 0): return 0;
    case (haystack[hPtr - 1] === needle[nPtr - 1]): return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr);
    default: return recCount(haystack, needle, hPtr - 1, nPtr);
  }
}

admit8490 · August 2, 2022, 1:47am

I guess I solved it.
I can’t find how to check the answer for the last giant sequence.
I have this ridiculous number as a result:

Output:

5.727825237140057e+43

for 5 smaller test cases it will be:

Output:

Haystack:   oinj oin the nmi team
result:  1
expected:  1
-------------
Haystack:   jjooiinn  tthhe nmi abcd team
result:  256
expected:  256
-------------
Haystack:   join the nmiteam
result:  0
expected:  0
-------------
Haystack:   jjjjoiin dse thehsethe namanmie team
result:  336
expected:  336
-------------
Haystack:   teamjoin tehsthe nmisa   team
result:  9
expected:  9
-------------

Whole solution(will blur it in case you will consider it a spoiler.

const representIndices = (needle, haystack) => {
  let arrayOfIndices = []
  for (let i = 0; i < needle.length; i++) {
    arrayOfIndices.push([]);
    for (let j = 0; j < haystack.length; j++) {
      if (needle[i] === haystack[j]) {
        arrayOfIndices[i].push(j)
      }
    }
  }
  return arrayOfIndices;
}

const addWeights = (twoDarray) => {
  twoDarray[0] = twoDarray[0].map(index => [index, 1]);
  for (let i = 1; i < twoDarray.length; i++) {
    twoDarray[i] = twoDarray[i].map(index => [index, 0]);
  }
}

const countCombos = (threeDArray) => {
  for (let i = 1; i < threeDArray.length; i++) {
    //console.log(threeDArray[i], threeDArray[i - 1]);
    for (nextChar of threeDArray[i]) {
      for (prevoiusChar of threeDArray[i - 1]) {
        //console.log(nextChar, prevoiusChar);
        if (nextChar[0] > prevoiusChar[0]) {
          nextChar[1] += prevoiusChar[1]
        }
      }
    }
  }
  return threeDArray[threeDArray.length - 1].reduce(
    (combos, lastCharWeight) => combos + lastCharWeight[1], 0
    )
}


const occurancesOfSubstring = (needle, haystack) => {
  arrayOfOccurances = representIndices(needle, haystack);
  //console.log(arrayOfOccurances);
  addWeights(arrayOfOccurances);
  //console.log(arrayOfOccurances);
  return countCombos(arrayOfOccurances);
}

camperextraordinaire · August 2, 2022, 1:49am

FYI - Find number of times a string occurs as a subsequence in given string - GeeksforGeeks

It can be solved in multiple ways (dynamic programming being one of them).

admit8490 · August 2, 2022, 2:04am

I just tested your solution.
For small test cases (5 of them I wrote above) it’s ok.

However, for ridiculous giant string.

RangeError: Maximum call stack size exceeded

I tested it in online compiler tho, maybe it’s because of it. I don’t know much about specifics of different environments.

admit8490 · August 2, 2022, 2:46am

Well, for Python version I have this for giant string. I pasted part of it, error message is actually larger.
Whole message ends with this:

  File "main.py", line 9, in recCount
    if haystack[hPtr - 1] == needle[nPtr - 1]:
IndexError: string index out of range

Start of error message:

Output:

Traceback (most recent call last):
  File "main.py", line 19, in <module>
    print(recCount(giantSequence, needle))
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  [Previous line repeated 16 more times]
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  [Previous line repeated 120 more times]
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
  [Previous line repeated 1 more time]
  File "main.py", line 10, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr - 1) + recCount(haystack, needle, hPtr - 1, nPtr)
  File "main.py", line 11, in recCount
    return recCount(haystack, needle, hPtr - 1, nPtr)
..................

admit8490 · August 2, 2022, 2:50am

Needle is the same:

'join the nmi team'

Giant string can be found in this file in repo of the topicstarter
It starts with line 6 of the file.
I won’t copy it here for the sake of readability

admit8490 · August 2, 2022, 3:15am

Yeah, I understand. That’s why I decided to test it, I was curious: maybe recursive solution didn’t work because of JS. But apparently Python version doesn’t handle large string also.

I am wondering: is there any trick to modify recursive solution in order to make it work for large haystacks?

Topic		Replies	Views
Anybody review my word frequency checker in string logic is there any better way to implement this? JavaScript	8	222	June 1, 2021
Number of occurrences in a string?	7	1882	January 16, 2021
Solution feedback JavaScript	1	199	November 23, 2022
Sorting and RegExp difficulty	21	1229	June 1, 2021
No Repeats Please - doubt before starting JavaScript	2	441	February 2, 2021

Real job interview question, find all occurances of a string

consider inputs

Related topics