Nested For Loops Not Working

My code is long and convoluted, so I can’t post it in full. I’ll try to explain it with pseudo-code instead. I’m not sure anyone will be able to help, because it’s complicated.

With the loops nested as shown below, the browser visits /page=1 twice.
When I un-nest the m for loop, it goes to a second page for a URL that doesn’t have a second page.

const urls = [
  "https://www.example.com/jewelry/stainless-steel",
  "https://www.example.com/jewelry/watch",
];

async function findTotalPages(url) {
  // This function scrapes the page to find the number of products being listed,
  // divides by 100 and uses ceil() to find the page number. There are 100 products per page.
  return NumberOfPages;
}
async function scrapeProductPage(url) {
  // This function finds out where the product page is and takes you there so that you can scrape it
  return scrapeResults;
}
async function scrapeDescription(url, page) {
  // This function scrapes the product page and returns the data that was scraped.
  return {url, cats, main_img, name, descript, price};
}
async function scrapeAll() {

  // The first loop finds the number of pages to append to the category URL for the pagination
  for (let m = 0; m < urls.length; m++) {
    pages = await findTotalPages(urls[m]);
    // } when I un-nest the for loop it goes to a second page for a URL that doesn't have a second page, because the last URL in the array has 2 pages

    const descriptionPage = await browser.newPage();

    // The second loop provides the URL array for the next loop
    for (let k = 0; k < urls.length; k++) {

      // The third loop uses the page number to go to the scrapeProductPage function
      for (let j = 1; j <= pages; j++) {
        scrapeResults = await scrapeProductPage(urls[k] + "/page=" + j);

        // The fourth loop scrapes the description in the product page and outputs the scraped data as an array
        for (let i = 0; i < scrapeResults.length; i++) {
          result = await scrapeDescription(scrapeResults[i], descriptionPage);
          resultsArray = [...resultsArray, result];
        }
      }
    }
  }
}

1.) For the first iteration, findTotalPages finds the total number of pages for https://www.example.com/jewelry/stainless-steel, which in this case is 1.
2.) The second and third loops go to https://www.example.com/jewelry/stainless-steel/page=1
3.) The fourth loop scrapes the description from each product page, whose URL looks something like this:
https://www.example.com/product_info.php?products_id=430373

Because they’re nested, each inner loop restarts from its first index every time the loop around it advances. In this case the first loop, m, runs twice, the k loop body runs four times, and the j loop body runs four times the number of pages. In the demo below, the i loop body runs three times for every j. Your application is repeating the same task multiple times.

I’m surprised you’re only getting duplicates.

function scrapeAll() {
  console.log('doing')
  for (let m = 0; m < 2; m++) {
    const pages = 2;
    console.log("M" + m)
    for (let k = 0; k < 2; k++) {
      console.log("K" + k)
      for (let j = 1; j <= pages; j++) {
        console.log("J" + j)
        for (let i = 0; i < 3; i++) {
          console.log("I" + i)
        }
      }
    }
  }
}

scrapeAll()
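
To make the repetition concrete, here is a rough trace of which listing URLs the original m/k/j nesting would request. It is only a sketch: `pageCounts` and `traceVisits` are made-up names, and the counts (1 page for the first category, 2 for the last) are taken from what was described earlier in the thread instead of being scraped.

```javascript
const urls = [
  "https://www.example.com/jewelry/stainless-steel",
  "https://www.example.com/jewelry/watch",
];

// Assumed stand-in for what findTotalPages would return for each URL.
const pageCounts = [1, 2];

// Replays the m/k/j nesting from the question and records every URL
// that scrapeProductPage would have been called with.
function traceVisits() {
  const visited = [];
  for (let m = 0; m < urls.length; m++) {
    const pages = pageCounts[m]; // one value, reused for every k below
    for (let k = 0; k < urls.length; k++) {
      for (let j = 1; j <= pages; j++) {
        visited.push(urls[k] + "/page=" + j);
      }
    }
  }
  return visited;
}

const visited = traceVisits();
console.log(visited.length); // 6 requests, although only 3 listing pages exist
console.log(visited);
```

Under these assumptions, both /page=1 URLs show up twice, and stainless-steel/page=2 is requested once even though that category has a single page.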

Take all the nested loops, separate them out into individual loops and just pass data between them. It’s easier to read and easier to debug.

Thanks JanShah

Do I push to arrays in order to pass data between the individual loops?

I think so. You probably don’t need the k loop either; you’re already going through the urls with m. In that case, you could probably get away with one nested loop for the urls and pages. Push those into an array and move the i loop into a new function that handles the array.
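
Something along these lines might work as a starting point. It is a sketch, not the real scraper: the three scraping functions are replaced by hypothetical stubs that return made-up data, `buildPageUrls` and `scrapeProducts` are names invented for this example, and the browser page argument to scrapeDescription is left out so the snippet runs on its own.

```javascript
const urls = [
  "https://www.example.com/jewelry/stainless-steel",
  "https://www.example.com/jewelry/watch",
];

// Hypothetical stubs standing in for the real scrapers from the question.
async function findTotalPages(url) {
  return url.endsWith("/watch") ? 2 : 1; // assumed page counts
}
async function scrapeProductPage(pageUrl) {
  // Pretend every listing page links to two products.
  return [pageUrl + "#product-a", pageUrl + "#product-b"];
}
async function scrapeDescription(productUrl) {
  return { url: productUrl };
}

// One nested loop: pair each category URL with its own page numbers.
async function buildPageUrls(categoryUrls) {
  const pageUrls = [];
  for (const url of categoryUrls) {
    const pages = await findTotalPages(url);
    for (let j = 1; j <= pages; j++) {
      pageUrls.push(url + "/page=" + j);
    }
  }
  return pageUrls;
}

// The former i loop, moved into its own function that works through the array.
async function scrapeProducts(pageUrls) {
  const results = [];
  for (const pageUrl of pageUrls) {
    const productLinks = await scrapeProductPage(pageUrl);
    for (const link of productLinks) {
      results.push(await scrapeDescription(link));
    }
  }
  return results;
}

async function scrapeAll() {
  const pageUrls = await buildPageUrls(urls);
  return scrapeProducts(pageUrls);
}

scrapeAll().then((results) => console.log(results.length));
```

Because the page count is read inside the same iteration that uses it, each URL is only ever paired with its own count, and the list of page URLs can be logged and checked before any product page is opened.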

pages is an array, so it should be indexed: pages[m]
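
For the un-nested version, that could look roughly like this. It is a sketch with a hypothetical findTotalPages stub and an invented `listPageUrls` helper; the point is only that the first loop stores one count per URL and the later loop reads the matching entry.

```javascript
const urls = [
  "https://www.example.com/jewelry/stainless-steel",
  "https://www.example.com/jewelry/watch",
];

// Hypothetical stub: the real function would scrape the product count.
async function findTotalPages(url) {
  return url.endsWith("/watch") ? 2 : 1; // assumed page counts
}

async function listPageUrls() {
  // First loop, un-nested: keep one page count per URL, in the same order as urls.
  const pages = [];
  for (let m = 0; m < urls.length; m++) {
    pages.push(await findTotalPages(urls[m]));
  }

  // Later loop: pages[m] is the count for urls[m], not the last value assigned.
  const pageUrls = [];
  for (let m = 0; m < urls.length; m++) {
    for (let j = 1; j <= pages[m]; j++) {
      pageUrls.push(urls[m] + "/page=" + j);
    }
  }
  return pageUrls;
}

listPageUrls().then((pageUrls) => console.log(pageUrls));
```

With a single shared pages variable, the un-nested first loop would leave it holding the count of the last URL (2 here), which matches the extra page=2 request described in the question.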

I was able to solve the problem. The solution was as follows:

async function scrapeAll() {

  for (let k = 0; k < urls.length; k++) {
    pages = await findTotalPages(urls[k]);

    const descriptionPage = await browser.newPage();

    for (let j = 1; j <= pages; j++) {
      scrapeResults = await scrapeProductPage(urls[k] + "/page=" + j);

      for (let i = 0; i < 1; i++) {

        result = await scrapeDescription(scrapeResults[i], descriptionPage);
        resultsArray = [...resultsArray, result];

      }
    }
  }
}