Trouble with promises in scraping

Hello people! I have a small snippet of code (below) that isn't working. I'm building a scraper and, for readability, splitting it into functions. However, the second step (the last two lines) keeps returning an empty array, and I don't understand why: the await should make the function wait for the axios request to finish before returning its value. It should return an array of links instead.
Can someone please take a look and help me figure out what is wrong?

```javascript
const path = require("path");
const fs = require("fs");
const axios = require("axios");
const cheerio = require("cheerio");

const createDirectory = (name) => {
  const dirName = path.join(__dirname, name);
  if (fs.existsSync(dirName)) return;
  fs.mkdirSync(dirName);
};

const getCategories = async (cityUrl) => {
  const baseUrl = "https://graffiti-database.com";
  const cityCategories = [];
  await axios
    .get(cityUrl + "/categories")
    .then((res) => {
      const $ = cheerio.load(res.data);
      $("p.image-info > a").each(function () {
        let i = $(this).attr("href");
        cityCategories.push(baseUrl + i);
      });
    })
    .catch((err) => console.error(">> Error retrieving categories: ", err));
  return cityCategories;
};

const getPagesForCategories = (categories) => {
  const allPages = [];
  categories.forEach(async (category) => {
    await axios.get(category).then((res) => {
      const $ = cheerio.load(res.data);
      // Last page number, read from the text of the pagination items
      const lastPage = $("li.page-item")
        .text()
        .split("\t\t\t\t\t\t\t")
        .slice(-3, -2)[0]
        .replace(/[\t\n]/g, "");
      //   allPages.push(`${category}?page=${lastPage}`);
      //   console.log(allPages);
      for (let page = 1; page <= lastPage; page++) {
        allPages.push(`${category}?page=${page}`);
      }
      //console.log(allPages);
    });
  });
  return allPages;
};

getCategories("https://graffiti-database.com/Italy/Milan")
  .then((categories) => getPagesForCategories(categories))
  .then((pages) => console.log(pages));
```

You aren't quite using await properly. When you use await, you generally don't need then blocks as well.

Rewriting promise code with async/await

Okay, so that's the part that isn't working? I mean, the function definitions should work as intended, each returning a promise that resolves to an array, right?

Sorry, I goofed a little in my original response. The getCategories function is fine; it works as expected and returns an array of categories.

It's the getPagesForCategories function that is causing you problems. First, this function needs to be declared async (just as you did for getCategories). Second, a forEach loop won't work here; I had to convert it to a standard for loop to get it working properly.

The reason is that forEach calls your async callback for each item in the categories array but ignores the promise that callback returns, so it never waits for axios.get to finish. The await only pauses the callback it sits in, not the forEach and not getPagesForCategories. As a result, the function reaches its return statement before any of the axios calls have completed, which is why you get an empty array (the links are pushed into it later, after it has already been logged). With a standard for loop inside an async function, each await pauses getPagesForCategories itself, so the return statement isn't reached until every request has finished.
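A minimal, self-contained sketch of that difference, assuming a hypothetical `fakeGet` that stands in for `axios.get` (it waits 10 ms and resolves with a made-up page count instead of fetching HTML, so cheerio isn't needed). `pageUrls` is also a hypothetical helper. The third variant, `Promise.all` with `map`, is an alternative not used in this thread: it starts all requests at once instead of one after another.

```javascript
// Hypothetical stand-in for axios.get: resolves after a short delay with
// a pretend "last page" number instead of real HTML.
const fakeGet = (_url) =>
  new Promise((resolve) => setTimeout(() => resolve({ lastPage: 2 }), 10));

// Hypothetical helper: ["<category>?page=1", ..., "<category>?page=<lastPage>"]
const pageUrls = (category, lastPage) =>
  Array.from({ length: lastPage }, (_, i) => `${category}?page=${i + 1}`);

// Broken: forEach calls the async callback for each item but ignores the
// promise it returns, so `return` runs before any request has finished.
const pagesWithForEach = (categories) => {
  const allPages = [];
  categories.forEach(async (category) => {
    const { lastPage } = await fakeGet(category);
    allPages.push(...pageUrls(category, lastPage));
  });
  return allPages;
};

// Fixed: inside an async function, for...of really pauses at each await,
// so requests run one after another and `return` waits for all of them.
const pagesWithForOf = async (categories) => {
  const allPages = [];
  for (const category of categories) {
    const { lastPage } = await fakeGet(category);
    allPages.push(...pageUrls(category, lastPage));
  }
  return allPages;
};

// Alternative: start every request at once and wait for all of them.
// Promise.all keeps results in input order, so flat() gives the same list.
const pagesWithPromiseAll = async (categories) => {
  const perCategory = await Promise.all(
    categories.map(async (category) => {
      const { lastPage } = await fakeGet(category);
      return pageUrls(category, lastPage);
    })
  );
  return perCategory.flat();
};

console.log(pagesWithForEach(["a", "b"])); // → []
pagesWithForOf(["a", "b"]).then(console.log); // logs a?page=1, a?page=2, b?page=1, b?page=2
pagesWithPromiseAll(["a", "b"]).then(console.log); // logs the same four URLs
```

The for...of version is the smallest change and is gentler on the site, since requests go out one at a time; the Promise.all version is faster with many categories but fires every request concurrently.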

And I still think you should convert this code to get rid of the then blocks. That's the primary reason for using async/await: to let you write asynchronous code clearly, in a traditional procedural style. You aren't taking full advantage of what it offers.
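Here is a rough sketch of what dropping the then blocks could look like, again assuming a hypothetical `fakeGet` in place of `axios.get`. It resolves with `{ data }` like axios does, but `data` here is a made-up list of hrefs rather than HTML, and it rejects for any URL containing "bad" so the error path can be exercised. The `.catch` becomes a `try`/`catch`, and the top-level `.then` chain becomes a single async `main` function (also a hypothetical name).

```javascript
// Hypothetical stand-in for axios.get. It resolves with { data } like axios,
// and rejects for any URL containing "bad" so the catch path can be seen.
const fakeGet = async (url) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  if (url.includes("bad")) throw new Error(`Request failed: ${url}`);
  // Made-up hrefs; the real page would return HTML for cheerio to parse.
  return { data: ["/Italy/Milan/category-a", "/Italy/Milan/category-b"] };
};

const baseUrl = "https://graffiti-database.com";

// Same job as getCategories, written with await + try/catch instead of
// await + .then/.catch. In the real scraper, cheerio would parse res.data here.
const getCategories = async (cityUrl) => {
  try {
    const res = await fakeGet(cityUrl + "/categories");
    return res.data.map((href) => baseUrl + href);
  } catch (err) {
    console.error(">> Error retrieving categories: ", err.message);
    return [];
  }
};

// The top-level .then chain becomes one async function that reads top to bottom.
const main = async () => {
  const categories = await getCategories(`${baseUrl}/Italy/Milan`);
  console.log(categories);
};

main();
```

Returning `[]` from the catch mirrors the original, which also returned the (empty) array after logging the error; rethrowing instead would let the caller decide what to do.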

Yes, I'm fixing all the code accordingly; I just wrote it in a couple of minutes to get the idea down. Now I'm cleaning it up.

Yup, this works. Marking it as a solution, thank you very much!

