I don’t understand the connection you have between the link/category and image. There also isn’t a one-to-one relationship when it comes to the numbers of elements. There are 100 images per page (per category) and 39 categories in total.
If the second map (link/category) ran as many times as the first map (images) then you would just get the image from the images array inside the second map using its index.
const image = images[i];
return {
link,
cat,
image
};
But this will only give you 39 images.
Can you show what you want the CSV to look like?
I did a search for the image names in another thread and found the site. Not sure about linking it here? It is a online store.
@makamo66 We merged two of your threads to give some more context. Please continue your current questions in this thread. Sorry for any confusion.
You merged together two different code bases. The first one console logs all the output and the second one saves to a CSV file. It seems more confusing now than it was before.
@makamo66 Well, in any case, if you have questions post them here. You also never answer the question I posted.
I don’t know what I want the CSV to look like. That’s why I didn’t answer.
Well, then I can’t really help you transform the data. But you see why I asked that question, right? You have many more images then links/categories.
const images = cheerio('td.productListing-data > a > img', html)
.map((i, image) => cheerio(image).attr('src'))
const img = images.each( (i, image) =>
{image}
).get()
return {
img,
}
The code above resulted in the entire array being outputted to each cell of the spreadsheet.
I would expect the spreadsheet to have many more cells filled with images than with links/categories.
Even when I changed the names of the parameters, it still didn’t work.
So I tried to learn how to do this on my own by taking two video courses about node and cheerio on udemy.com but I haven’t gotten any better at it. The code produces an array of image URLs when I do this:
const request = require("request-promise");
const cheerio = require("cheerio");
const url = "https://www.example.com";
const scrapeResults = [];
async function scrapeJobHeader() {
try {
const htmlResult = await request.get(url);
const $ = await cheerio.load(htmlResult);
$("td.productListing-data > a ").each((index, element) => {
const resultTitle = $(element).children("img");
const img_url = resultTitle.attr("src");
const scrapeResult = { img_url };
scrapeResults.push(scrapeResult);
});
return scrapeResults;
} catch (err) {
console.error(err);
}
}
async function scrapeWebsite() {
const jobsWithHeaders = await scrapeJobHeader();
console.log(jobsWithHeaders);
}
scrapeWebsite();
but I get the error “TypeError: Cannot read property ‘replace’ of undefined” when I do this:
const img_url = resultTitle.attr("src").replace("images\\/more_color.png","");
The images that are returned include “images/more_color.png” but I just want to return the actual product images and not the pngs.
The results look like this:
{ img_url: 'images/more_color.png' },
{ img_url: undefined },
{ img_url: undefined },
{
img_url: 'images/20191206/thumb/AK1501-@RH-CRY-LOVE@22X06-825_3L@467400@200@01@200.jpg'
},
{ img_url: 'images/more_color.png' },
{ img_url: undefined },
{ img_url: undefined },
{
img_url: 'images/20191206/thumb/AK1501-@GD-CRY-LOVE@22X06-825_3L@467399@200@01@200.jpg'
},
{ img_url: 'images/more_color.png' },
{ img_url: undefined },
{ img_url: undefined },
{
img_url: 'images/20191206/thumb/AK1500-@GD-CRY-QUEEN@3X06-825_3L@467397@200@01@200.jpg'
},
{ img_url: 'images/more_color.png' },
{ img_url: undefined },
{ img_url: undefined },
The next step would be to get rid of the undefined image URLs but I haven’t gotten that far yet.
I got rid of the undefined ones by using:
if (img_url != undefined){
scrapeResults.push(scrapeResult);
}
You can check the Truthiness of the img string and see if it does not end with .png before pushing to the array.
async function scrapeJobHeader() {
try {
const htmlResult = await request.get(baseURL);
const $ = await cheerio.load(htmlResult);
$('td.productListing-data > a ').each((index, element) => {
const resultTitle = $(element).children('img');
if (
resultTitle.attr('src') &&
!resultTitle.attr('src').endsWith('.png')
) {
let img_url = resultTitle.attr('src');
scrapeResults.push({ img_url });
}
});
return scrapeResults;
} catch (err) {
console.error(err);
}
}
Thank you lasjorg! That worked!
No problem, glad to help. I should have linked to the endsWith method docs.
I suggest you keep these three links handy, on the left-hand side is the list of all the methods on the different objects (Object, Array, String).
Happy coding!