Code for the following question available at
I’m trying to create a web scraper with Node.js, request and cheerio.
The scraper works in two phases (so far):
Phase One
The first function, scrapeCommitteesPage(), visits a page, scrapes all links, adds them to an array, and passes the array to a second function called getUniqueIDs().
const request = require('request');
const cheerio = require('cheerio');
// UK Parliament - select committees page
const committeesListUrl = '';
// get all links to committee pages
request(committeesListUrl, (error, response, body) => {
  if (!error && response.statusCode == 200) {
    // pass response body to cheerio
    const $ = cheerio.load(body),
      committeeLinksArray = [],
      // get all committee page links - ul.square-bullets-a-to-z li a
      linkList = $('.square-bullets-a-to-z a'),
      parliamentUrl = '';
    // push all links to array
    for (let i = 0; i < linkList.length; i++) {
      committeeLinksArray.push(parliamentUrl + $(linkList[i]).attr('href'));
    }
    // pass array to uniqueIDs function
    getUniqueIDs(committeeLinksArray);
  } // end 1st request if
}); // end request
Phase Two
The second function, getUniqueIDs(), loops through the array of links passed as an argument from the first function and saves information (name, ID, URL) into a committeeDetails array.
// get unique IDs
function getUniqueIDs(committeeLinksArray) {
  for (let i = 0; i < committeeLinksArray.length; i++) {
    request(committeeLinksArray[i], (error, response, body) => {
      if (!error && response.statusCode == 200) {
        // pass response body to cheerio
        const $ = cheerio.load(body),
          committeeDetails = [],
          // save committee name, id and url for rss feed
          committeeName = $("meta[property='og:title']").attr("content"),
          uniqueID = $("meta[name='search:cmsPageInstanceId']").attr("content"),
          committeeRSSUrl = `${uniqueID}&type=Committee_Detail_Mixed`;
        // push details object to committeeDetails array
        committeeDetails.push({
          'committee-name': committeeName,
          'committee-ID': uniqueID,
          'committee-RSS-URL': committeeRSSUrl
        });
        // log the array to check its contents
        console.log(committeeDetails);
        // pass details to visitCommitteePage function
      } // end if
    }); // end request
  } // end for loop
}
When I console.log the array to make sure all the information is there, I get separate single-element arrays, one per committee, each holding one object. What I need is a single array of objects that will later be manipulated in a third function, visitCommitteePage().
So the output I get looks like:
[ { 'committee-name': 'Defence Sub-Committee',
'committee-ID': '105517',
'' } ]
[ { 'committee-name': 'Business, Energy and Industrial Strategy Committee',
'committee-ID': '115803',
'' } ]
whereas what I want is an array of objects that would look like this:
[ { 'committee-name': 'Defence Sub-Committee',
'committee-ID': '105517',
'' },
{ 'committee-name': 'Business, Energy and Industrial Strategy Committee',
'committee-ID': '115803',
'' } ]
How can I achieve that?
Apologies for the long question.
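One possible direction, offered as a minimal sketch rather than a definitive fix: in the code above, committeeDetails is created inside each request callback, so every response gets its own fresh array. Collecting one promise per link and waiting for all of them yields a single array instead. The names fetchCommittee and collectCommitteeDetails are hypothetical, and fetchCommittee is only a stand-in that fakes the request + cheerio step so the sketch runs without network access.

```javascript
// Hypothetical stand-in for the request + cheerio step: resolves with fake
// page metadata after a short delay. The real scraper would fetch the URL
// and read the meta tags instead.
function fetchCommittee(url) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve({ name: `Committee at ${url}`, id: String(url.length) });
    }, 10);
  });
}

// Build one details object per link and resolve with a single array of them.
// fetchPage is injectable so a real fetcher can replace the fake one.
function collectCommitteeDetails(committeeLinksArray, fetchPage = fetchCommittee) {
  const pending = committeeLinksArray.map((url) =>
    fetchPage(url).then((page) => ({
      'committee-name': page.name,
      'committee-ID': page.id,
    }))
  );
  // Promise.all resolves once every request has finished and keeps the
  // results in the same order as the input links.
  return Promise.all(pending);
}

collectCommitteeDetails(['/defence', '/business']).then((committeeDetails) => {
  // One array holding an object per committee, ready to hand to the next step.
  console.log(committeeDetails);
});
```

With this shape, the third function would be called once, inside the final .then, with the complete array rather than once per response.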