How to scrape text from a webpage using title tags or a combination of id + classes?


#1

I am working on the “Random Quote Machine” that uses an API to pull random quotes using AJAX and JSON. I wanted to try and go above and beyond that by scraping all the quotes from this site: https://www.brainyquote.com/quotes/keywords/deep.html

I noticed while looking through the console that all the quotes are in separate divs with either titles of “view quote” or “view author” as well as classes with the format “b-qt qt_xxxxxx” where xxxxxx = random 6 digits.

I haven’t had much experience with GET requests or back-end work in general, but I figure this could be a good stepping stone toward understanding how to get data from a website and then parse it so I can use it in my own project.

Can anyone offer any insight into how I can pull the quote text and the author of at least 100 quotes, if not more? What kind of problems could I encounter?

Thanks so much!


#2

let bricks = document.getElementsByClassName("m-brick");
This will get you all the elements with the class “m-brick”. As far as I can tell from the page, this class is used exclusively for quotes.

What you get back in bricks is an HTMLCollection (an array-like list) containing every DOM element that uses m-brick, so you need to loop through it to get the individual quotes.

For that you can use for (let brick of bricks). Inside the loop you have access to each element in the collection, and to walk down the DOM you can use brick.children. Each .children call returns an HTMLCollection of that element's child elements, even when there is only one child, so you still index into it with [0].

Some quotes have images above them and others don’t. To access the quote in a brick with an image you would need brick.children[1], which gets you the <a> tag representing the quote block. To finally access the text of the quote, you can use brick.children[1].innerHTML (or .textContent if you want only the text without any nested markup).

If the brick doesn’t start with an image, you can use brick.children[0].innerHTML instead, and that will get you the quote.
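Putting the steps above together, here is a rough sketch of the loop. The function names extractQuotes and hasImage are made up for illustration, and the check for a leading image (an <img> element or a wrapper containing one) is an assumption about the page's markup, so inspect the real DOM before relying on it.

```javascript
// Returns true if the element is an <img> or contains one.
// Assumption: image bricks start with an <img> or a wrapper around an <img>.
function hasImage(el) {
  return (
    el.tagName === "IMG" ||
    (typeof el.querySelector === "function" && el.querySelector("img") !== null)
  );
}

// Collects the quote text from each brick.
// `bricks` can be any iterable of elements, e.g. an HTMLCollection.
function extractQuotes(bricks) {
  const quotes = [];
  for (const brick of bricks) {
    const children = Array.from(brick.children);
    if (children.length === 0) continue; // nothing to read in this brick

    // Skip the first child when it is the image block.
    const index = hasImage(children[0]) ? 1 : 0;
    const quoteNode = children[index];
    if (quoteNode) {
      // textContent gives plain text; innerHTML would include nested tags.
      quotes.push(quoteNode.textContent.trim());
    }
  }
  return quotes;
}

// In the browser console on the quotes page:
// const quotes = extractQuotes(document.getElementsByClassName("m-brick"));
```

Since the function only reads children, tagName, querySelector and textContent, it should work the same on a live HTMLCollection or on a plain array of elements.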

However, that’s quite a bit of trouble to get some quotes. Usually, for this kind of job you should just consume JSON data from an API. Look into fetch, the browser API for making HTTP requests, if you’re having trouble retrieving JSON data.
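For reference, a minimal sketch of consuming JSON with fetch could look like the following. The URL and the response shape (an array of objects with text and author fields) are placeholders, not a real quotes API, and the fetchImpl parameter is just there so the function can be tried without a network.

```javascript
// Hypothetical endpoint: replace with a real quotes API that returns JSON.
const QUOTES_URL = "https://example.com/api/quotes";

// Fetches quotes and keeps only the text and author of each one.
// fetchImpl defaults to the global fetch available in browsers.
async function fetchQuotes(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // Assumed response shape: [{ text: "...", author: "..." }, ...]
  const data = await response.json();
  return data.map((item) => ({ text: item.text, author: item.author }));
}

// Usage:
// fetchQuotes(QUOTES_URL).then((quotes) => console.log(quotes));
```

Keep in mind that a browser will block a cross-origin request unless the server allows it through CORS headers, which is one of the problems you are likely to run into when requesting another site's pages directly.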