Web Scraping Job Adverts

Hi,

I’m in the process of learning how to do web scraping and am working through this tutorial provided by freeCodeCamp: Web Scraping with Python - Beautiful Soup Crash Course - YouTube.

I have run into an issue, though, while trying to run the Python code in PyCharm: it takes forever to run. Is there a way to overcome this and make it run faster?

I have attached my code below:

from bs4 import BeautifulSoup
import requests

# Pull jobs from a job-advert website that were posted today only.

html_text = requests.get('https://www.totaljobs.com/jobs/python?radius=10').text
soup = BeautifulSoup(html_text, 'lxml')
jobs = soup.find_all('div', class_='ResultsSectionContainer-sc-gdhf14-0 kteggz')
print(jobs)

Any help or advice to make the code run would be much appreciated!
Bianca

Are you sure the URL you’re using will respond to a standard GET request?

Before trying to parse it with BS, have you tried just printing the response to the console to see what you are getting? If it’s hanging, my guess is that the URL you are using isn’t responding to your GET request and the program is just waiting. In my limited experience, most major sites don’t respond to unsolicited GET requests. Usually, if you want to collect data from a website, you need to check whether it has an API and a set of instructions for downloading data.
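Here is a minimal sketch of that check, assuming the requests library from the original post. The helper names summarize and fetch_summary and the 10-second timeout are made up for illustration. requests.get has no timeout by default, so a server that never answers can leave the script waiting indefinitely; passing timeout makes it give up instead.

```python
import requests


def summarize(status_code, body, limit=200):
    """Return a short, printable summary of an HTTP response."""
    # Show only the first `limit` characters, flattened onto one line.
    preview = body[:limit].replace("\n", " ")
    return f"status={status_code} length={len(body)} preview={preview!r}"


def fetch_summary(url, timeout=10):
    """Fetch url and summarize the response instead of parsing it.

    `timeout` (an assumed value) is how many seconds requests will wait
    for the server to connect or send data before raising an error.
    """
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        return f"no response within {timeout} seconds"
    except requests.exceptions.RequestException as exc:
        return f"request failed: {exc}"
    return summarize(response.status_code, response.text)
```

Calling print(fetch_summary('https://www.totaljobs.com/jobs/python?radius=10')) should then show either a status code with the start of the page, or a message saying the request timed out or failed, which tells you whether the slow part is the request or the parsing.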


Hi,

Thank you for the help. I have printed html_text and didn’t get anything, so it must be, as you suggested, that an API is required.

Quick question: is there a way to identify whether a website has an API or not?

Not sure, I’m still kinda new. One thing I noticed about the URL you were using was that it didn’t just take me to a page when I typed it into a browser; it took me to a loading prompt, meaning it’s running scripts and not just returning HTML. If you use the URL from the lesson you were watching, your code does work, so you don’t necessarily need an API, but you do need a simple page that returns HTML, not a script-driven site. I did something similar with the freeCodeCamp Python cert and found I could read my own pages and other basic pages, but google.com and other large sites denied such simple requests.
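To illustrate the difference between a plain HTML page and a script-driven one, here is a rough sketch. Both HTML strings, the class name job, and the helper count_matches are hypothetical stand-ins, not the real markup of any site. It uses Python's built-in html.parser rather than lxml so it runs without extra installs.

```python
from bs4 import BeautifulSoup


def count_matches(html, tag, class_name):
    """Count elements with the given tag and CSS class in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(tag, class_=class_name))


# Hypothetical response from a simple page: the data is in the HTML itself.
STATIC_PAGE = """
<html><body>
  <div class="job">Python Developer</div>
  <div class="job">Data Engineer</div>
</body></html>
"""

# Hypothetical response from a script-driven page: only a placeholder
# arrives, and a browser would fill in the jobs by running the script.
SCRIPTED_PAGE = """
<html><body>
  <div id="app">Loading...</div>
  <script src="/bundle.js"></script>
</body></html>
"""

print(count_matches(STATIC_PAGE, "div", "job"))    # 2
print(count_matches(SCRIPTED_PAGE, "div", "job"))  # 0
```

If find_all comes back empty on a real page even though you can see the items in your browser, that is a hint the content is being added by scripts after the page loads, and requests alone won't see it.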

Got you, thank you for sharing that with me.
