Okay, so I thought I had a grasp on web scraping, but apparently not. I was given a URL to scrape, and I can get it to work, but not the way it should. I need it to pull the title, summary, and URL of each article on the main webpage. The results I am getting are a weird combination of what I want (unfortunately the uncleaned version; I need the cleaned-up version, which is also stumping me) and a bunch of other mostly empty lines containing the website's URL. Here is my exact code:
# Imports required for retrieving files and webpages and creating a usable
# format for them.
import bs4
import requests
import csv
# Sets the URL to scrape
url = (" https://cybersins.com/")
# Grabs the website you want and gets its HTML as text
res = requests.get(url).text
# Parses the webpage for the soup
soup = bs4.BeautifulSoup(res , 'lxml')
# Sets writer list expectations for csv
writer = ('headline', 'summary', 'url')
# Pulls the attributes wanted from the file
for article in soup.find_all('article'):
    headline = article.find('h4', 'a', 'href', class_='title')
    print(headline)
    summary = article.find_all('div', class_='post-excerpt')
    print(summary)
    # Grabbing an anchor tag using dictionary formatting
    articleUrl = article.find('a')('href')
    print(url)
    # Ensures spacing in the final product
    print()
    # Placing writer inside the loop and using 'a' (append) makes it grab every article
    # and adds it to the text file
    with open('TestScrape.txt', 'a') as file:
        csv_writer = csv.writer(file)
        csv_writer.writerow([headline])
        csv_writer.writerow([summary])
        csv_writer.writerow([url])
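
For comparison, here is a rough sketch of what the loop above seems to be aiming for. A few things in the code look like likely causes of the odd output: the extra positional arguments in `article.find('h4', 'a', 'href', ...)` are not treated as nested tags, `article.find('a')('href')` calls the tag (shorthand for `find_all('href')`) instead of reading the attribute, and `print(url)` / `writerow([url])` use the site URL rather than `articleUrl`. The sketch assumes each article has an `h4` with class `title` wrapping the link and a `div` with class `post-excerpt`, as the selectors above suggest; I have not checked that against the live page. `SAMPLE_HTML`, `extract_articles`, and `write_rows` are made-up names for illustration, and `html.parser` is used only so the sketch does not depend on lxml.

```python
import csv
import io

import bs4

# Hypothetical markup standing in for the real page. The tag and class names
# come from the selectors in the code above and are NOT verified against the site.
SAMPLE_HTML = """
<article>
  <h4 class="title"><a href="/first-post/">First post</a></h4>
  <div class="post-excerpt">  Summary of the
     first post.  </div>
</article>
<article>
  <h4 class="title"><a href="/second-post/">Second post</a></h4>
</article>
"""


def extract_articles(html):
    """Return one dict per <article> with cleaned headline, summary and url."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article"):
        # Find the heading first, then look for the link inside it.
        title_tag = article.find("h4", class_="title")
        link_tag = title_tag.find("a") if title_tag else None
        excerpt_tag = article.find("div", class_="post-excerpt")
        rows.append({
            # get_text() drops the markup; strip=True trims surrounding whitespace.
            "headline": title_tag.get_text(" ", strip=True) if title_tag else "",
            # split/join collapses runs of whitespace and newlines into single spaces.
            "summary": " ".join(excerpt_tag.get_text().split()) if excerpt_tag else "",
            # Attributes are read with .get() or tag["href"], not by calling the tag.
            "url": link_tag.get("href", "") if link_tag else "",
        })
    return rows


def write_rows(rows, file):
    """Write one CSV row per article, with a header line."""
    writer = csv.DictWriter(file, fieldnames=["headline", "summary", "url"])
    writer.writeheader()
    writer.writerows(rows)


rows = extract_articles(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for open('TestScrape.csv', 'w', newline='')
write_rows(rows, buffer)
print(buffer.getvalue())
```

With the real page, `SAMPLE_HTML` would be replaced by `requests.get(url).text`. Writing each article as a single row (rather than three separate `writerow` calls) keeps the headline, summary, and URL together, and opening the file once outside the loop avoids appending duplicates on every run.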