I’ve just gone through chapter 12 of the ‘Python for Everybody’ course, which is about networking. I’m a little confused about the difference between what was covered in lesson E and lesson F.
This is the basic code from lesson E (adjusted to look consistent with F):
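```python
import urllib.request, urllib.parse, urllib.error

url = input('Enter url: ')
html = urllib.request.urlopen(url)  # a file-like response object, not a string
for line in html:
    # each line arrives as bytes: decode it to str and trim both ends
    print(line.decode().strip())
```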
I understand that this opens the URL, reads the page’s HTML one line at a time, strips the leading and trailing whitespace from each line, and prints it.
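To check that description without a network connection, here is a minimal sketch that runs lesson E’s loop over an in-memory page. It assumes `io.BytesIO` is a fair stand-in for the response `urlopen()` returns (iterating over either yields the body as lines of bytes); `print_lines` and `fake_page` are hypothetical names, not part of the course code.

```python
import io


def print_lines(response):
    """Mimic lesson E: decode each line of bytes, trim its ends, print it."""
    lines = []
    for line in response:
        # Each item is a bytes object; decode() assumes UTF-8 by default.
        text = line.decode().strip()
        print(text)
        lines.append(text)
    return lines


# Hypothetical stand-in for urlopen(url): iterating over a BytesIO,
# like iterating over the real response, yields the body line by line.
fake_page = io.BytesIO(
    b'<html>\n  <body>\n    <p>Hello   world</p>\n  </body>\n</html>\n'
)
lines = print_lines(fake_page)
```

Note that `strip()` only trims the ends of each line: the indentation disappears, but the three spaces inside `Hello   world` survive.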
We then have lesson F:
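```python
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

url = input('Enter url: ')
html = urllib.request.urlopen(url).read()  # the whole page as one bytes object
soup = BeautifulSoup(html, 'html.parser')  # parse it into a tree of tags
tags = soup('a')  # shorthand for soup.find_all('a')
for tag in tags:
    print(tag.get('href', None))  # None if the <a> tag has no href
```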
Aside from retrieving the tags at the end, wouldn’t this do the same thing as E if I removed that part and just wrote `print(soup)`? I guess what I’m really asking is: what’s special about BeautifulSoup?
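A rough way to see the difference: `print(soup)` would show markup much like E’s output (BeautifulSoup re-serializes its parse tree, so it may not be byte-for-byte identical), but the parse tree is what lets you ask for tags by name regardless of line breaks, letter case, or quoting. The sketch below swaps in Python’s built-in `html.parser` module, the parser BeautifulSoup uses when given `'html.parser'`, so it runs without bs4. `LinkCollector` and the sample page are hypothetical. It compares a line-by-line text search with a parser that collects each `<a>` tag’s href, loosely like `soup('a')` plus `tag.get('href', None)`.

```python
from html.parser import HTMLParser

# Hypothetical sample page: one link split across lines, one <a> with
# no href, and one link written in upper case with single quotes.
page = (
    '<p>See <a\n'
    '   href="https://example.com/a">this</a> and\n'
    '<a name="top">no link here</a>\n'
    "and <A HREF='https://example.com/b'>that</A>.</p>"
)

# Text approach, in the spirit of lesson E: scan each line for '<a href='.
naive = [line for line in page.splitlines() if '<a href=' in line]


class LinkCollector(HTMLParser):
    """Record the href of every <a> start tag (None if it has none)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names and strips the
        # quotes from values, so <A HREF='...'> arrives as tag 'a'.
        if tag == 'a':
            self.hrefs.append(dict(attrs).get('href', None))


collector = LinkCollector()
collector.feed(page)
collector.close()

print(naive)             # []
print(collector.hrefs)   # ['https://example.com/a', None, 'https://example.com/b']
```

The parser copes with the split tag, the upper-case tag, and the missing href because it works from the document’s structure rather than from the raw text of each line. BeautifulSoup adds a navigable tree and search methods such as `find_all()` on top of that.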