This is the basic code we learnt in lesson E (reformatted so it looks consistent with F):
import urllib.request, urllib.parse, urllib.error

url = input('Enter url: ')
html = urllib.request.urlopen(url)
for line in html:
    print(line.decode().strip())
I understand that this retrieves the raw HTML of the webpage line by line, decodes each line from bytes to text, strips the surrounding whitespace, and prints it.
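To see what decode().strip() is doing without hitting the network, here is a small sketch (the raw_lines list is a made-up stand-in for what urlopen() yields, which is lines of bytes):

```python
# Hypothetical stand-in for iterating over urlopen(url):
# urlopen() yields lines as bytes, often with leading/trailing whitespace.
raw_lines = [b"<html>\n", b"  <body>Hello</body>\n", b"</html>\n"]

# decode() turns bytes into str; strip() removes surrounding whitespace.
cleaned = [line.decode().strip() for line in raw_lines]
print(cleaned)  # ['<html>', '<body>Hello</body>', '</html>']
```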
We then have lesson F:
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

url = input('Enter url: ')
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))
Aside from retrieving the anchor tags at the end, wouldn't this do the same thing as in E if I dropped that bit and just wrote print(soup)? I guess what I'm asking is: what's special about BeautifulSoup?
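One way to see the difference: print(soup) does show roughly the same markup as printing the raw lines, but soup is a parsed tree, not a string, so you can query it by tag and attribute instead of scanning text. A minimal sketch, using a made-up HTML string in place of a downloaded page (the example.com URLs are placeholders):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for urlopen(url).read():
html = ("<html><body><p>Links:</p>"
        "<a href='http://example.com/a'>A</a> "
        "<a href='http://example.com/b'>B</a>"
        "</body></html>")

soup = BeautifulSoup(html, 'html.parser')

# soup('a') is shorthand for soup.find_all('a'): it walks the parsed
# tree and returns every anchor tag, something you can't do on a string.
for tag in soup('a'):
    print(tag.get('href', None))
```

With raw text you would have to pick the href values out of each line yourself (e.g. with string slicing or regular expressions), and that breaks on messy real-world HTML; the parser handles that structure for you.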