Python web scraping - BeautifulSoup cannot find 'html-parser'

mientje · January 29, 2021, 12:10pm

Hi,

I get this error message:

bs4.FeatureNotFound: Couldn’t find a tree builder with the features you requested: html-parser. Do you need to install a parser library?

I can’t find the cause. The script can open the URL and read the contents, because I can print them, but it just won’t parse. I installed bs4 on my PC and that completed without problems. I also downloaded the zip file (http://www.py4e.com/code3/bs4.zip) and unzipped it according to the instructions, so the bs4 folder and my .py file live in the same folder. The script seems to be accessing that folder, otherwise I wouldn’t be able to print the HTML. The code is identical to the code in the video. I work on Ubuntu 20, which has Python 3 installed as the default version.

This is the code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

url = input('Enter – ')
html = urllib.request.urlopen(url).read()
print(html)
soup = BeautifulSoup(html, 'html-parser')

#retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
	print(tag.get('href', None))

I have a hunch it’s something small, I just can’t see it.

Thank you
Karin

sanity · January 29, 2021, 7:42pm

I believe there should be a dot instead of a dash, so the parser name is 'html.parser'.
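To make the difference concrete, here is a minimal sketch rather than the course's exact code. It assumes bs4 is importable, and it parses a small made-up HTML snippet instead of a URL so it runs without a network connection. The names SAMPLE_HTML and extract_links are only for illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup (not from the original post), used in place of
# urllib.request.urlopen(url).read() so the sketch runs offline.
SAMPLE_HTML = (
    b'<html><body>'
    b'<a href="http://www.py4e.com/">PY4E</a>'
    b'<a name="top">anchor without href</a>'
    b'</body></html>'
)


def extract_links(html, parser="html.parser"):
    """Return the href of every anchor tag (None when a tag has no href)."""
    soup = BeautifulSoup(html, parser)
    return [tag.get("href", None) for tag in soup("a")]


# With the dot, Python's built-in parser is found and the links are listed.
print(extract_links(SAMPLE_HTML))

# With the dash, bs4 does not recognise the name and raises FeatureNotFound.
try:
    extract_links(SAMPLE_HTML, parser="html-parser")
except FeatureNotFound as err:
    print("FeatureNotFound:", err)
```

In the original script, only the second argument on the BeautifulSoup(...) line needs to change. 'html.parser' refers to the parser in Python's standard library, so nothing extra should need to be installed for it.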

mientje · January 29, 2021, 8:21pm

You are my hero. Thank you very much!
Greets,
Karin
