Web scraping: I'm missing an update but I don't know which one it could be

I am following this web scraping tutorial (https://www.youtube.com/watch?v=XVv6mJpFOb0) and have installed bs4, html5lib, and lxml, but when I try to run the following code:

from bs4 import BeautifulSoup

with open('home.html', 'r') as html_file:
    content = html_file.read()
    print(content)
    
    soup = BeautifulSoup(content, 'lxml')
    print(soup.prettify())

I receive the following error message:

C:\Users\Admin\Desktop\PY4E\fcc_py\bs4\element.py:15: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used.
  warnings.warn(

Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\PY4E\fcc_py\webscrape.py", line 7, in <module>
    soup = BeautifulSoup(content, 'lxml')
  File "C:\Users\Admin\Desktop\PY4E\fcc_py\bs4\__init__.py", line 248, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

I have the latest version of Python, so I assumed that includes soupsieve, and I’ve installed html5lib as well as lxml to handle any broken HTML. What am I missing?

Is there an additional parser library I’m unaware of?

There are a few things to try here: https://stackoverflow.com/questions/24398302/bs4-featurenotfound-couldnt-find-a-tree-builder-with-the-features-you-requeste

I tried some of the techniques from the above website before but they didn’t work.

When I don’t get the error asking me to check my parser, I get a warning that soupsieve is not installed. Yet when I installed it from the command line, I got the message “Requirement already satisfied: soupsieve in c:\python311\lib\site-packages (2.3.2.post1)”, and I still receive the warning. The parser problem is also not resolved by installing html5lib, requests, and lxml.

What am I missing here? It can’t possibly be THIS complicated…

What are you using to code (VS Code, PyCharm)? Did you check that it is using the correct version of Python?

Do you get an error message if you try html.parser instead of lxml? For example:

soup = BeautifulSoup(content, 'html.parser')
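
If it helps, here is a minimal sketch for finding out which parsers the interpreter running your script can actually see. It only checks whether the lxml and html5lib modules are importable; it does not call bs4 itself, so treat the result as a hint rather than a guarantee. The function name available_parsers is made up for this example.

```python
import importlib.util


def available_parsers():
    """List BeautifulSoup parser names whose backing module is importable here."""
    parsers = []
    for name in ("lxml", "html5lib"):
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if found:
            parsers.append(name)
    # 'html.parser' ships with Python itself, so it is always a fallback.
    parsers.append("html.parser")
    return parsers


if __name__ == "__main__":
    print(available_parsers())
```

If lxml is missing from the printed list, the interpreter running the script is probably not the one pip installed lxml into. You could then pass the first entry of the list to BeautifulSoup as the parser name.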

Can you please show the “already satisfied” output of pip install lxml?

I’m using VS Code.

Below is the output of the lxml installation:

C:\>pip install lxml
Requirement already satisfied: lxml in c:\python311\lib\site-packages (4.9.2)

This is from the Terminal within the VS Code app?

Can you also confirm that the blue bar at the lower left of VS Code shows “Python 3.11”?

Yes. The lower bar shows 3.10.11

Can you click on that and select 3.11? Your output shows that lxml is installed for 3.11, not 3.10
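
To check this from inside the script rather than from the status bar, something like the sketch below may help. It reports which interpreter is running and where each package would be imported from. The helper name describe_environment is hypothetical. One thing worth comparing: the traceback earlier in the thread shows bs4 under C:\Users\Admin\Desktop\PY4E\fcc_py\bs4\, while pip reports c:\python311\lib\site-packages, so the printed locations should show which copy is actually being picked up.

```python
import importlib.util
import sys


def describe_environment(module_names):
    """Return the interpreter path and where each module would be imported from."""
    report = {
        "interpreter": sys.executable,
        "version": sys.version.split()[0],
    }
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            report[name] = None  # not importable from this interpreter
        else:
            # Folders without __init__.py have no origin, so fall back to their search path.
            report[name] = spec.origin or ", ".join(spec.submodule_search_locations or [])
    return report


if __name__ == "__main__":
    names = ["bs4", "soupsieve", "lxml", "html5lib"]
    for key, value in describe_environment(names).items():
        print(f"{key}: {value if value is not None else 'NOT FOUND'}")
```

Run it the same way you run webscrape.py. If the interpreter line does not point at the Python 3.11 install, or a package shows NOT FOUND, the packages were most likely installed for a different interpreter.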

I changed Python from 3.10.11 to 3.11.1, but I still received the following warning each time:

C:\Users\Admin\Desktop\PY4E\fcc_py\bs4\element.py:15: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used.
  warnings.warn(

I ran the code in the command line and in VS Code.
I don’t know how to rectify this soupsieve problem. I’ve installed it; yet, it does not recognize the installation.

VS Code shows Python 3.11.1 in the blue bar at the bottom left?

Maybe try removing and re-installing soupsieve.
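
If you do reinstall, it may be safer to run pip through the same interpreter that runs your script, so the package cannot land in a different Python install. A rough sketch, assuming you run it with the interpreter selected in VS Code. The names pip_command and reinstall are made up for this example, and dry_run defaults to True so nothing is changed unless you ask for it.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


def reinstall(package, dry_run=True):
    """Uninstall then install `package`; with dry_run=True only return the commands."""
    commands = [
        pip_command("uninstall", "-y", package),
        pip_command("install", package),
    ]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if pip exits with an error.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in reinstall("soupsieve"):
        print(" ".join(command))
```

Calling reinstall("soupsieve", dry_run=False) would actually run the two commands. The printed form is the same thing you could type in the terminal by hand.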

On the bright side, your code works fine in a Google Colab notebook

Actually it’s running on my VS Code now. I was missing a few words in my code.
