BeautifulSoup not working -- soup = BeautifulSoup(html, 'html.parser')

Hi All,

I am following the Python course and I got to the "12 - urllinks - Python for Everybody Course" video.

I tried to install it and placed the folder the teacher suggested in the directory I'm running Python from, but it doesn't work.

The teacher's code is:

```python
# To run this, download the BeautifulSoup zip file
# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))
```

When I run it, I get the following error:

```
Enter - http://www.dr-chuck.com/
Traceback (most recent call last):
  File "/Users/luis/Desktop/Py/code3/urllinks.py", line 16, in <module>
    soup = BeautifulSoup(html, 'html.parser')
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 215, in __init__
    self._feed()
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 239, in _feed
    self.builder.feed(self.markup)
  File "/Users/luis/Desktop/Py/code3/bs4/builder/_htmlparser.py", line 164, in feed
    parser.feed(markup)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 110, in feed
    self.goahead(0)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 170, in goahead
    k = self.parse_starttag(i)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 344, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/Users/luis/Desktop/Py/code3/bs4/builder/_htmlparser.py", line 62, in handle_starttag
    self.soup.handle_starttag(name, None, None, attr_dict)
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 404, in handle_starttag
    self.currentTag, self._most_recent_element)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1001, in __getattr__
    return self.find(tag)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1238, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1259, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 516, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1560, in __init__
    self.text = self._normalize_search_value(text)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1565, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'
```

I have already tried:

- installing BeautifulSoup with sudo pip
- downloading the zip file, unzipping it, and placing it in the same folder

Has anyone had the same issue?

Thanks a lot, guys!

Note: I'm using macOS.


Hi, I think I'm doing the same course, and I get the same error. I'm using Windows.

Cheers

It looks like the BS4 library is trying to use collections.Callable and not finding it. In Python 3.3 the Callable abstract base class moved to collections.abc.Callable. The old collections.Callable alias was kept for backward compatibility (with a deprecation warning), but it was removed in Python 3.10, which is why this copy of BS4 fails there.
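A quick way to confirm this on your own interpreter is a minimal standard-library sketch like the one below. It only checks where Callable lives; it does not touch BS4.

```python
import collections
import collections.abc
import sys

# Since Python 3.3 the Callable ABC lives in collections.abc.
print(isinstance(len, collections.abc.Callable))  # True: len is callable

# The old alias collections.Callable was removed in Python 3.10, so older
# bs4 code that still refers to it raises AttributeError on 3.10+.
print(sys.version_info[:2], hasattr(collections, "Callable"))
```

On Python 3.10 or later the second line should report False for the alias.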

I would check your current versions of Python and BS4 (for example, python --version and pip freeze from the terminal). Judging by the file paths in your error, Python is importing the unzipped BS4 files from your code3 folder rather than the package you later installed with pip. I don't know how up to date that zipped copy is, but you could try removing those unzipped files and, if needed, reinstalling the official beautifulsoup4 package from PyPI with pip.
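To see which copy of BS4 actually gets picked up, a small diagnostic sketch may help. The function name describe_bs4 is just an illustrative name; save the file next to your course script and run it the same way, so that Python searches the same folders first.

```python
import importlib.util
import sys
from importlib import metadata


def describe_bs4():
    """Print which Python is running and which bs4 it would import.

    Returns the path of the bs4/__init__.py that "import bs4" would load,
    or None if no bs4 is importable.
    """
    print("Python executable:", sys.executable)
    print("Python version:", sys.version.split()[0])

    # find_spec locates the package without importing it, so this works
    # even when the copy it finds is broken on this Python version.
    spec = importlib.util.find_spec("bs4")
    if spec is None:
        print("bs4 is not importable from this interpreter")
        return None
    print("bs4 would be loaded from:", spec.origin)

    try:
        print("beautifulsoup4 installed by pip:", metadata.version("beautifulsoup4"))
    except metadata.PackageNotFoundError:
        print("no pip-installed beautifulsoup4 for this interpreter")
    return spec.origin


describe_bs4()
```

If the "loaded from" path points into your code3 folder, the unzipped copy is shadowing whatever pip installed.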


Welcome there,

I’ve edited your post for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.

You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.

See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (’).

Hi, you are right. After searching the internet, I found that the fix is to edit the code in bs4/element.py,

i.e. change every "collections.Callable" to "collections.abc.Callable".

It works for me after that.

cheers
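If you would rather not edit element.py by hand, the same find-and-replace can be scripted. This is a minimal sketch; the function names are hypothetical, and it keeps a .bak copy of the original file before overwriting it.

```python
from pathlib import Path

OLD = "collections.Callable"
NEW = "collections.abc.Callable"


def patch_source(text):
    """Return (patched_text, number_of_replacements).

    Safe to run more than once: "collections.abc.Callable" does not
    contain the substring "collections.Callable", so already patched
    code is left unchanged.
    """
    return text.replace(OLD, NEW), text.count(OLD)


def patch_element_py(path):
    """Apply patch_source to a bs4/element.py file, keeping a backup."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched, count = patch_source(original)
    if count:
        # Save the untouched file as element.py.bak before overwriting it.
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(patched, encoding="utf-8")
    return count


# Example (the path is hypothetical; point it at your own copy):
# patch_element_py("code3/bs4/element.py")
```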


Hi all, I encountered this exact problem today during my Python class. I performed the same change as kylee and now it works. Thank you!!

I encountered the same problem, did exactly what kylee described, and it works.
Thank you so much, cheers.

Can you explain how you did it? I don't understand.
I don't know how to edit the file or even find it, since I installed Beautiful Soup from the command prompt.


Hi emechetejohn4,

These are the steps:

  1. Locate your "bs4" folder (you can search for it in your file explorer).
  2. Open the "bs4" folder; you will see the "element.py" file.
  3. Open "element.py" in an editor, e.g. Atom.
  4. Search for every "collections.Callable" and change it to "collections.abc.Callable".
  5. Remember to save "element.py" after you have made the changes.

Also remember to change every element.py file if you have more than one copy of "bs4".

Hope this helps.
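To find every copy of element.py that your Python could pick up (step 1, plus the note about multiple copies), a sketch like this lists the bs4 folders on the module search path. The function name is hypothetical, and it only looks one level deep in each sys.path folder.

```python
import sys
from pathlib import Path


def find_element_py_copies(search_paths=None):
    """Return every bs4/element.py found directly under the given folders.

    Defaults to sys.path, the folders this interpreter searches for modules.
    When you run a script, its own folder comes first, which is why an
    unzipped bs4 next to the script wins over a pip-installed one.
    """
    if search_paths is None:
        search_paths = sys.path
    found = []
    for entry in search_paths:
        # An empty entry in sys.path stands for the current directory.
        candidate = Path(entry or ".") / "bs4" / "element.py"
        if candidate.is_file():
            resolved = candidate.resolve()
            if resolved not in found:
                found.append(resolved)
    return found


for path in find_element_py_copies():
    print(path)
```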


I'm having the same problem. I am using Python 3.10.5, and BeautifulSoup doesn't work.
I have changed every collections.Callable to collections.abc.Callable and still get the following error message:

```
Traceback (most recent call last):
  File "C:\Joe\12_tags.py", line 6, in <module>
    from bs4 import BeautifulSoup
  File "C:\Joe\bs4\__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "C:\Joe\bs4\builder\__init__.py", line 314, in <module>
    from . import _html5lib
  File "C:\Joe\bs4\builder\_html5lib.py", line 151, in <module>
    class Element(html5lib.treebuilders._base.Node):
AttributeError: module 'html5lib.treebuilders' has no attribute '_base'. Did you mean: 'base'?
```

Any help will be greatly appreciated!

Thank you very much, Kylee. I had the same problem and I solved it thanks to you…

This works for me:

```python
# Do the imports
from urllib import request, parse, error
from bs4 import BeautifulSoup as bs
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Ask for user input
url = input('Enter url\n>> ')

# Add http:// only if the URL has no scheme (https:// URLs are left alone)
if not url.startswith(('http://', 'https://')):
    url = 'http://' + url

# Get the url
html = request.urlopen(url, context=ctx).read()

# Parse with BeautifulSoup
soup = bs(html, 'html.parser')

# Get anchor tags
tags = soup('a')

# Loop through and print
for tag in tags:
    print(tag.get('href', None))
```

The forum won't let me post the output because it contains links.
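One note on the scheme check in the script above: a substring test such as 'http://' not in url is also true for https:// addresses, so it would turn them into http://https://.... A minimal sketch of a stricter check with urllib.parse.urlparse (the helper name ensure_scheme is hypothetical, not part of the course code):

```python
from urllib.parse import urlparse


def ensure_scheme(url, default="http"):
    """Return url with a scheme, adding default:// only when none is given."""
    url = url.strip()
    # urlparse lowercases the scheme, so "HTTPS://..." is recognized too.
    if urlparse(url).scheme in ("http", "https"):
        return url
    return f"{default}://{url}"


print(ensure_scheme("www.dr-chuck.com"))       # http://www.dr-chuck.com
print(ensure_scheme("https://www.py4e.com/"))  # https://www.py4e.com/
```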

Thanks a lot, it works for me
