Hi everyone,
(Reposting this question, which was asked before but hasn’t been answered yet.)
I’m trying to complete the Page Spider Exercise linked in the last lesson of the Python for Everybody course, so I can better understand the material.
However, when I try to run the program (spider.py) it comes up with an error, like this:
users:$ python3 spider.py
Enter web url or enter:
[This is where you enter dr-chuck dot com url]
How many pages:5
1 h…dr-chuck dot com Unable to retrieve or parse page
No unretrieved HTML pages found
I’ve watched the exercise video and Dr Chuck says that if this error comes up it’s likely an issue with BeautifulSoup or something, but I have version 4 installed which is the latest version.
Does anyone have any ideas why this won’t run for me?
The file I’m using is spider.py located on the PY4E website
I have been trying to troubleshoot and look up a solution, and my last resort is to post this message again and see if anyone else has faced this issue.
Thanks!
D
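One way to narrow this down is to fetch the page outside the spider and print the actual exception, since a generic "Unable to retrieve or parse page" message does not say what went wrong. Here is a minimal sketch using only the standard library. `try_fetch` is a hypothetical helper for illustration and is not part of spider.py.

```python
import urllib.request


def try_fetch(url, timeout=10):
    """Try to download url and return (ok, detail) instead of hiding the error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
            return True, f"HTTP {response.getcode()}, {len(body)} bytes"
    except Exception as exc:  # deliberately broad: the goal is to see what failed
        return False, f"{type(exc).__name__}: {exc}"
```

Printing the result of `try_fetch("http://www.dr-chuck.com")` should show either the status and size of the page or the name of the exception (for example a URL, SSL, or timeout error), which is usually enough to tell a network problem from a parsing problem.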
Can you paste the code here please or link to it?
Hello, I’m not able to post the link. Let’s see if this one works; it is from the last time this question was posted:
Python for Everybody - ERROR: Page Spider Exercise - Python - The freeCodeCamp Forum
You need to type http://www.dr-chuck.com when it asks for a URL:
Enter web url or enter: http://www.dr-chuck.com
['http://www.dr-chuck.com']
How many pages:5
1 http://www.dr-chuck.com (8386) 2
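To rule out a mistyped address, it can help to check the input before handing it to the spider. The sketch below is an assumption about what counts as acceptable input (an http or https scheme plus a dotted host name), not a check taken from spider.py, and `looks_like_web_url` is a made-up name.

```python
from urllib.parse import urlparse


def looks_like_web_url(text):
    """Return True if text has an http(s) scheme and a dotted host with no spaces."""
    parts = urlparse(text.strip())
    host = parts.netloc
    return parts.scheme in ("http", "https") and "." in host and " " not in host


for candidate in ("http://www.dr-chuck.com", "www.dr-chuck.com", "dr-chuck dot com"):
    print(candidate, "->", looks_like_web_url(candidate))
```

Only the first candidate passes: the second has no scheme, and the third is the spelled-out form used to get around the forum's link filter.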
Hi, I tried that but to no avail. I also tried this link: python-data.dr-chuck dot net
But I get the same output: No unretrieved HTML pages found
Can you paste the error in here as you did before?
You can see my output above is successful. Maybe try a different URL?
Are you typing “dot net” or “.net”?
This forum isn’t letting me post external links, so I was writing it that way. When I run spider.py in PyCharm, I enter .net.
Can you screenshot the error and post it?
Might have better luck pasting the errors between backticks or with a > blockquote, like this.
Try the blockquote or preformatted text options to paste an error.
OK, this is a different error message; read it again. You need to go to the next step of the exercise.
The spider goes out looking for pages and compares them against a local database. It has already retrieved what it’s looking for.
(Sorry, I’m not able to post the YouTube link here.) I found a video walkthrough of this code by the author of the tutorial. It is under ‘Exercise: Page Spider’ in the freeCodeCamp link pasted below.
Python for Everybody - Data Visualization: Mailing Lists | Learn | freeCodeCamp.org
At the 11:37 mark you can see the output should be something else.
My error message is
No unretrieved HTML pages found
It couldn’t find any pages at that link but it should have.
Delete spider.sqlite, or rename it to spider.sqlite.old
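If you would rather do the rename from Python than from the file manager, something like the following should work. It is only a sketch: `set_aside_database` is a hypothetical helper, and it assumes the database sits in the directory you run it from.

```python
from pathlib import Path


def set_aside_database(path="spider.sqlite"):
    """Rename path to path + '.old' if it exists; return the new Path, else None."""
    db = Path(path)
    if not db.exists():
        return None
    backup = db.with_name(db.name + ".old")
    db.replace(backup)  # note: overwrites an earlier backup with the same name
    return backup
```

Calling `set_aside_database()` before running the spider means the next run starts from an empty database, while the old data stays available under the .old name.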
Compare to the previous time I ran the program:
How many pages:15
23 http://www.dr-chuck.com/Sakai_ Building an Open Source Community - Charles R. Severance.epub Unable to retrieve or parse page
25 http://www.dr-chuck.com/html Unable to retrieve or parse page
26 http://www.dr-chuck.com/errata.txt Unable to retrieve or parse page
24 http://www.dr-chuck.com/Sakai_ Building an Open Source Community - Charles R. Severance.PDF Unable to retrieve or parse page
No unretrieved HTML pages found
See the message at the end? It means it has all the pages it can find. The next time I run it, I only get this message:
No unretrieved HTML pages found
Because I already ran it and the pages are stored in the database.
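The behaviour described here can be sketched with an in-memory database. The table name `Pages` and the `html` and `error` columns are assumptions made for this illustration and are not copied from spider.py. The idea is that a page only counts as unretrieved while it has neither stored HTML nor a recorded error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, html TEXT, error INTEGER)"
)


def next_unretrieved(cur):
    """Return (id, url) of one page with no HTML and no error, or None."""
    cur.execute("SELECT id, url FROM Pages WHERE html IS NULL AND error IS NULL LIMIT 1")
    return cur.fetchone()


# A freshly queued page is still waiting to be fetched.
cur.execute("INSERT INTO Pages (url) VALUES (?)", ("http://www.dr-chuck.com",))
first = next_unretrieved(cur)

# Once its HTML is stored, nothing is left to fetch.
cur.execute(
    "UPDATE Pages SET html = ? WHERE url = ?",
    ("<html>...</html>", "http://www.dr-chuck.com"),
)
second = next_unretrieved(cur)

if second is None:
    print("No unretrieved HTML pages found")
```

Under the same assumption, a page whose download failed and had an error value recorded would also be skipped. That would explain seeing this message right after a single "Unable to retrieve or parse page" line.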
Yes, this is the full error. In the code you have the option to just hit enter and it will load the default dr-chuck URL, but I still get the same error.
I did the renaming like you suggested and ran it again. The program created a new spider.sqlite file and still doesn’t get past the error I’ve been getting. I checked the contents of the sqlite file and they’re the same as before.
But when you open spider.sqlite in DB Browser, do you see 15 entries like you’re supposed to? I can only see 1, which is the default page.
Can you show me the full output after deleting (or renaming) the sql database?
And show me the database contents.
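If DB Browser is awkward to screenshot, the database contents can also be summarised from Python. This is a rough sketch that assumes a `Pages` table with `html` and `error` columns; check the real column names in your own spider.sqlite first. The rows in the demo are made up.

```python
import sqlite3


def summarize_pages(conn):
    """Return (total, fetched, failed, pending) row counts for the Pages table."""
    cur = conn.cursor()
    # COUNT(column) only counts rows where that column is not NULL.
    cur.execute("SELECT COUNT(*), COUNT(html), COUNT(error) FROM Pages")
    total, fetched, failed = cur.fetchone()
    cur.execute("SELECT COUNT(*) FROM Pages WHERE html IS NULL AND error IS NULL")
    pending = cur.fetchone()[0]
    return total, fetched, failed, pending


# Demo on an in-memory database with invented rows.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE Pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, html TEXT, error INTEGER)"
)
demo.executemany(
    "INSERT INTO Pages (url, html, error) VALUES (?, ?, ?)",
    [
        ("http://www.dr-chuck.com", "<html>...</html>", None),
        ("http://www.dr-chuck.com/errata.txt", None, -1),
        ("http://www.dr-chuck.com/html", None, None),
    ],
)
print(summarize_pages(demo))
```

To run it against the real file, pass `sqlite3.connect("spider.sqlite")` instead of the demo connection. Keep in mind that `connect` creates an empty file if the name does not exist, so double-check the working directory first.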
“No unretrieved HTML pages found” isn’t an error; that’s what it shows after it’s complete.
Also, can you check your firewall and make sure PyCharm is allowed through?
What happens when you run spdump.py?