Python for Everybody - ERROR - Pagerank Spider Exercise

Hi everyone,

(Reposting a question that was asked before but never got an answer.)

I’m trying to complete the Page Spider exercise linked in the last lesson of the Python for Everybody course, so that I can better understand the material.

However, when I run the program (spider.py), it fails with an error like this:

users:$ python3 spider.py
Enter web url or enter:
[This is where you enter dr-chuck dot com url]
How many pages:5
1 h…dr-chuck dot com Unable to retrieve or parse page
No unretrieved HTML pages found

I’ve watched the exercise video, and Dr. Chuck says that if this error comes up, it’s likely an issue with BeautifulSoup or something similar. But I have version 4 installed, which is the latest version.
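One way to test the BeautifulSoup theory without touching the network is to check that bs4 imports and can pull links out of a tiny HTML string. This is a minimal sketch; `check_bs4` is a hypothetical helper name, not part of the course code. It also reports `bs4.__file__`, so you can see which copy of the package Python actually loaded if more than one is installed.

```python
import importlib


def check_bs4():
    """Hypothetical helper: confirm bs4 imports and can find <a> tags.

    Returns (ok, detail) instead of raising, so it is safe to run anywhere.
    """
    try:
        bs4 = importlib.import_module("bs4")
    except ImportError as exc:
        return False, f"bs4 not importable: {exc}"

    html = '<p><a href="/a.htm">A</a> <a href="b.htm">B</a></p>'
    soup = bs4.BeautifulSoup(html, "html.parser")
    # soup("a") is shorthand for soup.find_all("a")
    hrefs = [tag.get("href") for tag in soup("a")]
    if hrefs != ["/a.htm", "b.htm"]:
        return False, f"unexpected parse result: {hrefs}"
    return True, f"bs4 {bs4.__version__} from {bs4.__file__} found {len(hrefs)} links"


if __name__ == "__main__":
    print(check_bs4())
```

If this prints `True`, BeautifulSoup itself probably isn’t what makes the spider report “Unable to retrieve or parse page”.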

Does anyone have any ideas why this won’t run for me?

The file I’m using is spider.py located on the PY4E website

I have been trying to troubleshoot and look up a solution, and my last resort is to post this message again and see if anyone else has faced this issue.

Thanks!

D

Can you paste the code here please or link to it?

Hello, I’m not able to post the link. Let’s see if this one works; it’s from the last time this question was posted:

Python for Everybody - ERROR: Page Spider Exercise - Python - The freeCodeCamp Forum

You need to type http://www.dr-chuck.com when it asks for a URL

Enter web url or enter: http://www.dr-chuck.com
[‘http://www.dr-chuck.com’]
How many pages:5
1 http://www.dr-chuck.com (8386) 2
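Typing the full `http://` form matters because Python’s `urllib` refuses a URL with no scheme. Here is a minimal sketch of that, plus start-URL handling modeled loosely on what I remember spider.py doing (trim a trailing slash, and treat the directory of a `.htm`/`.html` page as the site root). The function names and the default URL are my assumptions, not copied from the course file.

```python
from urllib.request import Request


def normalize_start_url(starturl, default="http://www.dr-chuck.com/"):
    """Hypothetical helper, loosely modeled on spider.py's start-URL handling.

    Returns (starturl, web), where web is the site root the crawl stays within.
    The default URL here is an assumption.
    """
    starturl = starturl.strip()
    if not starturl:
        starturl = default
    if starturl.endswith("/"):
        starturl = starturl[:-1]
    web = starturl
    if starturl.endswith((".htm", ".html")):
        # Crawl the directory that contains the page, not just the page itself
        web = starturl[: starturl.rfind("/")]
    return starturl, web


def urllib_accepts(url):
    """True if urllib can build a request for url (i.e. it has a scheme)."""
    try:
        Request(url)
    except ValueError:  # "unknown url type" when the scheme is missing
        return False
    return True


if __name__ == "__main__":
    for typed in ["", "http://www.dr-chuck.com", "www.dr-chuck.com"]:
        start, web = normalize_start_url(typed)
        print(repr(typed), "->", start, "| urllib ok:", urllib_accepts(start))
```

Building a `Request` doesn’t contact the server, so this only checks the URL’s shape; a URL that passes can still fail later for network reasons.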

Hi, I tried that, but to no avail. I also tried this link: python-data.dr-chuck dot net

But I get the same output: No unretrieved HTML pages found

Can you paste the error in here as you did before?

You can see my output above is successful. Maybe try a different URL?

Are you typing “dot net” or “.net” ?

This forum isn’t letting me post external links, so I was writing them that way. When I run spider.py in PyCharm, I enter .net.

Can you screenshot the error and post it?

You might have better luck pasting the errors between backticks or in a > blockquote

like this

Try the blockquote or preformatted text options to paste an error.

OK, this is a different message; read it again. You need to go on to the next step of the exercise.

The spider goes out looking for pages and compares them against a local database. It has already retrieved what it’s looking for.
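Roughly, each time through its loop the spider asks the database for one page that has neither HTML nor an error recorded, and stops with “No unretrieved HTML pages found” when there isn’t one. Here is a minimal sketch of that idea against an in-memory SQLite table. The table and column names (`Pages`, `url`, `html`, `error`) are my assumption of how spider.sqlite is laid out, and `next_unretrieved` is a hypothetical helper. It also shows why a single failed fetch can end the run: once the start page is marked with an error, nothing is left to pick.

```python
import sqlite3


def next_unretrieved(cur):
    """Hypothetical helper: return (id, url) of one page not yet fetched, or None."""
    cur.execute(
        "SELECT id, url FROM Pages WHERE html IS NULL AND error IS NULL "
        "ORDER BY RANDOM() LIMIT 1"
    )
    return cur.fetchone()


def demo():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Assumed, simplified version of the Pages table in spider.sqlite
    cur.execute(
        "CREATE TABLE Pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, "
        "html TEXT, error INTEGER)"
    )
    cur.execute("INSERT INTO Pages (url) VALUES (?)", ("http://www.dr-chuck.com",))

    first = next_unretrieved(cur)  # the start page is still pending
    # Simulate a failed fetch: the row gets an error code instead of HTML
    cur.execute("UPDATE Pages SET error = -1 WHERE id = ?", (first[0],))
    second = next_unretrieved(cur)  # nothing pending any more
    conn.close()
    return first, second


if __name__ == "__main__":
    first, second = demo()
    print("first pick:", first)
    if second is None:
        print("No unretrieved HTML pages found")
```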

(Sorry, I’m not able to post the YouTube link here.) I found a YouTube walkthrough of this code by the author of the tutorial. It is under ‘Exercise: Page Spider’ at the freeCodeCamp link below.

Python for Everybody - Data Visualization: Mailing Lists | Learn | freeCodeCamp.org

At the 11:37 mark, you can see that the output should be something else.

My error message is:

No unretrieved HTML pages found

It couldn’t find any pages at that link but it should have.

Delete or rename spider.sqlite to spider.sqlite.old
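If you’d rather do the rename from Python, this is a minimal sketch with `pathlib`; `archive_database` is a hypothetical helper. Close DB Browser or anything else that has spider.sqlite open first, since Windows may refuse to rename a file another program is using. The next run of spider.py should then start from a fresh database.

```python
from pathlib import Path


def archive_database(db_path="spider.sqlite"):
    """Hypothetical helper: move spider.sqlite aside as spider.sqlite.old.

    Returns the new path, or None if there was nothing to move.
    """
    src = Path(db_path)
    if not src.exists():
        return None
    dst = src.with_name(src.name + ".old")
    src.replace(dst)  # overwrites an existing spider.sqlite.old
    return dst


if __name__ == "__main__":
    print(archive_database())
```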

Is this your full error?

Enter web url or enter: http://www.dr-chuck.com
[‘dr-chuck dot com’, ‘http://www.dr-chuck.com’]
How many pages:5
No unretrieved HTML pages found

Compare to the previous time I ran the program:

How many pages:15
23 http://www.dr-chuck.com/Sakai_ Building an Open Source Community - Charles R. Severance.epub Unable to retrieve or parse page
25 http://www.dr-chuck.com/html Unable to retrieve or parse page
26 http://www.dr-chuck.com/errata.txt Unable to retrieve or parse page
24 http://www.dr-chuck.com/Sakai_ Building an Open Source Community - Charles R. Severance.PDF Unable to retrieve or parse page
No unretrieved HTML pages found

See the message at the end? It means it has all the pages it can find. The next time I run it, I only get this message:
No unretrieved HTML pages found
That’s because I already ran it and the pages are stored in the database.

Yes, this is the full error. The code also lets you just hit Enter to load the default dr-chuck URL, and I still get the same error.

I did the renaming as you suggested and ran it again. The program created a new spider.sqlite file but still doesn’t get past the error I’ve been getting. I checked the contents of the sqlite file and they’re the same as before.

But when you open spider.sqlite in DB Browser, do you see 15 entries like you’re supposed to? I can only see 1, which is the default page.

Can you show me the full output after deleting (or renaming) the sql database?
And show me the database contents.
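If DB Browser isn’t handy, a few queries will show the same thing. Minimal sketch, assuming spider.sqlite has a `Pages` table with `html` and `error` columns and a `Webs` table with a `url` column; `summarize_pages` is a hypothetical helper. If it reports one page with an error recorded and zero pending, the spider tried the start page, failed, and had nothing left to crawl, which would match seeing only the default page.

```python
import sqlite3
from pathlib import Path


def summarize_pages(db_path="spider.sqlite"):
    """Hypothetical helper: count fetched, failed and pending rows in Pages."""
    path = Path(db_path)
    if not path.exists():
        # Check first: sqlite3.connect would silently create an empty file
        raise FileNotFoundError(f"{path} not found; run spider.py first")
    conn = sqlite3.connect(str(path))
    try:
        cur = conn.cursor()

        def count(where):
            return cur.execute(f"SELECT COUNT(*) FROM Pages WHERE {where}").fetchone()[0]

        summary = {
            "total": count("1 = 1"),
            "retrieved": count("html IS NOT NULL"),
            "errored": count("error IS NOT NULL"),
            "pending": count("html IS NULL AND error IS NULL"),
            "webs": [row[0] for row in cur.execute("SELECT url FROM Webs")],
        }
    finally:
        conn.close()
    return summary


if __name__ == "__main__":
    print(summarize_pages())
```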

“No unretrieved HTML pages found” isn’t an error; it’s what the program prints once it has finished.

Also, can you check your firewall and make sure PyCharm is allowed through?
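As far as I can tell, spider.py catches exceptions broadly and prints only “Unable to retrieve or parse page”, which hides the real cause. Fetching the start page directly from the same PyCharm environment shows the actual exception, which should help tell a firewall or network block apart from a parsing problem. Minimal sketch; `probe` is a hypothetical helper and the 10-second timeout is an arbitrary choice.

```python
import urllib.request
from urllib.error import HTTPError


def probe(url, timeout=10):
    """Hypothetical helper: try to fetch url and report what happened.

    Returns (ok, detail) rather than hiding the exception like a bare except would.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"{resp.status} {resp.headers.get_content_type()}"
    except HTTPError as exc:  # the server answered, but with an error status
        return False, f"HTTP error {exc.code}"
    except (ValueError, OSError) as exc:  # bad URL, DNS, refused, timeout, SSL...
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    print(probe("http://www.dr-chuck.com"))
```

A firewall or proxy block would typically show up here as a `URLError` (for example a timeout or a refused connection) rather than as a parsing problem.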

What happens when you run spdump.py?