I’m in a pickle. I was following a tutorial in [#py4e] and I cannot retrieve the data from the web page. Can someone help me and tell me what I did wrong?
import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode()
mysock.send(cmd)
while True:
    data = mysock.recv(512)
    if (len(data) < 1):
        break
    print(data.dencode())
mysock.close()
Please post your actual code instead of a picture. Thanks
@rafael.motilla: Try pasting your source code into ChatGPT. This kind of AI tool can help explain how source code works in various programming languages.
I should mention that this thread is already a year old, so some of the information may be outdated. However, let’s focus on your specific issue. From what I can see in your code, you’re trying to retrieve data from a web page using raw sockets. That is one way to fetch data, but other tools and libraries are designed specifically for web scraping and may make your life easier. Regarding your specific code, the URL in the request does not need a port number, because the socket already connects to port 80. Also, the request line of your HTTP GET request is not properly formatted. It should look something like this: “GET /romeo.txt HTTP/1.1\r\nHost: data.pr4e.org\r\n\r\n”. Also, I think you might like Nannostomus. It’s a service that can make data extraction easy and accessible. Give it a try!
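For anyone landing here later, here is a minimal sketch of the request format suggested above, not a definitive fix. Two things in it are my own assumptions and not from the posts: the helper names build_request and fetch are made up for illustration, and I added a "Connection: close" header, because with HTTP/1.1 the server may otherwise keep the socket open and the recv loop would wait until the timeout. Note also that the posted code calls data.dencode(); bytes objects have no such method, the call is decode(), so that line raises AttributeError as soon as the first chunk arrives.

```python
import socket


def build_request(host, path):
    """Return the raw bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line uses the path, not the full URL
        f"Host: {host}",          # Host header is required in HTTP/1.1
        "Connection: close",      # assumption: ask the server to close when done
        "",                       # the two empty strings produce the blank line
        "",                       # that terminates the header section
    ]
    # HTTP lines end with CRLF (\r\n), not a bare \n
    return "\r\n".join(lines).encode()


def fetch(host, path, port=80):
    """Send the request and return the full response as text (hypothetical helper)."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        while True:
            data = sock.recv(512)
            if not data:          # empty bytes means the server closed the socket
                break
            chunks.append(data)
    # decode once at the end so a multi-byte character split across
    # two chunks is not broken
    return b"".join(chunks).decode(errors="replace")
```

Calling print(fetch("data.pr4e.org", "/romeo.txt")) should print the response headers followed by the body, assuming the host is reachable from your machine.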