Web Scraping with Python

rafael.motilla · September 21, 2022, 12:45am

I’m in a pickle. I was following a tutorial in [#py4e] and I cannot retrieve the data from the web page. Can someone help me and tell me what I did wrong?

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.dencode())
mysock.close()

[Attached image: 22.09.20 EX12-01 (2)]

JeremyLT · September 21, 2022, 12:50am

Please post your actual code instead of a picture. Thanks

waellerbe · March 7, 2023, 8:11pm

@rafael.motilla: Try pasting your source code into ChatGPT. This kind of AI tool can help explain how source code works in various programming languages.

Tulonarinchu · May 12, 2023, 6:01pm

I should mention that this thread is already several months old, so some of the information may be outdated. However, let’s focus on your specific issue. From what I can see in your code, you’re trying to retrieve data from a web page using raw sockets. This works, but libraries designed for fetching and scraping web pages may make your life easier.

Regarding your specific code, the HTTP GET request is not properly formatted: each line of an HTTP request should end with \r\n rather than \n, and the request line normally carries only the path, with the server named in a Host header. It should look something like this: “GET /romeo.txt HTTP/1.1\r\nHost: data.pr4e.org\r\n\r\n”. Also, print(data.dencode()) contains a typo; the method is decode().

Also, I think you might like Nannostomus, a service that aims to make data extraction easy and accessible. Give it a try!
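
For reference, here is a minimal sketch of how the original program might look with the request formatted as described above and decode() spelled correctly. It assumes the server at data.pr4e.org still serves /romeo.txt on port 80. The helper names build_request and fetch are hypothetical, chosen only for illustration. The sketch keeps HTTP/1.0 as in the original post, because the server then closes the connection after responding, which is what lets the recv loop end; with HTTP/1.1 you would typically also send a "Connection: close" header.

```python
import socket


def build_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request (hypothetical helper).

    Each header line ends with CRLF, and an empty line marks the
    end of the headers.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # blank line that terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode()


def fetch(host: str, path: str, port: int = 80) -> str:
    """Send the request and return the raw response (headers + body) as text."""
    chunks = []
    # create_connection opens the TCP socket; the with-block closes it for us.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        while True:
            data = sock.recv(512)
            if not data:  # empty bytes means the server closed the connection
                break
            chunks.append(data)
    # Decode once at the end so a multi-byte character split across
    # two recv() calls cannot raise a decode error.
    return b"".join(chunks).decode(errors="replace")
```

Calling print(fetch('data.pr4e.org', '/romeo.txt')) should then print the response headers followed by the text of the file, assuming the host is reachable.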
