Hello, everyone. I need a little help: I'm trying to build a crawler and connect it to a database, so that the crawled data can be stored and indexed there for later retrieval. I'm new to some aspects of Python and SQL, though. Any advice?
Firstly, welcome to the forums.
With your current question, we don't have enough context to know what you already know or don't know, so it's impossible to guide you without just telling you the answer (which we won't do).
It is pretty typical on here for people to share a codepen / jsfiddle example of what they have tried so that anyone helping has more of an idea of what help is actually helpful.
Please provide some example of what you’ve tried and I’m sure you’ll get more help.
That's the problem. As far as a connection between the two is concerned, I don't have any code yet, not really. That's what I'm here for: to get an idea of how to handle this.
Try looking into the pyodbc module. I’ve used it quite a bit for connecting to Microsoft Access Databases.
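To give you a feel for what pyodbc usage looks like, here is a minimal sketch. The server name, database, credentials, and driver name below are all placeholders, not anything from your setup; the actual connect call is kept inside an uncalled helper because it needs a reachable server:

```python
def build_conn_str(server, database, user, password):
    """Assemble an ODBC connection string for SQL Server.

    All values passed in are placeholders supplied by the caller; the
    DRIVER name must match an ODBC driver actually installed locally.
    """
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};DATABASE={database};"
        f"UID={user};PWD={password}"
    )


def connect_example():
    """Connect and print the server version (requires a live SQL Server)."""
    import pyodbc  # pip install pyodbc; needs an ODBC driver installed

    conn = pyodbc.connect(build_conn_str("localhost", "crawler_db", "sa", "secret"))
    cursor = conn.cursor()
    cursor.execute("SELECT @@VERSION")
    print(cursor.fetchone()[0])
    conn.close()
```

Call `connect_example()` once you have filled in your own server details.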
Microsoft provides a Python driver for MSSQL: pymssql.
@owel So I download the driver, then what? What do I do next? Do I have to configure something with SQL or something?
There are several code examples here.
This is just the driver for python to talk to an MSSQL server.
But you still need to know how to create and manage databases, tables, and views in SQL Server, and how to construct SQL commands. The driver isn't magic; it's just a bridge between Python and the SQL Server instance.
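To illustrate, the kind of SQL the driver carries back and forth might look like the sketch below. The table and column names are made up for the example, and pymssql uses %s-style parameter placeholders; the connection demo is an uncalled function since it needs a live server:

```python
# Illustrative SQL for a crawler results table; names are assumptions.
CREATE_TABLE_SQL = """
CREATE TABLE crawled_pages (
    id INT IDENTITY(1,1) PRIMARY KEY,
    url NVARCHAR(2000) NOT NULL,
    title NVARCHAR(500) NULL,
    fetched_at DATETIME2 DEFAULT SYSUTCDATETIME()
)
"""

# pymssql uses %s placeholders for query parameters.
INSERT_SQL = "INSERT INTO crawled_pages (url, title) VALUES (%s, %s)"


def store_page(conn, url, title):
    """Insert one crawled page using an already-open DB-API connection."""
    cur = conn.cursor()
    cur.execute(INSERT_SQL, (url, title))
    conn.commit()


def example():
    """End-to-end demo; requires pymssql and a reachable SQL Server."""
    import pymssql  # pip install pymssql

    conn = pymssql.connect(server="localhost", user="sa",
                           password="secret", database="crawler_db")
    store_page(conn, "https://example.com/page", "Example page")
    conn.close()
```

Parameterized queries like `INSERT_SQL` are worth the habit: they avoid SQL injection from whatever text the crawler scrapes off a page.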
If you'll be managing SQL databases (creating tables, fields, indexes, views, stored procedures, full-text search, triggers, etc.), you need the Enterprise Manager software installed on your computer and, more importantly, to know how to use it.
I’m making the SQL connection using Visual Studio, but I’m unclear on how I can get the .py file, which is the crawler, to store crawled data into the SQL database. Suggestions?
Can you post the code you have so far in the .py file?
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class ElectronicsSpider(CrawlSpider):  # class name assumed; the class line was lost in the paste
    name = "electronics"
    allowed_domains = ["www.olx.com.pk"]
    start_urls = [
        # (start URLs were cut off in the original post)
    ]
    rules = (
        Rule(LinkExtractor(allow=(), restrict_css=(".pageNextPrev",)),
             callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        print('Processing..' + response.url)
        # print(response.text)
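One common way to wire a Scrapy spider to a database is an item pipeline: Scrapy calls `process_item` for every item the spider yields. Below is a sketch assuming pyodbc and a hypothetical `crawled_pages (url, title)` table; the connection string values are placeholders:

```python
# Hypothetical Scrapy item pipeline that writes each crawled item to
# SQL Server via pyodbc. Enable it in settings.py with something like:
#   ITEM_PIPELINES = {"myproject.pipelines.SqlServerPipeline": 300}

class SqlServerPipeline:
    def open_spider(self, spider):
        import pyodbc  # imported lazily; requires `pip install pyodbc`

        # Placeholder connection details; substitute your own.
        self.conn = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=localhost;DATABASE=crawler_db;"
            "UID=sa;PWD=secret"
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # pyodbc uses ? placeholders for query parameters.
        self.cursor.execute(
            "INSERT INTO crawled_pages (url, title) VALUES (?, ?)",
            item.get("url"), item.get("title"),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        self.conn.close()
```

For this to do anything, `parse_item` would need to `yield` a dict such as `{"url": response.url, "title": ...}` instead of just printing the URL.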