Web scraping with Selenium

Hi, I was trying to experiment with web scraping and got stuck with an error and no output. Your help is appreciated.
I'm getting None as the data.

from selenium import webdriver
from selenium.webdriver.common.by import By
driver=webdriver.Chrome(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
driver.get('https://')
#driver.maximize_window()
lists = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/div[1]/div')
#print(lists)

from csv import writer
with open(r'C:\Users\Desktop\list.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['col 3', 'col 4', 'col 5', 'col 6','col 7','col 8','col 9','col 10']
    thewriter.writerow(header)
    
    for list in lists:  
        

        col 3 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[2]/td[4]').text.replace('\n', '')
        col 4 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[5]').text.replace('\n', '')
        col 5 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[6]').text.replace('\n', '')
        col 6 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[7]').text.replace('\n', '')
        col 7 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[8]').text.replace('\n', '')
        col 8 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[9]').text.replace('\n', '')
        col 9 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[10]').text.replace('\n', '')
        col 10 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[11]').text.replace('\n', '')

        info = [col 3, col 4, col 5, col 6,col 7,col 8,col 9,col 10]

        thewriter.writerow(info)]
        element = driver.find_element(By.XPATH, "element_xpath")


The elements are as per this screenshot: [screenshot of the page elements]

I am wondering why your variable names inside the loop look like this: col 3?

You can't use spaces in variable names.
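For example, renaming them with underscores (reusing the first XPath from your post) parses fine:

# valid identifiers use letters, digits, and underscores - no spaces
col_3 = driver.find_element(By.XPATH, '//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[2]/td[4]').text.replace('\n', '')
# col 3 = ...  ->  SyntaxError: invalid syntax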

Yes, noted. I have corrected it. Now I get the error "'list' object has no attribute 'text'".
Any thoughts on this?

Can you share more details about this? Can you share the whole stack trace? It's hard to tell exactly without the complete error message.
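For what it's worth, that particular message usually shows up when .text is called on the list returned by find_elements rather than on a single WebElement. A rough sketch (the XPath here is generic, just for illustration):

rows = driver.find_elements(By.XPATH, '//table/tbody/tr')  # find_elements returns a list
# rows.text  ->  AttributeError: 'list' object has no attribute 'text'
for row in rows:
    print(row.text)  # .text works on each individual WebElement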

[screenshot of the CSV output]

C:\Users\.spyder-py3\scrape_selenium.py:3: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver=webdriver.Chrome(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
[<selenium.webdriver.remote.webelement.WebElement (session="916bd566098750ef2801e7b129aec8dd", element="324f660d-8a4e-4a1f-8446-2da8eab3632d")>]

Deprecation warnings are just warnings. It shouldn’t be breaking because of that warning.
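If you want to silence it anyway, Selenium 4 expects the driver path to be wrapped in a Service object. Something like this should work (path copied from your snippet):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
driver = webdriver.Chrome(service=service)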


Are you sure the XPATHs are correct? Did you log them out?

Are you copying the XPATH using the browser?

Without knowing the content of the page you are trying to scrape we can’t really know if you are selecting the elements correctly.
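For example, before writing anything to the CSV, you could log what your first selector actually matched (XPath copied from your code):

matches = driver.find_elements(By.XPATH, '//*[@id="main-content"]/div[3]/div/div[2]/div[1]/div[1]/div')
print(len(matches))          # 0 means the XPath matched nothing
for match in matches:
    print(repr(match.text))  # empty strings suggest the content loads after the page does

If the count is 0, the XPath is wrong; if the text is empty, the table may be rendered dynamically and you would need an explicit wait.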

The bracket at the end doesn’t seem to belong there.

thewriter.writerow(info)]
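That stray ] is a syntax error on its own. The line should just be:

thewriter.writerow(info)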

Please post your current code with any corrections or changes you have made.

I have revised the code from XPath to By.CLASS_NAME and modified the code.
However, I don't see any result, even though there is no error.

I have tried to add further classes ("col-7", "col-8", "col-9"), but the output is empty or I get an indentation error. Any thoughts or tweaks to the code are appreciated.

import xlsxwriter
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

element_list = []

for page in range(1, 3, 1):
	
	page_url = "OcrText=false&searchType=quickSearch&viewType=list" + str(page)
	driver = webdriver.Chrome(ChromeDriverManager().install())
	driver.get(page_url)
	title = driver.find_elements(by=By.CLASS_NAME, value ="col-3")
	price = driver.find_elements(by=By.CLASS_NAME, value ="col-4")
	description = driver.find_elements(by=By.CLASS_NAME, value ="col-5")
	rating = driver.find_elements(by=By.CLASS_NAME, value ="col-6")
    

	for i in range(len(title)):
		element_list.append([title[i].text, price[i].text, description[i].text, rating[i].text])

with xlsxwriter.Workbook(r'C:\Users\Desktop\list.xlsx') as workbook:
	worksheet = workbook.add_worksheet()

	for row_num, data in enumerate(element_list):
		worksheet.write_row(row_num, 0, data)

driver.close()

What is each selector returning? Did you log them out before looping?
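For example, a quick sanity check right after the four find_elements calls:

print(len(title), len(price), len(description), len(rating))

If those counts are all 0, the col-3 to col-6 class names aren't matching anything on the page.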

Shouldn’t it just be By.CLASS_NAME and not by=By.CLASS_NAME? I don’t really use Python and I never tried web scraping with it (well I might have for fun at some point but I don’t remember).


I can’t really help you with the selectors as I don’t know what the page structure is. I would suggest you copy the selector using the browser if you didn’t (check the Using XPaths and CSS Selectors link I gave).
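One structural thing I'd change regardless: the loop creates a brand-new Chrome instance for every page, and only the last one ever gets closed. Here's a rough sketch of the same logic with a single driver (the URL fragment and class names are copied as-is from your post, and I'm assuming the page really uses those col-* classes):

import xlsxwriter
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# create the driver once, outside the loop
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

element_list = []
for page in range(1, 3):
    page_url = "OcrText=false&searchType=quickSearch&viewType=list" + str(page)
    driver.get(page_url)

    title = driver.find_elements(By.CLASS_NAME, "col-3")
    price = driver.find_elements(By.CLASS_NAME, "col-4")
    description = driver.find_elements(By.CLASS_NAME, "col-5")
    rating = driver.find_elements(By.CLASS_NAME, "col-6")
    print(page, len(title), len(price), len(description), len(rating))  # sanity check

    # zip stops at the shortest list, so mismatched counts can't raise IndexError
    for t, p, d, r in zip(title, price, description, rating):
        element_list.append([t.text, p.text, d.text, r.text])

driver.quit()  # quit() shuts the browser down completely; close() only closes one window

with xlsxwriter.Workbook(r'C:\Users\Desktop\list.xlsx') as workbook:
    worksheet = workbook.add_worksheet()
    for row_num, data in enumerate(element_list):
        worksheet.write_row(row_num, 0, data)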

The element structure is copied from the browser.
The text is in col-3 to col-10.
