Web scraping with Selenium

Hi, I was trying to experiment with web scraping and got stuck with an error and no output. Your help is appreciated.
I'm getting None as the data.

from selenium import webdriver
from selenium.webdriver.common.by import By
driver=webdriver.Chrome(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
driver.get('https://')
#driver.maximize_window()
lists = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/div[1]/div')
#print(lists)

from csv import writer
with open(r'C:\Users\Desktop\list.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['col 3', 'col 4', 'col 5', 'col 6','col 7','col 8','col 9','col 10']
    thewriter.writerow(header)
    
    for list in lists:  
        

        col 3 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[2]/td[4]').text.replace('\n', '')
        col 4 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[5]').text.replace('\n', '')
        col 5 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[6]').text.replace('\n', '')
        col 6 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[7]').text.replace('\n', '')
        col 7 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[8]').text.replace('\n', '')
        col 8 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[9]').text.replace('\n', '')
        col 9 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[10]').text.replace('\n', '')
        col 10 = driver.find_element(By.XPATH,'//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[11]').text.replace('\n', '')

        info = [col 3, col 4, col 5, col 6,col 7,col 8,col 9,col 10]

        thewriter.writerow(info)]
        element = driver.find_element(By.XPATH, "element_xpath")


The elements are as per this screenshot: [screenshot of the page elements]

I am wondering why your variable names inside the loop look like this: col 3?

You can't use spaces in variable names.
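For example, renaming them with underscores (reusing the first XPath from your post) parses fine:

# valid identifiers use letters, digits, and underscores - no spaces
col_3 = driver.find_element(By.XPATH, '//*[@id="main-content"]/div[3]/div/div[2]/div[1]/table/tbody/tr[2]/td[4]').text.replace('\n', '')
# col 3 = ...  ->  SyntaxError: invalid syntax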

Yes, noted. I have corrected it. Now I get the error "'list' object has no attribute 'text'".
Any thoughts on this?

Can you share more details about this? Can you share the whole stack trace? It's hard to tell exactly without the complete error message.
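For what it's worth, that particular message usually shows up when .text is called on the list returned by find_elements rather than on a single WebElement. A rough sketch (the XPath here is generic, just for illustration):

rows = driver.find_elements(By.XPATH, '//table/tbody/tr')  # find_elements returns a list
# rows.text  ->  AttributeError: 'list' object has no attribute 'text'
for row in rows:
    print(row.text)  # .text works on each individual WebElement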

[screenshot of the CSV output]

C:\Users\.spyder-py3\scrape_selenium.py:3: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver=webdriver.Chrome(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
[<selenium.webdriver.remote.webelement.WebElement (session="916bd566098750ef2801e7b129aec8dd", element="324f660d-8a4e-4a1f-8446-2da8eab3632d")>]

Deprecation warnings are just warnings. It shouldn’t be breaking because of that warning.
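If you want to silence it anyway, Selenium 4 expects the driver path to be wrapped in a Service object. Something like this should work (path copied from your snippet):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
driver = webdriver.Chrome(service=service)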


Are you sure the XPATHs are correct? Did you log them out?

Are you copying the XPATH using the browser?

Without knowing the content of the page you are trying to scrape we can’t really know if you are selecting the elements correctly.
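For example, before writing anything to the CSV, you could log what your first selector actually matched (XPath copied from your code):

matches = driver.find_elements(By.XPATH, '//*[@id="main-content"]/div[3]/div/div[2]/div[1]/div[1]/div')
print(len(matches))          # 0 means the XPath matched nothing
for match in matches:
    print(repr(match.text))  # empty strings suggest the content loads after the page does

If the count is 0, the XPath is wrong; if the text is empty, the table may be rendered dynamically and you would need an explicit wait.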

The bracket at the end doesn’t seem to belong there.

thewriter.writerow(info)]
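That stray ] is a syntax error on its own. The line should just be:

thewriter.writerow(info)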

Please post your current code with any corrections or changes you have made.

I have revised the code from XPath to By.CLASS_NAME and modified the code.
However, I don't see any result, even though there is no error.

I have tried to add further classes ("col-7", "col-8", "col-9"), but the output is empty or I get an indentation error. Any thoughts or tweaks to the code are appreciated.

import xlsxwriter
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

element_list = []

for page in range(1, 3, 1):
	
	page_url = "OcrText=false&searchType=quickSearch&viewType=list" + str(page)
	driver = webdriver.Chrome(ChromeDriverManager().install())
	driver.get(page_url)
	title = driver.find_elements(by=By.CLASS_NAME, value ="col-3")
	price = driver.find_elements(by=By.CLASS_NAME, value ="col-4")
	description = driver.find_elements(by=By.CLASS_NAME, value ="col-5")
	rating = driver.find_elements(by=By.CLASS_NAME, value ="col-6")
    

	for i in range(len(title)):
		element_list.append([title[i].text, price[i].text, description[i].text, rating[i].text])

with xlsxwriter.Workbook(r'C:\Users\Desktop\list.xlsx') as workbook:
	worksheet = workbook.add_worksheet()

	for row_num, data in enumerate(element_list):
		worksheet.write_row(row_num, 0, data)

driver.close()

What is each selector returning? Did you log them out before looping?
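For example, a quick sanity check right after the four find_elements calls:

print(len(title), len(price), len(description), len(rating))

If those counts are all 0, the col-3 to col-6 class names aren't matching anything on the page.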

Shouldn’t it just be By.CLASS_NAME and not by=By.CLASS_NAME? I don’t really use Python and I never tried web scraping with it (well I might have for fun at some point but I don’t remember).


I can’t really help you with the selectors as I don’t know what the page structure is. I would suggest you copy the selector using the browser if you didn’t (check the Using XPaths and CSS Selectors link I gave).
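One structural thing I'd change regardless: the loop creates a brand-new Chrome instance for every page, and only the last one ever gets closed. Here's a rough sketch of the same logic with a single driver (the URL fragment and class names are copied as-is from your post, and I'm assuming the page really uses those col-* classes):

import xlsxwriter
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# create the driver once, outside the loop
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

element_list = []
for page in range(1, 3):
    page_url = "OcrText=false&searchType=quickSearch&viewType=list" + str(page)
    driver.get(page_url)

    title = driver.find_elements(By.CLASS_NAME, "col-3")
    price = driver.find_elements(By.CLASS_NAME, "col-4")
    description = driver.find_elements(By.CLASS_NAME, "col-5")
    rating = driver.find_elements(By.CLASS_NAME, "col-6")
    print(page, len(title), len(price), len(description), len(rating))  # sanity check

    # zip stops at the shortest list, so mismatched counts can't raise IndexError
    for t, p, d, r in zip(title, price, description, rating):
        element_list.append([t.text, p.text, d.text, r.text])

driver.quit()  # quit() shuts the browser down completely; close() only closes one window

with xlsxwriter.Workbook(r'C:\Users\Desktop\list.xlsx') as workbook:
    worksheet = workbook.add_worksheet()
    for row_num, data in enumerate(element_list):
        worksheet.write_row(row_num, 0, data)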

The element structure is copied from the browser.
The text is in col-3 to col-10.
