Hi,
I'm trying to load dynamic content using WebDriverWait, but I keep running into the following error:
```
Traceback (most recent call last):
  File "C:\Users\Bianc\PycharmProjects\Python Course CodeFirstGirls\Webscraping\dynamicallyloading_usingselenium.py", line 36, in <module>
    WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[data-testid='firstListCardGroup-editorial']")))  # We are waiting for 5 seconds for our element with the attribute data-testid set as firstListCardGroup-editorial
  File "C:\Users\Bianc\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\support\wait.py", line 89, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
Backtrace:
	Ordinal0 [0x00B9B8F3+2406643]
	Ordinal0 [0x00B2AF31+1945393]
	... (remaining chromedriver frames snipped)
```
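For context on what the error means: `WebDriverWait(driver, 5).until(...)` polls its condition (every 500 ms by default) and raises `TimeoutException` once 5 seconds pass without the condition returning a truthy value, so this traceback just means the editorial element never became visible in time. A simplified, pure-Python sketch of that polling loop (not Selenium's actual implementation, and with a stand-in exception class):

```python
import time


class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=5.0, poll_frequency=0.5):
    """Call condition() repeatedly until it returns a truthy value,
    or raise TimeoutException once `timeout` seconds have elapsed."""
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            raise TimeoutException(f"condition not met within {timeout:.1f} s")
        time.sleep(poll_frequency)


# A condition that becomes truthy shortly before the timeout succeeds:
appears_at = time.monotonic() + 0.05  # pretend the element shows up after 50 ms
result = wait_until(lambda: time.monotonic() >= appears_at,
                    timeout=1.0, poll_frequency=0.01)
print(result)  # True: the condition became truthy before the timeout

# A condition that never becomes truthy raises, just like in the traceback:
try:
    wait_until(lambda: False, timeout=0.05, poll_frequency=0.01)
except TimeoutException as exc:
    print("timed out:", exc)
```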
My code:

```python
from bs4 import BeautifulSoup
from selenium import webdriver
# Packages we need to get dynamic content to load
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Goal: extract content that is loaded dynamically (i.e. information that changes over time).
# We will scrape the editorial list of each movie and add it to our current results.

option = webdriver.ChromeOptions()
option.add_argument('--headless')  # opens Chrome in headless mode
option.add_argument('--no-sandbox')
option.add_argument('--disable-dev-shm-usage')

# Replace YOUR-PATH-TO-CHROMEDRIVER with your chromedriver location.
# The path is a raw string so the backslashes are not treated as escapes.
driver = webdriver.Chrome(r'C:\Users\Bianc\PycharmProjects\chromedriver.exe', options=option)

driver.get('https://www.imdb.com/chart/top/')  # The IMDb Top 250 Movies page
soup = BeautifulSoup(driver.page_source, 'html.parser')  # Notice driver.page_source instead of page.content

totalScrapedInfo = []  # In this list we will save all the information we scrape
links = soup.find_all('td', class_='titleColumn')  # Selecting all of the movies with titles
first10 = links[:10]  # First 10 movies only
for link in first10:
    title = link.find('a')
    driver.get('https://www.imdb.com/' + title.get('href'))  # Access the movie's page
    infolist = driver.find_elements(by=By.CSS_SELECTOR, value='.ipc-inline-list')[0]  # First element with class 'ipc-inline-list'
    informations = infolist.find_elements(by=By.CSS_SELECTOR, value="[role='presentation']")  # All elements with role='presentation' inside it
    scrapedInfo = {
        "title": link.find('a').text,
        "year": informations[0].text,
        "duration": informations[2].text,
    }  # Save all the scraped information in a dictionary
    # Wait up to 5 seconds for the element with data-testid set to firstListCardGroup-editorial
    WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[data-testid='firstListCardGroup-editorial']")))
    listElements = driver.find_elements(by=By.CSS_SELECTOR, value="[data-testid='firstListCardGroup-editorial'] .listName")  # Extracting the editorial list elements
    listNames = []  # Creating an empty list, then appending only the elements' texts
    # We add all the items for one movie (i.e. link) in the editorial list into listNames
    for el in listElements:
        listNames.append(el.text)
    scrapedInfo['editorial-list'] = listNames  # Adding the editorial list names to our scrapedInfo dictionary
    totalScrapedInfo.append(scrapedInfo)  # Append the dictionary to the totalScrapedInfo list

print(totalScrapedInfo)  # Display the list with all the information we scraped
```
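One detail about the crash itself: if any of the movie pages simply has no editorial section, the wait will always time out on that page and kill the whole loop. A common way to keep scraping is to wrap the `WebDriverWait` call in `try/except TimeoutException` (imported from `selenium.common.exceptions`) and fall back to an empty list. The shape of that pattern, sketched with a stand-in lookup function so it runs without a browser:

```python
class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_for_editorial_list(page):
    """Stand-in for WebDriverWait(...).until(...): pages without an
    'editorial' key never show the element, so the wait times out."""
    if 'editorial' not in page:
        raise TimeoutException("editorial section never became visible")
    return page['editorial']


pages = [
    {"title": "Movie A", "editorial": ["Top Rated Movies"]},
    {"title": "Movie B"},  # no editorial section on this page
]

totalScrapedInfo = []
for page in pages:
    scrapedInfo = {"title": page["title"]}
    try:
        scrapedInfo["editorial-list"] = wait_for_editorial_list(page)
    except TimeoutException:
        scrapedInfo["editorial-list"] = []  # keep scraping instead of crashing
    totalScrapedInfo.append(scrapedInfo)

print(totalScrapedInfo)
```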
Any help would be much appreciated.
Bianca