Hi,
I'm trying to load dynamic content using WebDriverWait, but I keep running into the following error:
```
Traceback (most recent call last):
  File "C:\Users\Bianc\PycharmProjects\Python Course CodeFirstGirls\Webscraping\dynamicallyloading_usingselenium.py", line 36, in <module>
    WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[data-testid='firstListCardGroup-editorial']")))  # We are waiting for 5 seconds for our element with the attribute data-testid set as firstListCardGroup-editorial
  File "C:\Users\Bianc\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\support\wait.py", line 89, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
Backtrace:
	Ordinal0 [0x00B9B8F3+2406643]
	Ordinal0 [0x00B2AF31+1945393]
	... (remaining chromedriver frames snipped)
```
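For context on what the error means: `WebDriverWait(driver, 5).until(...)` polls its condition (every 500 ms by default) and raises `TimeoutException` once 5 seconds pass without the condition returning a truthy value, so this traceback just means the editorial element never became visible in time. A simplified, pure-Python sketch of that polling loop (not Selenium's actual implementation, and with a stand-in exception class):

```python
import time


class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=5.0, poll_frequency=0.5):
    """Call condition() repeatedly until it returns a truthy value,
    or raise TimeoutException once `timeout` seconds have elapsed."""
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            raise TimeoutException(f"condition not met within {timeout:.1f} s")
        time.sleep(poll_frequency)


# A condition that becomes truthy shortly before the timeout succeeds:
appears_at = time.monotonic() + 0.05  # pretend the element shows up after 50 ms
result = wait_until(lambda: time.monotonic() >= appears_at,
                    timeout=1.0, poll_frequency=0.01)
print(result)  # True: the condition became truthy before the timeout

# A condition that never becomes truthy raises, just like in the traceback:
try:
    wait_until(lambda: False, timeout=0.05, poll_frequency=0.01)
except TimeoutException as exc:
    print("timed out:", exc)
```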
My code:

```python
from bs4 import BeautifulSoup
from selenium import webdriver
# Packages we need to get dynamic content to load
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Goal: extract content that is loaded dynamically (i.e. information that changes over time).
# We will scrape the editorial list of each movie and add it to our current results.

option = webdriver.ChromeOptions()
option.add_argument('--headless')  # opens Chrome in headless mode
option.add_argument('--no-sandbox')
option.add_argument('--disable-dev-shm-usage')

# Replace YOUR-PATH-TO-CHROMEDRIVER with your chromedriver location.
# The path is a raw string so the backslashes are not treated as escapes.
driver = webdriver.Chrome(r'C:\Users\Bianc\PycharmProjects\chromedriver.exe', options=option)

driver.get('https://www.imdb.com/chart/top/')  # The IMDb Top 250 Movies page
soup = BeautifulSoup(driver.page_source, 'html.parser')  # Notice driver.page_source instead of page.content

totalScrapedInfo = []  # In this list we will save all the information we scrape
links = soup.find_all('td', class_='titleColumn')  # Selecting all of the movies with titles
first10 = links[:10]  # First 10 movies only
for link in first10:
    title = link.find('a')
    driver.get('https://www.imdb.com/' + title.get('href'))  # Access the movie's page
    infolist = driver.find_elements(by=By.CSS_SELECTOR, value='.ipc-inline-list')[0]  # First element with class 'ipc-inline-list'
    informations = infolist.find_elements(by=By.CSS_SELECTOR, value="[role='presentation']")  # All elements with role='presentation' inside it
    scrapedInfo = {
        "title": link.find('a').text,
        "year": informations[0].text,
        "duration": informations[2].text,
    }  # Save all the scraped information in a dictionary
    # Wait up to 5 seconds for the element with data-testid set to firstListCardGroup-editorial
    WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[data-testid='firstListCardGroup-editorial']")))
    listElements = driver.find_elements(by=By.CSS_SELECTOR, value="[data-testid='firstListCardGroup-editorial'] .listName")  # Extracting the editorial list elements
    listNames = []  # Creating an empty list, then appending only the elements' texts
    # We add all the items for one movie (i.e. link) in the editorial list into listNames
    for el in listElements:
        listNames.append(el.text)
    scrapedInfo['editorial-list'] = listNames  # Adding the editorial list names to our scrapedInfo dictionary
    totalScrapedInfo.append(scrapedInfo)  # Append the dictionary to the totalScrapedInfo list

print(totalScrapedInfo)  # Display the list with all the information we scraped
```
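One detail about the crash itself: if any of the movie pages simply has no editorial section, the wait will always time out on that page and kill the whole loop. A common way to keep scraping is to wrap the `WebDriverWait` call in `try/except TimeoutException` (imported from `selenium.common.exceptions`) and fall back to an empty list. The shape of that pattern, sketched with a stand-in lookup function so it runs without a browser:

```python
class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_for_editorial_list(page):
    """Stand-in for WebDriverWait(...).until(...): pages without an
    'editorial' key never show the element, so the wait times out."""
    if 'editorial' not in page:
        raise TimeoutException("editorial section never became visible")
    return page['editorial']


pages = [
    {"title": "Movie A", "editorial": ["Top Rated Movies"]},
    {"title": "Movie B"},  # no editorial section on this page
]

totalScrapedInfo = []
for page in pages:
    scrapedInfo = {"title": page["title"]}
    try:
        scrapedInfo["editorial-list"] = wait_for_editorial_list(page)
    except TimeoutException:
        scrapedInfo["editorial-list"] = []  # keep scraping instead of crashing
    totalScrapedInfo.append(scrapedInfo)

print(totalScrapedInfo)
```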
Any help would be much appreciated.
Bianca