How can I scrape/download many files from a dynamic webpage with a hidden xhtml section?

Hi all!

I am trying to download every CSV file on this site: CNMCData - Informe Trimestral

Each CSV sits behind a radio button on a sub-page (not a different site) of that main page (I believe the term for this type of site is ‘dynamic’). Example of the location of one of the CSV files:

Problem:
When I run the following code to look for patterns in the HTML that would let me automate the process (for example, every radio button sharing the same class, so I could use Selenium to click each one), only a small part of the HTML is retrieved:
Code:

What it retrieves:

What I can see using ‘right-click > inspect’ on the site:

As you can see, the HTML shown by ‘right-click > inspect’ is much more extensive (it includes the <a> tags used to navigate the page dynamically).

How can I fetch ALL of the HTML, including those <a> tags?

I believe the answer/problem may lie in this line of code (complete guess, have no clue really):

Thanks for your time

Jaime

The reason there aren’t any anchor elements with links in your soup is that the sidebar is loaded into a separate frame within the webpage, and the base content only contains text placeholders. You can tell the webdriver object to switch to the frame that contains the sidebar with driver.switch_to.frame(frame_name). driver.switch_to.default_content() brings you back to the top-level document, from which you can switch to other frames (you’ll get an error if you try to switch frames again without doing this, unless the target frame is nested inside the current one). Use the page inspector to find the name of the frame you want.
From there you can use BeautifulSoup as you had planned, or the Selenium webdriver itself, to scrape your sidebar links.
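
In case it helps, here is a rough sketch of that switch-then-scrape pattern. It is only an illustration under a few assumptions: the helper name links_in_frame and the LinkCollector class are made up for this example, the frame name "sidebar" in the usage comment is hypothetical (take the real one from the page inspector), and the standard-library HTMLParser stands in for BeautifulSoup so the snippet has no extra dependencies. Passing driver.page_source to BeautifulSoup at the same point should work just as well.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_in_frame(driver, frame_name):
    """Return the hrefs found inside the frame called frame_name."""
    # After this call, driver.page_source is the frame's HTML,
    # not the placeholder-only outer page.
    driver.switch_to.frame(frame_name)
    try:
        parser = LinkCollector()
        parser.feed(driver.page_source)
        return parser.links
    finally:
        # Return to the top-level document so another frame can be selected.
        driver.switch_to.default_content()


# Usage with a real browser (not run here; the frame name is an assumption):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get(page_url)
#   print(links_in_frame(driver, "sidebar"))
```

The try/finally is there so the driver ends up back at the top-level document even if parsing fails, which avoids the frame-switching error mentioned above on the next call.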
