I am trying to scrape tables from a web page whose content depends on the values selected in several drop-down lists (the page requires a login, so I cannot post it here).
There are three drop-down lists: state, muni, and year. Thus, there are many tables I want to iterate over and scrape: state * muni * year.
I want to start with the first state, select its first muni, and scrape the tables for all years.
Then, staying on the first state, I want to select the second muni and scrape the tables for all years again, and so on:
state(1), muni(1), year(all)
state(1), muni(2), year(all)
...
state(last), muni(last), year(all)
Pseudo-code:
    for i in each unique state:
        select each muni
        for j in each muni:
            scrape each table from each year j in a year list
            append the year list in the muni list in a state list
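To make the intended order concrete, here is a minimal sketch that uses made-up stand-in data instead of the real page. The names `MUNIS_BY_STATE`, `YEARS`, and `iteration_order` are hypothetical, and nothing in it touches Selenium; it only shows the order in which I want to visit the tables.

```python
# Hypothetical stand-in data; the real values would come from the drop-down lists.
MUNIS_BY_STATE = {
    "state_1": ["muni_1", "muni_2"],
    "state_2": ["muni_3"],
}
YEARS = ["2018", "2019"]


def iteration_order(munis_by_state, years):
    """Yield (state, muni, year) in the order the tables should be scraped."""
    for state, munis in munis_by_state.items():
        for muni in munis:
            for year in years:
                yield state, muni, year


order = list(iteration_order(MUNIS_BY_STATE, YEARS))
for state, muni, year in order:
    print(state, muni, year)
```

Every muni of a state is paired with every year before the loop moves on to the next state.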
This is what I have so far, but it keeps iterating over the years forever for the first state and muni and never moves on to the next one. Do you have any tips on how I can fix this? Any help is appreciated.
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')
    muni = []
    year = []
    data = []
    for i in state:
        select_state = Select(browser.find_element_by_class_name("lists-landingpage--navigation-regionSelector"))
        select_state.select_by_value(i)
        options_muni = browser.find_element_by_class_name("lists-landingpage--navigation-subRegionSelector")
        options_muni = options_muni.find_elements_by_tag_name('option')
        for j in options_muni:
            muni.append(j.get_attribute("value"))
            for k in muni:
                select_muni = Select(browser.find_element_by_class_name("lists-landingpage--navigation-subRegionSelector"))
                select_muni.select_by_value(k)
                options_year = browser.find_element_by_class_name("lists-landingpage--navigation-yearSelector")
                options_year = options_year.find_elements_by_tag_name('option')
                for n in options_year:
                    year.append(n.get_attribute("value"))
                    table = soup.find('div', attrs = {'class': 'lists-landingpage--body'})
                    table_body = table.find('tbody')
                    rows = table_body.find_all('tr')
                    for row in rows:
                        cols = row.find_all('td')
                        cols = [ele.text.strip() for ele in cols]
                        data.append([ele for ele in cols if ele])
How can I append the results as lists (year) inside lists (muni) inside a list (state)?
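To show the shape I am after, here is a minimal sketch of the nested result. It is not working scraping code: `scrape_table` is a hypothetical placeholder that returns fake rows in place of the Selenium and BeautifulSoup calls, and `collect` and the example values are made up for illustration.

```python
def scrape_table(state, muni, year):
    """Hypothetical placeholder for the real scraping step; returns fake rows."""
    return [[state, muni, year, "value"]]


def collect(munis_by_state, years):
    """Build a list (states) of lists (munis) of lists (years) of table rows."""
    all_states = []
    for state, munis in munis_by_state.items():
        state_data = []      # fresh list for every state
        for muni in munis:
            muni_data = []   # fresh list for every muni
            for year in years:
                muni_data.append(scrape_table(state, muni, year))
            state_data.append(muni_data)
        all_states.append(state_data)
    return all_states


# Made-up example values, not taken from the real page.
result = collect({"A": ["a1", "a2"], "B": ["b1"]}, ["2018", "2019"])
print(len(result), len(result[0]), len(result[0][0]))
```

With this layout, `result[s][m][y]` would hold the rows for one state, one muni, and one year. The inner lists are created inside the loops so that values from one muni or state do not carry over into the next.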