(Selenium/webscraping noob warning.)
selenium 3.141.0
chromedriver 78
macOS 10.14.6
I'm compiling a list of URLs across a range of dates for later download. The URLs are in a table that displays information for the date selected on a nearby calendar. When the user clicks a new date on the calendar, the table is updated asynchronously with a new list of URLs or – if no files exist for that date – with a message inside a <td class="dataTables_empty"> tag.
For each date in the desired range, my code clicks the calendar, uses WebDriverWait with a custom expectation to detect when the first href value in the table changes (indicating the table has finished updating), and then scrapes the URLs for that day. If no files were available for the previous date, the code instead waits for the dataTables_empty cell to disappear, which indicates that the next date's URLs have loaded.
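For reference, the custom expectation is not reproduced in full here. A minimal sketch of what such a helper might look like, assuming it only needs to check whether the located link's href contains a given string (the body is an illustration, not the exact code in use):

```python
class text_to_be_present_in_href:
    """Hypothetical sketch of a custom expectation.

    Truthy when the href of the element found by `locator` contains `text`.
    """

    def __init__(self, locator, text):
        self.locator = locator  # e.g. (By.XPATH, "//table//a")
        self.text = text

    def __call__(self, driver):
        # find_element raises NoSuchElementException while the link is absent.
        # That exception is ignored by WebDriverWait by default, and until_not
        # treats it as the condition no longer holding.
        href = driver.find_element(*self.locator).get_attribute("href")
        return href is not None and self.text in href
```

Because the object only relies on `find_element` and `get_attribute`, it can be passed to either `until` or `until_not`.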
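from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

if current_first_uri != NO_ATT_DATA:
    # Previous date had files: wait until the first href no longer matches.
    element = WebDriverWait(browser, 10).until_not(
        text_to_be_present_in_href(
            (By.XPATH, first_uri_in_att_xpath),
            current_first_uri))
else:
    # Previous date had no files: wait until the "empty" cell is gone.
    element = WebDriverWait(browser, 10).until_not(
        EC.presence_of_element_located(
            (By.CLASS_NAME, "dataTables_empty")))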
This works great in all my use cases but one: if two or more consecutive days have no data, the code doesn't notice the table has refreshed, since the dataTables_empty class remains in the table (and the cell is identical in every other respect).
In the Chrome inspector, when I click from one date without data to another, the corresponding <td> flashes pink. That suggests the cell is being rewritten in the DOM, even though its contents remain the same.
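If the flash really does correspond to a DOM mutation, the same signal could in principle be captured from script with a MutationObserver. The following is only a sketch under that assumption: the names `watch_table`, `table_was_mutated`, `__tableMutations`, and `__tableObserver` are made up for illustration, and the observed element would need to be a node that survives the refresh (for example the table or a wrapper around it).

```python
# JavaScript run in the page: count DOM mutations under the given element.
# The window property names are hypothetical, chosen for this sketch.
INSTALL_OBSERVER_JS = """
var target = arguments[0];
window.__tableMutations = 0;
if (window.__tableObserver) { window.__tableObserver.disconnect(); }
window.__tableObserver = new MutationObserver(function (records) {
    window.__tableMutations += records.length;
});
window.__tableObserver.observe(target, {
    childList: true, subtree: true, characterData: true, attributes: true
});
"""

READ_COUNTER_JS = "return window.__tableMutations || 0;"


def watch_table(driver, table_element):
    """Reset the counter and start observing; call just before the click."""
    driver.execute_script(INSTALL_OBSERVER_JS, table_element)


def table_was_mutated(driver):
    """Expectation for WebDriverWait.until: True once a mutation was seen."""
    return driver.execute_script(READ_COUNTER_JS) > 0
```

Usage would be along the lines of calling `watch_table(browser, table)`, clicking the date, and then `WebDriverWait(browser, 10).until(table_was_mutated)`. Note that the first mutation only shows that the refresh has started, not necessarily that it has finished.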
Questions:
- Is there a mechanism in Selenium to detect that the value was refreshed, even if it hasn't changed?
- If not, any creative ideas on how to determine the table has refreshed in the problem use case? I don't want to wait blindly for some arbitrary length of time.
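One idea that may apply here, sketched under the assumption that the page builds a brand-new <td> on every refresh rather than reusing the old node: keep a reference to the empty cell before clicking, then wait until the cell found by the same locator is a different element. The class name `element_replaced` is hypothetical.

```python
class element_replaced:
    """Hypothetical expectation: truthy once `locator` resolves to a new node.

    `old_element` is the WebElement captured before triggering the refresh.
    """

    def __init__(self, locator, old_element):
        self.locator = locator
        self.old_element = old_element

    def __call__(self, driver):
        current = driver.find_element(*self.locator)
        # WebElement equality compares the underlying element reference, so a
        # re-created cell differs from the old one even if it looks identical.
        return current if current != self.old_element else False
```

For example: grab `old_cell = browser.find_element(By.CLASS_NAME, "dataTables_empty")`, click the next date, then call `WebDriverWait(browser, 10).until(element_replaced((By.CLASS_NAME, "dataTables_empty"), old_cell))`. Selenium's built-in `EC.staleness_of(old_cell)` tests a closely related condition (the old node has left the DOM). Neither helps if the page rewrites the same node in place.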