Channel: Active questions tagged selenium - Stack Overflow

Values append to list unexpectedly after scrolling through webelement in Selenium


I'm new to Python (and, well, to programming in general), and I want to use Selenium to scrape data from a webelement that updates dynamically as it is scrolled, similar to this post: Trying to use Python and Selenium to scroll and scrape a webpage iteratively. As in the screenshot in that question, my webelement is a table of data with headers, which may have a horizontal scroll bar, a vertical scroll bar, or both.

The first thing I want to do is scroll across my webelement (one column at a time, so as not to skip any columns) and scrape all the headers. So far, I can confirm that I have the correct XPath for my webelement's horizontal scroll bar, and that I can scroll horizontally across the webelement one column at a time. Below is my current code, which I adapted from this question: Python Selenium - Adjust pause_time to scroll down in infinite page:

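import time  # needed for time.sleep below; `driver` is assumed to be an already-running Selenium WebDriver

scraped_headers = []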
headers = driver.find_elements_by_xpath("//div[@class='gbData']")
for header in headers:
   if header not in scraped_headers:
      scraped_headers.append(header)
      print(header.text)
last_header = scraped_headers[-1]

width_scrollbar = driver.find_element_by_xpath("""/html/body/div[5]/div[2]/div/div/div/div/div[4]/div[5]/div[2]/div[3]""")

while True:
   driver.execute_script("arguments[0].scrollLeft += 50;", width_scrollbar)
   time.sleep(.5)
   new_header = driver.find_elements_by_xpath("//div[@class='gbData']")[-1]
   if new_header.text == last_header.text:
      break
   headers = driver.find_elements_by_xpath("//div[@class='gbData']")
   for header in headers:
      if header not in scraped_headers:
         scraped_headers.append(header)
         last_header = scraped_headers[-1]
         print(header.text)

However, I am observing unexpected behavior that I cannot wrap my head around. A print() of last_header.text just before this code:

   driver.execute_script("arguments[0].scrollLeft += 50;", width_scrollbar)
   time.sleep(.5)

will show the last header that I scraped (as expected, so it matches the last print in my first for loop). A print() of last_header.text just after that code will show the latest header in the webelement, even though there is no reason (as I understand it) why it would have been appended to the list at that point. Consequently, new_header.text equals last_header.text and my while loop breaks.
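One possible explanation, offered as an unconfirmed guess rather than a diagnosis: last_header holds a reference to an element, not a copy of its text, and .text is read from the page each time it is accessed. If the table reuses the same DOM nodes for new columns as it scrolls (an assumption about this page), the same reference would report different text after each scroll, with nothing being appended to the list. The sketch below imitates that with a hypothetical FakeElement class instead of a real browser; none of these names come from Selenium.

```python
class FakeElement:
    """Hypothetical stand-in for an element whose .text is read live from the page."""

    def __init__(self, cells, slot):
        self._cells = cells  # shared list, mutated in place when we "scroll"
        self._slot = slot    # which visible slot this element occupies

    @property
    def text(self):
        # Looked up on every access, like a live query against the page.
        return self._cells[self._slot]


# Three visible slots in a grid that (by assumption) recycles its nodes on scroll.
cells = ["A", "B", "C"]
elements = [FakeElement(cells, i) for i in range(3)]

last_header = elements[-1]            # a reference to the element, as in the question
last_header_text = elements[-1].text  # a snapshot of the string at this moment

before = last_header.text
cells[:] = ["B", "C", "D"]            # simulate one scroll step: same nodes, new text
after = last_header.text

print(before, after, last_header_text)
```

If this is what is happening, storing header.text (a plain string) in scraped_headers instead of the element itself would keep earlier values from changing underneath the loop.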

Interestingly, it seems I can just do the following:

scraped_headers = []
headers = driver.find_elements_by_xpath("//div[@class='gbData']")
for header in headers:
   if header not in scraped_headers:
      scraped_headers.append(header)
      print(header.text)
last_header = scraped_headers[-1]

width_scrollbar = driver.find_element_by_xpath("""/html/body/div[5]/div[2]/div/div/div/div/div[4]/div[5]/div[2]/div[3]""")

while True:
   driver.execute_script("arguments[0].scrollLeft += 50;", width_scrollbar)
   time.sleep(.5)
   print(last_header.text)

and my program will print every new header that appears until it just repeats the last one in the list, but I wouldn't know how to break out of the loop!
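As for breaking out of the loop, one approach that might work is to track header text rather than elements and stop once several consecutive scroll steps reveal nothing new. This is a minimal sketch under stated assumptions: collect_headers, FakeGrid and max_idle_steps are hypothetical names, header texts are assumed to be unique, and the fake grid only stands in for the real table so the logic can be run without a browser.

```python
def collect_headers(read_visible_texts, scroll_one_step, max_idle_steps=3):
    """Collect unique header texts, stopping after max_idle_steps scrolls with nothing new."""
    seen = []          # preserves the order in which headers were first seen
    seen_set = set()   # fast membership test on plain strings
    idle = 0
    while idle < max_idle_steps:
        found_new = False
        for text in read_visible_texts():
            if text not in seen_set:
                seen_set.add(text)
                seen.append(text)
                found_new = True
        idle = 0 if found_new else idle + 1
        scroll_one_step()
    return seen


class FakeGrid:
    """Hypothetical table that shows a sliding window of its headers."""

    def __init__(self, all_headers, window=3):
        self.all_headers = all_headers
        self.window = window
        self.offset = 0

    def visible(self):
        return self.all_headers[self.offset:self.offset + self.window]

    def scroll(self):
        # Scrolling past the end leaves the view unchanged, like a real scroll bar.
        self.offset = min(self.offset + 1, len(self.all_headers) - self.window)


grid = FakeGrid(["A", "B", "C", "D", "E"])
result = collect_headers(grid.visible, grid.scroll)
print(result)

# With the code from the question, the two callables might look like this:
#   read_visible_texts = lambda: [h.text for h in driver.find_elements_by_xpath("//div[@class='gbData']")]
#   scroll_one_step = lambda: (driver.execute_script("arguments[0].scrollLeft += 50;", width_scrollbar),
#                              time.sleep(.5))
```

Waiting for a few idle steps instead of one is a deliberate choice here, since a single 50-pixel scroll may not be enough to bring a new column into view.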

What is going on? Am I missing something obvious?

Any help is appreciated!

