Channel: Active questions tagged selenium - Stack Overflow

Scrapy: selecting dynamically-loaded content with "More" button


I'm trying to scrape content from a page similar to this one: https://www.newsweek.pl/nwpl_2018002_20181231. It has a "More" button (Polish: "Więcej") at the bottom of the page, which dynamically loads the next batch of articles. I would prefer to use Scrapy for the task because my other spiders use it, but first I need the URLs of all the articles, so I'm trying to click() this button with Selenium as follows:

def parse_issue(self, response):
        self.logger.info('Parse function called parse_issue on {}'.format(response.url))
        self.driver.get(response.url)
        while True:
            try:
                more_button = self.driver.find_element_by_xpath('//div[@class="showMoreBtn"]')
                time.sleep(2)
                more_button.click()
                time.sleep(5)
                print('clicked.')
            except Exception as e:
                print(e)
                break
        articles_elements = self.driver.find_elements_by_xpath('.//div[@class="pure-u-1-1 pure-u-md-1-4 smallItem"]/a')
        articles_url = [element.get_attribute("href") for element in articles_elements]
        print(articles_url, response.url)

Unfortunately, as a result I only get the URLs of articles that are already in the initial page source. Can someone suggest what I'm doing wrong?
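For reference, here is a minimal sketch of the click-and-wait loop written so that it waits until the number of article links actually grows after each click, instead of sleeping for a fixed time and breaking on any exception. It is not a confirmed fix for this page. The names `click_until_exhausted`, `click_more` and `count_items` are hypothetical; the two callables stand in for the Selenium calls, so the loop itself can be run and checked without a browser.

```python
import time


def click_until_exhausted(click_more, count_items, timeout=10.0, poll=0.5, max_clicks=100):
    """Click "More" repeatedly and wait for new items after every click.

    click_more:  hypothetical callable; clicks the button and returns True,
                 or returns False when the button is no longer present.
    count_items: hypothetical callable; returns how many article links
                 are currently in the DOM.
    Returns the final number of items.
    """
    count = count_items()
    for _ in range(max_clicks):
        if not click_more():
            break  # no button left, so everything has been loaded
        # Poll until the item count grows or the timeout expires.
        deadline = time.monotonic() + timeout
        new_count = count_items()
        while new_count <= count and time.monotonic() < deadline:
            time.sleep(poll)
            new_count = count_items()
        if new_count <= count:
            break  # the click loaded nothing within the timeout
        count = new_count
    return count
```

With Selenium, `click_more` could locate the `showMoreBtn` div and call `click()` on it, and `count_items` could return the length of the list matched by the article XPath from the code above. Errors raised inside either callable propagate instead of being swallowed, which makes it easier to see whether the click itself is failing.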

