Quantcast
Channel: Active questions tagged selenium - Stack Overflow
Viewing all articles
Browse latest Browse all 97773

Python > Selenium: Web-scraping in a "logged-in" environment based on links from a text file

$
0
0

Compatible for ChromeDriver

This program seeks to accomplish the following:

  1. Automatically sign-in to a website;
  2. Visit a link / link(s) from a text file;
  3. To scrape data from each page visited this way; and
  4. Output all scraped data by print().

Kindly skip to Part 2 for the problem area, as part 1 is tested to work for step 1 already. :)

The code:

Part 1

from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()

driver.get("https://www.website1.com/home")

main_page = driver.current_window_handle 
time.sleep(5) 

##cookies
driver.find_element_by_xpath('//*[@id="CybotCookiebotDialogBodyButtonAccept"]').click() 
time.sleep(5)

driver.find_element_by_xpath('//*[@id ="google-login"]/span').click() 
for handle in driver.window_handles: 
    if handle != main_page: 
        login_page = handle 

driver.switch_to.window(login_page) 

with open('logindetails.txt', 'r') as file:
   for details in file:
        email, password = details.split(':')

        driver.find_element_by_xpath('//*[@id ="identifierId"]').send_keys(email) 
driver.find_element_by_xpath('//span[text()="Next"]').click()

time.sleep(5)
driver.find_element_by_xpath('//input[@type="password"]').send_keys(password) 

driver.find_element_by_xpath('//span[text()="Next"]').click() 
driver.switch_to.window(main_page) 
time.sleep(5)

Part 2

In alllinks.txt, we have the following websites:


• website1.com/otherpage/page1
• website1.com/otherpage/page2
• website1.com/otherpage/page3

with open('alllinks.txt', 'r') as directory:
    for items in directory:
    driver.get(items)
    time.sleep(2)
    elements = driver.find_elements_by_class_name('data-xl')
    for element in elements:
            print ([element])
    time.sleep(5)


driver.quit()

The outcome:

[Done] exited with code=0 in 53.463 seconds

... and zero output


The problem:

Location of the element has been verified, am suspecting that the windows have something to do with why the driver is not scraping.

All inputs are welcome and greatly appreciated. :)


Viewing all articles
Browse latest Browse all 97773

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>