Channel: Active questions tagged selenium - Stack Overflow

Detecting JavaScript-generated content with BeautifulSoup and Selenium


I'm trying to get all the computer science books from Pearson's website (starting from this URL: https://www.pearson.com/us/higher-education/professional---career/computer-science/computer-science.html), but the list of books in each category is generated by JavaScript.
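
Because the book list only exists after JavaScript has run, `driver.page_source` is only useful once the rendered elements are present. In Selenium this is what `WebDriverWait(driver, timeout).until(condition)` is for: it polls the condition until it returns something truthy or the timeout expires. Creating a `WebDriverWait` object, as in `wait = WebDriverWait(driver, 2)`, does not wait by itself; only the `until(...)` call does. The stdlib-only sketch below illustrates that polling idea with a hypothetical `wait_until` helper and a simulated page; it is not Selenium's actual implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper mimicking the polling idea behind WebDriverWait.until.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # truthy result: the content we were waiting for exists
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)  # give the (simulated) page time to render


# Simulated page: the "book tiles" only appear on the third poll, standing in
# for content that JavaScript injects after the initial page load.
calls = {"n": 0}


def book_tiles_rendered():
    calls["n"] += 1
    return ["tile-1", "tile-2"] if calls["n"] >= 3 else []


print(wait_until(book_tiles_rendered, timeout=5, poll_interval=0.01))
# ['tile-1', 'tile-2']
```

Reading the page before the condition holds would give you HTML without the tiles, which is one way to end up with an empty search result.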

I've tried using Selenium to open the page and then parse the rendered HTML with BeautifulSoup. However, after I open a category page, I can't find the tag that contains the information about each book.
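
One possible cause of an empty result, even when the tiles are rendered, is how BeautifulSoup matches the `class` attribute. `class` is multi-valued, so a search string containing a space, like `attrs={'class': 'content-tile-book-box visible'}`, only matches when the attribute's value is exactly that string, in that order. A CSS selector checks each class separately. The sketch below uses hypothetical markup: the class names come from this question, but the real structure of the Pearson page is an assumption. It may also be that the tiles simply don't carry the `visible` class when the HTML is captured.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names are taken from the question.
SAMPLE_HTML = """
<ul>
  <li class="visible content-tile-book-box">
    <div class="wrap-list-block"><a href="/book-a">Book A</a></div>
  </li>
  <li class="content-tile-book-box visible">
    <div class="wrap-list-block"><a href="/book-b">Book B</a></div>
  </li>
  <li class="content-tile-book-box">
    <div class="wrap-list-block"><a href="/book-c">Book C</a></div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A class string with a space is compared with the attribute's exact value,
# so it only matches the tile whose classes appear in exactly this order.
exact = soup.find_all("li", attrs={"class": "content-tile-book-box visible"})

# A CSS selector requires both classes but ignores their order.
tiles = soup.select("li.content-tile-book-box.visible")

# Each tile is a BeautifulSoup Tag, so use BeautifulSoup methods on it
# (select/find), not Selenium methods such as find_element_by_xpath.
links = [a["href"] for tile in tiles for a in tile.select("div.wrap-list-block a")]

print(len(exact))  # 1
print(len(tiles))  # 2
print(links)       # ['/book-a', '/book-b']
```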

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

START_URL = 'https://www.pearson.com/us/higher-education/professional---career/computer-science/computer-science.html'
CATEGORY_XPATH = '//ul[@class="category-child-list-level-2"]//a'

driver = webdriver.Safari()
wait = WebDriverWait(driver, 10)
driver.get(START_URL)

# first I loop through categories
num_categories = len(wait.until(ec.presence_of_all_elements_located((By.XPATH, CATEGORY_XPATH))))
for i in range(num_categories):
    print('CATEGORY : {}/{}'.format(i + 1, num_categories))
    # reload the start page and locate the links again: WebElements found
    # before a navigation become stale once the browser loads another page
    driver.get(START_URL)
    categories = wait.until(ec.presence_of_all_elements_located((By.XPATH, CATEGORY_XPATH)))
    categories[i].click()
    # loop through all the book pages of the current category
    while True:
        # wait until JavaScript has rendered the book tiles before reading page_source
        wait.until(ec.visibility_of_element_located((By.CLASS_NAME, 'content-tile-book-box')))
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        books = soup.find_all('li', attrs={'class': 'content-tile-book-box visible'})
        print(books)  # this always prints an empty list
        for a in books:
            # I would like to have access to the books' links;
            # a is a BeautifulSoup Tag, so it needs BeautifulSoup methods, not Selenium ones
            book_title_link = a.select_one('div.wrap-list-block a')
        # find_elements returns an empty list when there is no "Next" link,
        # whereas find_element would raise NoSuchElementException
        next_links = driver.find_elements(By.XPATH, '//a[@aria-label="Next"]')
        if not next_links:
            break
        next_links[0].click()

driver.quit()
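
A pagination loop like this has to decide when to stop. Selenium's `find_element` raises `NoSuchElementException` when nothing matches instead of returning `None`, while `find_elements` returns an empty list, so the stop condition is easiest to check after each page has been read. The stdlib-only sketch below shows that control flow with a hypothetical `load_page` callback and fake page data in place of the browser. In a real crawler, `load_page` would wait for the tiles, parse `page_source`, and look up the "Next" link.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A loaded page is modelled as (book_links, next_page); next_page is None on
# the last page, much like an empty result from driver.find_elements(...).
Page = Tuple[List[str], Optional[str]]


def iterate_books(load_page: Callable[[str], Page], first_page: str) -> Iterator[str]:
    """Yield the links from every page, following 'next' until there is none."""
    page: Optional[str] = first_page
    while page is not None:
        links, page = load_page(page)  # read this page, then learn where to go next
        yield from links


# Hypothetical three-page category standing in for the real site.
FAKE_PAGES: Dict[str, Page] = {
    "p1": (["/book-a", "/book-b"], "p2"),
    "p2": (["/book-c"], "p3"),
    "p3": (["/book-d"], None),
}

print(list(iterate_books(FAKE_PAGES.__getitem__, "p1")))
# ['/book-a', '/book-b', '/book-c', '/book-d']
```

Because `page` is assigned before the `while` test is first evaluated, the loop never compares a variable that doesn't exist yet.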

Hope you can help me, thank you!

