Channel: Active questions tagged selenium - Stack Overflow

Scrape Data after Clicking Expand Button


I want to scrape the information in the "Experience" section of a LinkedIn page. Here is an example page: https://www.linkedin.com/in/jeffweiner08/

Before clicking the "Show more" button (screenshot)

After clicking the "Show more" button (screenshot)

The second state is the one I want to collect data from.

As shown in the pictures, I want to:

1. Check whether a "Show * more experiences" button exists.
2. If it does, click the button first, then collect the information.
3. If it does not, collect the information directly.
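For step 1, something like the following could check for the button without a hard-coded absolute XPath; the CSS selector is a guess based on the button's class names and may need adjusting:

```python
def click_show_more_if_present(driver):
    """Click the 'Show more' button if it exists; return True if clicked."""
    # find_elements (plural) returns [] instead of raising when nothing matches,
    # so presence can be tested without a try/except
    buttons = driver.find_elements_by_css_selector(
        'section#experience-section button.pv-profile-section__see-more-inline')
    if buttons:
        buttons[0].click()
        return True
    return False
```

`find_elements_by_css_selector` avoids the `NoSuchElementException` that `find_element_by_xpath` raises when the button is absent.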

from bs4 import BeautifulSoup

for index, row in Test.iterrows():
    driver.get(row['Website'])
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    exp = soup.find('section', {'id': 'experience-section'})
    temp = exp.find('button', {'class': 'pv-profile-section__see-more-inline pv-profile-section__text-truncate-toggle link link-without-hover-state'})
    if temp:
        ShowMore_Button = driver.find_element_by_xpath('/html/body/div[5]/div[4]/div[3]/div/div/div/div/div[2]/main/div[2]/div[6]/span/div/section/div[1]/section/div/button')
        ShowMore_Button.click()

    # Extraction is the same whether or not the button was clicked
    employer_names_final = []
    for e in exp.findAll('p', {'class': 'pv-entity__secondary-title t-14 t-black t-normal'}):
        employer_names_final.append(e.get_text().strip())
    print(employer_names_final)

    date_names_final = []
    for d in exp.findAll('h4', {'class': 'pv-entity__date-range t-14 t-black--light t-normal'}):
        date_names_final.append(d.get_text().strip())
    print(date_names_final)

    position_names_final = []
    for p in exp.findAll('h3', {'class': 't-16 t-black t-bold'}):
        position_names_final.append(p.get_text())
    print(position_names_final)

"Test" is a dataframe with LinkedIn URLs. "driver" here I use selenium Chrome driver.

And here is the result I get; it collects the information without clicking the "Show more" button:

['LinkedIn', 'Next Play Ventures', 'Concrete Rose Capital', 'Intuit', 'DonorsChoose']
['Dates Employed\nDec 2008 – Present', 'Dates Employed\n2014 – Present', 'Dates Employed\nOct 2019 – Present', 'Dates Employed\nApr 2012 – Present', 'Dates Employed\n2007 – Present']
['CEO', 'Co-Founder', 'Founding LP, Investment Committee', 'Member, Board of Directors', 'Member, Board of Directors']

How should I modify the code so that the data is collected after clicking the expand button? Thank you.
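One thing I noticed: my code parses `driver.page_source` into `soup` before the click, so the later `findAll` calls run against the pre-click HTML. A minimal sketch of re-parsing after the click (the selectors and class names are taken from my code above and are assumptions that may break when LinkedIn changes its markup):

```python
import time

from bs4 import BeautifulSoup


def extract_experience(html):
    """Parse employer, date, and position text out of the experience section."""
    soup = BeautifulSoup(html, 'html.parser')
    exp = soup.find('section', {'id': 'experience-section'})
    if exp is None:
        return [], [], []
    employers = [e.get_text().strip() for e in exp.find_all(
        'p', {'class': 'pv-entity__secondary-title t-14 t-black t-normal'})]
    dates = [d.get_text().strip() for d in exp.find_all(
        'h4', {'class': 'pv-entity__date-range t-14 t-black--light t-normal'})]
    positions = [p.get_text().strip() for p in exp.find_all(
        'h3', {'class': 't-16 t-black t-bold'})]
    return employers, dates, positions


def scrape_profile(driver, url):
    """Load the page, expand the experience list if possible, then parse."""
    driver.get(url)
    try:
        driver.find_element_by_css_selector(
            'section#experience-section '
            'button.pv-profile-section__see-more-inline').click()
        time.sleep(2)  # crude wait; WebDriverWait would be more robust
    except Exception:  # in real code, catch selenium's NoSuchElementException
        pass  # no "Show more" button: everything is already visible
    # Re-parse AFTER the click so the soup reflects the expanded DOM
    return extract_experience(driver.page_source)
```

The key change is that `BeautifulSoup(driver.page_source, ...)` runs after `click()`, not before, so the newly revealed entries are present in the parsed tree.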

