I want to scrape the information in the "Experience" section of a LinkedIn profile page. Here is an example profile: https://www.linkedin.com/in/jeffweiner08/
Before clicking the "Show More" button
After clicking the "Show More" button
The second state is where I want to start collecting data.
As shown in the second screenshot, I want to:

1. Check whether there is a "Show * more experiences" button.
2. If there is, click the "Show More" button first, then collect the information.
3. If not, collect the information directly.
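Conceptually, the flow I am aiming for looks like this (just a sketch: the XPath is a placeholder, not the real locator, and `collect_information` is a hypothetical stand-in for the scraping code below):

    # Sketch only: placeholder XPath, hypothetical collect_information() helper
    buttons = driver.find_elements_by_xpath('//button[contains(., "more experience")]')
    if buttons:              # 1. the expand button exists
        buttons[0].click()   # 2. click it first
    collect_information()    # 3. collect the data in either case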
Here is my current code:

    from bs4 import BeautifulSoup

    for index, row in Test.iterrows():
        driver.get(row['Website'])
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        exp = soup.find('section', {'id': 'experience-section'})

        # Look for the "Show more experiences" button in the parsed HTML
        temp = exp.find('button', {'class': 'pv-profile-section__see-more-inline pv-profile-section__text-truncate-toggle link link-without-hover-state'})
        if temp:
            # Expand the experience section before collecting
            ShowMore_Button = driver.find_element_by_xpath('/html/body/div[5]/div[4]/div[3]/div/div/div/div/div[2]/main/div[2]/div[6]/span/div/section/div[1]/section/div/button')
            ShowMore_Button.click()

        # Collect employers, dates, and positions (this part was duplicated
        # verbatim in both branches of the if/else, so it is folded into one block)
        employer_names = exp.findAll('p', {'class': 'pv-entity__secondary-title t-14 t-black t-normal'})
        employer_names_final = []
        for e in employer_names:
            employer_names_final.append(e.get_text().strip())
        print(employer_names_final)

        date_names = exp.findAll('h4', {'class': 'pv-entity__date-range t-14 t-black--light t-normal'})
        date_names_final = []
        for d in date_names:
            date_names_final.append(d.get_text().strip())
        print(date_names_final)

        position_names = exp.findAll('h3', {'class': 't-16 t-black t-bold'})
        position_names_final = []
        for p in position_names:
            position_names_final.append(p.get_text())
        print(position_names_final)
"Test" is a dataframe with LinkedIn URLs. "driver" here I use selenium Chrome driver.
And here is the result I get; it collects the information without clicking the "Show More" button:
    ['LinkedIn', 'Next Play Ventures', 'Concrete Rose Capital', 'Intuit', 'DonorsChoose']
    ['Dates Employed\nDec 2008 – Present', 'Dates Employed\n2014 – Present', 'Dates Employed\nOct 2019 – Present', 'Dates Employed\nApr 2012 – Present', 'Dates Employed\n2007 – Present']
    ['CEO', 'Co-Founder', 'Founding LP, Investment Committee', 'Member, Board of Directors', 'Member, Board of Directors']
How should I modify the code to collect data after clicking the expand button? Thank you.