BeautifulSoup How do you get the parent element tag that contains a specific text? Trying to scrape email but unable to pick up the parent element tag

November 28, 2020, 4:50 pm

I'm trying to scrape email addresses from a page and am having trouble getting the parent element that contains the email's '@' symbol. The emails are embedded within different element tags, so I can't just pick them out by tag. There are about 50,000 pages that I have to go through.

url = 'https://sec.report/Document/0001078782-20-000134/#f10k123119_ex10z22.htm'

Here are some examples (a couple are from different pages I have to scrape):

    <div style="border-bottom:1px solid #000000">dbrenner@umich.edu</div>
    <div class="f3c-8"><u>Bob@LifeSciAdvisors.com</u></div>
    <p style="margin-bottom:0pt;margin-top:0pt;;text-indent:0pt;;font-family:Arial;font-size:11pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">Email: dmoskowitz@biocept.com; Phone: 858-320-8244</p>
    <td class="f8c-43">E-mail: <u>jcohen@2020gene.com</u></td>
    <p class="f7c-4">Email: jcohen@2020gene.com</p>

What I have tried:

I tried find_all('div') to get a ResultSet of all the divs and then keep the ones that have an '@' symbol in them.

    div = page.find_all('div')
    for each in div:
        if '@' in each.text:
            print(each.text)

When I did this, because the body itself is in a 'div', it printed the whole page. Fail. Since the emails are embedded within different tags, this method also seems inefficient.

Using a regular expression. I tried using a regular expression to pick out the emails, but it returns a lot of unusable text that I would have to manually split up, strip characters from, etc. Going through all the different scenarios seemed like a daunting task.

    import re
    emails = re.findall(r'\S+@\S+', str(page))
    for each in emails:
        print(each)

Doing this gave me something like this:

    hidden;}@media
    #000000">dbrenner@umich.edu</div>
    #000000">kherman@umich.edu
    #000000">spage@fredhutch.org</div>
    #000000">mtuck@umich.edu</div>
    #000000">jdahlgre@fredhutch.org</div></p>
    #000000">lafky.jacqueline@mayo.edu</div></p>
    mtuck@umich.edu)</div>
    #000000">ctsucontact@westat.com</div>.
    href="http://@umich.edu">@umich.edu</a></li><li><a

Now I could go in and split some of the text using .split('<') and then split again, etc., but the results are not all the same, and since I have to scrape 50,000+ pages with 100 entries on each page, there is a lot to scrape and take into consideration.

I tried looking on Google and Stack Overflow, but all I can find are solutions where people are looking for the text within a certain element, etc.

What I need specifically is how to find the parent element that contains an email.

I don't think I need Selenium for this, since the issue would be the same as with BeautifulSoup and the site is not JavaScript-rendered, other than some of the pages being PDFs, which is a whole other issue.

Any insight, help or advice is appreciated. Thanks.
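
One way to approach the question above is to stop searching by tag and instead look at each text node, match an email pattern in it, and report whichever tag directly encloses that text. In BeautifulSoup this usually means find_all(string=...) followed by .parent on each hit. The sketch below shows the same idea using only the standard library's html.parser, so it runs without extra installs. The names EmailParentFinder and find_email_parents, the regex, and the list of void tags are illustrative assumptions, not a complete email grammar.

```python
import re
from html.parser import HTMLParser

# Deliberately simple pattern (an assumption, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Tags that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class EmailParentFinder(HTMLParser):
    """Record (enclosing tag name, email) for each email seen in a text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.found = []   # (tag, email) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else None
        for email in EMAIL_RE.findall(data):
            self.found.append((parent, email))


def find_email_parents(html):
    finder = EmailParentFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Because only the innermost open tag is reported, the outer body 'div' no longer swallows every match, and a stray ')' or '<' next to an address is left out by the pattern.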

↧

How to extract data from a column in excel row for row? (Python - Selenium/xlrd/Pandas)

November 28, 2020, 5:18 pm

I am trying to extract data from a column in an Excel file, and I want to be able to take the data from each row and input it into a field on my company's website. My Excel spreadsheet looks something like this:

    [Column A]   [Column B]
    Zach         1111
    Chris        2222
    Jake         3333

With the code I currently have, it will take the data from the columns and store them in names and codes. My issue is that when it starts to input the information in the website, it still uses the information from the last cell in the range. For example, the first loop will input Zach 1111 but then the second loop it will input ZachChris 11112222 when I only need it to input Chris 2222 and so forth. How do I go about fixing this so it will input the information row for row instead of keeping all the previous rows?

Here is my code, I've left out information that does not apply to the problem.

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    import pandas as pd
    import xlrd
    import time

    driver = webdriver.Firefox()

    def addCodes():
        global driver
        path = 'test.xlsx'
        workbook = xlrd.open_workbook(path)
        sheet = workbook.sheet_by_index(0)
        names = []
        codes = []
        for y in range(sheet.nrows):
            names.append(str(sheet.cell_value(y, 0)))
            codes.append(str(sheet.cell_value(y, 1)))
            print(names)
            print(codes)
            codeadd = driver.find_element_by_xpath('/html/body/div/div/div/main/div/form/div[2]/div[5]/p/input')
            nameadd = driver.find_element_by_xpath('/html/body/div/div/div/main/div/form/div[2]/div[6]/p/input')
            codeadd.clear()
            nameadd.clear()
            codeadd.send_keys(codes)
            nameadd.send_keys(names)
            driver.find_element_by_xpath('/html/body/div/div/div/main/div/form/div[2]/input').send_keys(Keys.SHIFT,Keys.ENTER)
            .... more unnecessary code...
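
The symptom described above (the second loop typing ZachChris 11112222) is consistent with passing the whole names and codes lists to send_keys rather than the current row's values. A minimal sketch of the per-row idea follows, with the Selenium calls replaced by a callback so the loop can be run on its own. The names submit_rows, cell_to_text and the fill callback are hypothetical, and the handling of numeric cells assumes that xlrd hands numbers back as floats.

```python
def cell_to_text(value):
    """Turn a spreadsheet cell value into the text to type.

    Numeric cells are assumed to arrive as floats (1111.0), so whole
    numbers are rendered without the trailing '.0'.
    """
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def submit_rows(rows, fill):
    """Call fill(name, code) once per row, using only that row's values."""
    for name_cell, code_cell in rows:
        fill(cell_to_text(name_cell), cell_to_text(code_cell))


# Stand-in for the form: in the real script this callback would clear both
# inputs and call send_keys(code) / send_keys(name) with these two strings.
typed = []
submit_rows(
    [("Zach", 1111.0), ("Chris", 2222.0), ("Jake", 3333.0)],
    lambda name, code: typed.append((name, code)),
)
print(typed)
```

In the original loop this corresponds to sending codes[y] and names[y] instead of the full lists.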

↧

Handling “Accept Cookies” popup with Selenium in Python

November 28, 2020, 5:41 pm

I've been trying to scrape some information from this website with Selenium. However, when I access the website I need to accept cookies to continue, and the popup seems to be JavaScript.

Does anyone know how to get around this, or which topic I should focus on?

    from selenium import webdriver
    browser = webdriver.Chrome()
    browser.get("https://www.giffgaff.com")

↧

How to automate scroll in dialog box using selenium java

November 28, 2020, 5:46 pm

I am trying to automate the Instagram website. When I click the followers link, it opens a dialog box with users to follow. When I try to scroll that dialog box, the main page scrolls instead of the followers dialog box. How can I achieve this?

I tried this code:

    WebElement element = driver.findElement(By.cssSelector("div.pbNvD.fPMEg.HYpXt"));
    JavascriptExecutor js = (JavascriptExecutor) driver;
    js.executeScript("window.scrollBy(0,1000)", element);

↧

Scraping multiple Web Pages at once with selenium

November 28, 2020, 7:34 pm

I am using Selenium and Python for a big project: I have to go to 320,000 webpages one by one, scrape details from each one, sleep for a second, and move on, as follows:

    links = ["https://www.thissite.com/page=1", "https://www.thissite.com/page=2", "https://www.thissite.com/page=3"]
    for i in links:
        browser.get(i)
        scrapedinfo = browser.find_elements_by_xpath("*//div/productprice").text
        open("file.csv", "a+").write(scrapedinfo)
        time.sleep(1)

The biggest problem with this is that it will take days or maybe weeks. Is there a way to increase the speed, such as by visiting multiple links at the same time and scraping them all at once?

I have spent hours looking for answers on Google and Stack Overflow but found nothing except multiprocessing, which I have been unable to use in my script. Please help me. Thanks in advance.
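
For the "visit several links at the same time" idea, one standard-library option is concurrent.futures.ThreadPoolExecutor, which runs a function over many inputs with a fixed number of workers. The sketch below uses a placeholder scrape_page function instead of a real browser. It assumes that in a real script each worker would create and use its own WebDriver instance rather than sharing one. The function names and the worker count of 4 are arbitrary choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_page(url):
    # Placeholder: a real version would load the page with its own driver
    # and return the scraped text instead of this made-up string.
    return url, "price for " + url.rsplit("=", 1)[-1]


def scrape_all(urls, max_workers=4):
    # pool.map returns results in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_page, urls))


links = ["https://www.thissite.com/page=%d" % n for n in range(1, 4)]
results = scrape_all(links)
for url, info in results:
    print(url, info)
```

Collecting the results first and writing the CSV from the main thread avoids several workers appending to the same file at once.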

↧

How to read config file in vs2017

November 28, 2020, 7:48 pm

I keep getting the error message "System.ArgumentNullException: Argument 'url' cannot be null. Parameter name: 'value'". I am using Visual Studio 2017 with Selenium.

    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;
    using OpenQA.Selenium.Firefox;
    using OpenQA.Selenium.IE;
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Configuration;

    namespace onlineShoesStore_2017
    {
        class Program
        {
            static void Main(string[] args)
            {
                // IWebDriver driver = new FirefoxDriver();
                // IWebDriver driver = new InternetExplorerDriver();
                IWebDriver driver = new ChromeDriver();
                driver.Navigate().GoToUrl(ConfigurationManager.AppSettings["URL"]);
            }
        }
    }

My configuration file:

    <?xml version="1.0" encoding="utf-8" ?>
    <configuration>
        <appSettings>
            <add key="URL" value="https://anupdamoda.github.io/AceOnlineShoePortal/SignIn.html/"/>
        </appSettings>
    </configuration>

↧

How do I download a file using java Selenium WebDriver?

November 28, 2020, 10:00 pm

I'm using Selenium 2.21.0 with Java 6. How do I use the Selenium WebDriver API to download a file on a web page? That is, there is a link that causes the download of an Excel file to start. I would like to know how to initiate that download, determine when it's finished, and then figure out where the file was downloaded to on my local system.

↧

selenium WebDriver UrlChecker$TimeoutException, opens the browser and then never navigating to the link

November 28, 2020, 10:54 pm

I started learning Selenium with Java the other day, but I can't reach the link that I want. I'm using Opera. The program starts by opening the Opera driver, then throws an exception after a few seconds and never navigates to the website. I want to go to that website and click the register button.

    public static void main(String[] args) {
        System.setProperty("webdriver.opera.driver", "C:/Users/LENOVO/AppData/Local/Programs/Opera/launcher.exe");

        WebDriver webDriver = new OperaDriver();
        webDriver.get("https://nemexia.2axion.com/?s=horus");
        try {
            webDriver.findElement(By.id("btn-register")).click();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    at org.openqa.selenium.remote.service.DriverService.waitUntilAvailable(DriverService.java:202)
    at org.openqa.selenium.remote.service.DriverService.start(DriverService.java:188)
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:79)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552)
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131)
    at org.openqa.selenium.opera.OperaDriver.<init>(OperaDriver.java:173)
    at org.openqa.selenium.opera.OperaDriver.<init>(OperaDriver.java:160)
    at org.openqa.selenium.opera.OperaDriver.<init>(OperaDriver.java:115)
    at NormalClass.main(NormalClass.java:9)
    Caused by: org.openqa.selenium.net.UrlChecker$TimeoutException: Timed out waiting for [http://localhost:23877/status] to be available after 20004 ms
    at org.openqa.selenium.net.UrlChecker.waitUntilAvailable(UrlChecker.java:100)
    at org.openqa.selenium.remote.service.DriverService.waitUntilAvailable(DriverService.java:197)
    ... 9 more
    Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
    at com.google.common.util.concurrent.SimpleTimeLimiter.callWithTimeout(SimpleTimeLimiter.java:156)
    at org.openqa.selenium.net.UrlChecker.waitUntilAvailable(UrlChecker.java:75)
    ... 10 more

↧

unable to determine app activity using "dumpsys window windows | grep -E 'mCurrentFocus'" shell code

November 28, 2020, 11:04 pm

I am able to connect to the devices, and I have opened different applications on the device. I tried the following methods, but I was not able to get any app activity:

'adb shell' followed by "dumpsys window windows | grep -E 'mCurrentFocus'" command
adb -s shell dumpsys window windows | grep -E 'mCurrentFocus | mFocusedApp'

Can someone tell me what to do next? I tried with different mobile devices. When I execute the code via Eclipse, it runs and is able to access the real device and navigate through the app. Can someone please guide me?

↧

How To Send "Enter Key" via JavaScript in Selenium using "executeScript"

November 28, 2020, 11:38 pm

I'm working on automating a flow using IE 11 with Selenium and Java. On this web page I need to put a value in a text box and then press Enter. I'm able to set the value using the code below:

    // Here Box is a webElement
    JavascriptExecutor js = (JavascriptExecutor) iedriver;
    js.executeScript("arguments[0].value='1500';", box);

This works as expected, but when I try to use box.sendKeys(Keys.Enter) it doesn't work. So how can I achieve "pressing the Enter key via JavaScript"?

I have tried the code below as well, but it is also not working.

    Actions actions = new Actions(iedriver);
    actions.moveToElement(box).sendKeys(Keys.RETURN).build().perform();

There is no error message; the code executes, but the Enter key is not pressed on the web page.

↧

AttributeError: 'list' object has no attribute 'click' - Selenium

November 28, 2020, 11:45 pm

Hello, I need help. This is my script:

    from selenium import webdriver
    url = "https://sjc.cloudsigma.com/ui/4.0/login"
    d = webdriver.Chrome()
    d.get(url)
    escolhe = d.find_elements_by_xpath('//*[@id="trynow"]')
    escolhe.click()

and this is the HTML:

<button id="trynow" class="btn g-recaptcha block full-width m-b dwse btn-warning" ng-class="{'btn-danger': instantAccess=='Error', 'btn-success': instantAccess=='Success', 'btn-warning': instantAccess=='Working', 'btn-warning': (instantAccess!='Working'&& instantAccess!='Success'&& instantAccess!='Error')}" data-ng-disabled="instantAccess=='Working' || instantAccess=='Success' || instantAccess=='Error'" analytics-on="click" analytics-event="Guest logged in" analytics-category="Guest logged in" analytics-label="Guest logged in" data-sitekey="6Lcf-2MUAAAAAKG8gJ-MTkwwwVw1XGshqh8mRq25" data-callback="onTryNow" data-size="invisible"> Instant accessNo credit card is required Instant access... Entrar na sessão Erro </button>

I need help because whenever I use this XPath I get this error:

AttributeError: 'list' object has no attribute 'click'
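
The error message itself points at the cause: find_elements_by_xpath (plural) returns a list of matches, and a Python list has no click method. The usual remedies are the singular find_element_by_xpath, or picking one item out of the list. The helper below sketches the second option. The names click_first and FakeButton are hypothetical and exist only so the snippet runs without a browser.

```python
class FakeButton:
    """Stand-in for a WebElement; only records whether it was clicked."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


def click_first(elements):
    """Click the first item of a find_elements-style list."""
    if not elements:
        raise LookupError("no element matched the locator")
    first = elements[0]
    first.click()
    return first


button = FakeButton()
click_first([button])   # a one-item list, like the result for a unique id
print(button.clicked)
```

With a real driver, the list passed in would be whatever find_elements_by_xpath returned.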

↧

Selenium Google Login Block

November 28, 2020, 11:48 pm

I have a problem with Google login. I want to login to my account but Google says that automation drivers are not allowed to log in.

I am looking for a solution. Is it possible to get a cookie from a normal Firefox/Chrome session and load it into ChromeDriver/GeckoDriver? I thought that this could be a solution, but I am not sure whether it is possible.

Looking for solutions.

Also, I want to add a quick workaround for this. I solved this issue by using one of my old verified accounts. That may be a quick solution for you too.

↧

Download extension from chrome store via selenium

November 29, 2020, 1:03 am

I want to download a Chrome extension from the Chrome store via Selenium. All goes well except for the popup that asks to confirm the extension download. I have tried to accept the popup but it didn't work. I'm adding the full code; it is only missing the part that accepts the popup.

Here is my code:

    static void WebStoreDownloaad2(string webdriverDirectory)
    {
        WebDriverWait wait;
        ChromeOptions options = new ChromeOptions();
        options.PageLoadStrategy = PageLoadStrategy.None;
        options.AddArgument("no-sandbox");
        options.SetLoggingPreference(LogType.Driver, LogLevel.All);
        IWebDriver driver = new ChromeDriver(webdriverDirectory, options, TimeSpan.FromMinutes(1));
        driver.Navigate().GoToUrl("https://chrome.google.com/webstore/category/extensions");
        wait = new WebDriverWait(driver, TimeSpan.FromMinutes(1));
        wait.Until(condition => {
            try
            {
                IWebElement serverTextBox = driver.FindElement(By.Id("searchbox-input"));
                serverTextBox.Clear();
                serverTextBox.SendKeys("malwarebyte");
                serverTextBox.SendKeys(Keys.Enter);
                return true;
            }
            catch (NoSuchElementException)
            {
                return false;
            }
            catch (ElementNotInteractableException)
            {
                return false;
            }
        });
        wait = new WebDriverWait(driver, TimeSpan.FromMinutes(1));
        wait.Until(condition => {
            try
            {
                IWebElement toClick = driver.FindElement(By.XPath("//*[contains(text(), 'Malwarebytes Browser Guard')]"));
                toClick.Click();
                return true;
            }
            catch (NoSuchElementException)
            {
                return false;
            }
            catch (ElementNotInteractableException)
            {
                return false;
            }
        });
        Thread.Sleep(3000);
        wait = new WebDriverWait(driver, TimeSpan.FromMinutes(1));
        wait.Until(condition => {
            try
            {
                IWebElement toClick = driver.FindElement(By.ClassName("g-c-R"));
                toClick.Click();
                return true;
            }
            catch (NoSuchElementException)
            {
                return false;
            }
            catch (ElementNotInteractableException)
            {
                return false;
            }
        });
    }

and here is the popup:

I have tried this:

    driver.SwitchTo().Alert().Accept();

but it didn't work. How do I accept the popup?

↧

How can i fix selenium wrong permission error?

November 29, 2020, 1:07 am

I downloaded selenium and chrome driver but when i run

    import time
    from selenium import webdriver
    driver = webdriver.Chrome(executable_path="/Users/NahuApple/webDriver")
    driver.get('http://www.google.com/');
    time.sleep(5)
    search_box = driver.find_element_by_name('q')
    search_box.send_keys('ChromeDriver')
    search_box.submit()
    time.sleep(5)
    driver.quit()

I get this error

selenium.common.exceptions.WebDriverException: Message: 'webDriver' executable may have wrong permissions?

how can I fix this?

Thanks in advance for the help.
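
This message generally appears when the path given as executable_path is not an executable file, for example when it names a folder rather than the chromedriver binary itself, or when the file lacks the execute bit. A small pre-flight check using only the standard library can tell these cases apart. The helper name check_driver_path is hypothetical, and the sketch assumes a Unix-like system such as macOS.

```python
import os


def check_driver_path(path):
    """Classify a driver path before handing it to webdriver.Chrome."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        # A folder was given; the path must point at the binary inside it.
        return "directory"
    if not os.access(path, os.X_OK):
        # File exists but is not marked executable (chmod +x would fix it).
        return "not executable"
    return "ok"


print(check_driver_path("/Users/NahuApple/webDriver"))
```

If the result is "directory", the fix is to append the binary's file name to the path; if it is "not executable", the permission bits on the file need changing.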

↧

C# WindowHandles works better than Java GetWindowHandles

November 29, 2020, 1:25 am

OK, so I am continuing to convert my C# framework to Java.

In C#, Driver.WindowHandles returns a count of 2, which is correct, so I tell it to select 'last' and it works great.

In Java, Driver.GetWindowHandles returns a count of only 1, which breaks my script because I need to access the second window.

Can anyone write a method in Java that mimics the way the C# method does it (as it seems to work better than the Java equivalent)?

Thanks!

↧

Python + Selenium: Page object pattern: Need help to figure out why it doesn't see an attribute

November 29, 2020, 1:49 am

I'm a Python + Selenium newbie. Today I started learning about the Page Object Pattern and ran into this error:

    Traceback (most recent call last):
      File "logintest.py", line 17, in test_login
        login_page.login()
      File "D:\Tests\page object pattern\loginpage.py", line 19, in login
        BasePage.fill_the_form(loginform, email)
    NameError: name 'loginform' is not defined

I don't know why it throws it, because loginform is defined.

Base page

    from abc import abstractmethod
    from selenium import webdriver

    class BasePage(object):
        def __init__(self, driver):
            self.driver = driver

        @abstractmethod
        def _validate_page(self, driver):
            return

        def fill_the_form(self, locator, value):
            form = self.driver.find_element(locator)
            self.form.clear()
            self.form.send_keys(value)

        def go_to(self, locator):
            self.driver.find_element(locator)
            self.driver.click()

Login page

    from base import BasePage
    from homepage import HomePage
    from selenium.webdriver.common.by import By

    class LoginPage(BasePage):
        email = 'test@test.net'
        password = 'test'
        loginform = (By.ID, 'email')
        passwordform = (By.ID, 'password')
        button = (By.XPATH, "//*[contains(text(), 'Log In')]")

        def __init__(self, driver):
            super(LoginPage, self).__init__(driver)

        def _validate_page(self, driver):
            assert 'Login' in driver.title()

        def login(self):
            BasePage.fill_the_form(loginform, email)
            BasePage.fill_the_form(passwordform, password)
            BasePage.go_to(button)

Base test

    from selenium import webdriver
    import unittest

    class BaseTest(unittest.TestCase):
        def setUp(self):
            self.driver = webdriver.Firefox()
            self.driver.implicitly_wait(10)
            self.driver.get('https://testtest.com/')

        def tearDown(self):
            self.driver.quit()

And finally my test:

    import unittest
    from basetest import BaseTest
    from base import BasePage
    from loginpage import LoginPage

    class logintest(BaseTest):
        def test_login(self):
            login_page = LoginPage(self.driver)
            login_page.login()

    if __name__ == '__main__':
        unittest.main(verbosity = 2)
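
The NameError most likely arises because loginform, email and the other values are class attributes: inside a method they are reachable as self.loginform (or LoginPage.loginform), not as bare names. In addition, fill_the_form has to be called on the instance rather than on the BasePage class, and the element returned by find_element is a local variable, not self.form. A sketch of the corrected shape follows. FakeDriver and FakeElement are stand-ins invented here so the snippet runs without a browser, and plain strings stand in for By.ID and By.XPATH.

```python
class FakeElement:
    """Minimal stand-in for a WebElement."""

    def __init__(self):
        self.value = ""
        self.clicks = 0

    def clear(self):
        self.value = ""

    def send_keys(self, text):
        self.value += text

    def click(self):
        self.clicks += 1


class FakeDriver:
    """Minimal stand-in for a WebDriver; one element per locator."""

    def __init__(self):
        self.elements = {}

    def find_element(self, by, value):
        return self.elements.setdefault((by, value), FakeElement())


class BasePage:
    def __init__(self, driver):
        self.driver = driver

    def fill_the_form(self, locator, value):
        # Unpack the (by, value) tuple and use the returned element directly.
        form = self.driver.find_element(*locator)
        form.clear()
        form.send_keys(value)

    def go_to(self, locator):
        self.driver.find_element(*locator).click()


class LoginPage(BasePage):
    email = "test@test.net"
    password = "test"
    loginform = ("id", "email")            # By.ID in real Selenium
    passwordform = ("id", "password")
    button = ("xpath", "//*[contains(text(), 'Log In')]")  # By.XPATH

    def login(self):
        # Class attributes and inherited methods are reached through self.
        self.fill_the_form(self.loginform, self.email)
        self.fill_the_form(self.passwordform, self.password)
        self.go_to(self.button)


driver = FakeDriver()
LoginPage(driver).login()
```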

↧

Why is my Eclipse IDE showing a launch error?

November 29, 2020, 3:30 am

I recently had to change my laptop, and now it shows a launch error when I try to execute something.

↧

No element found using locator error after switching to iframe using protractor

November 29, 2020, 3:42 am

I have been trying to download embedded PDF from webpage using protractor selenium. I'm currently stuck on having to actually download the file since I always got the following error:

Failed: No element found using locator: By(css selector, *[id="download"])

It cannot find the button even after switching to frame.

I have also tried the approach indicated in the answer here where it extracts the src attribute value and go directly to the URL but same issue. The download button (icon) cannot be found.

We have the same exact requirements where we just need to click the download icon embedded in the PDF which happens to be inside an iframe. Example page like this.

Here is my code snippet.

    const iframe = $('#printFrame'),
          downloadBtn = $('#download'),
          content = $('#content');
    await this.disableWaitForAngular();
    await browser.wait(EC.visibilityOf(iframe), waitTimeout);
    console.log("Switching to iframe...");
    await browser.switchTo().frame(iframe.getWebElement());
    await browser.wait(EC.visibilityOf(content), waitTimeout);
    await browser.actions().mouseMove(content).perform();
    console.log("Waiting for download button.");
    await browser.wait(EC.visibilityOf(downloadBtn), waitTimeout);
    await downloadBtn.click();
    await browser.switchTo().defaultContent();
    await this.enableWaitForAngular();

UPDATE:

Tried to inject the following code as suggested on one of the proposed answers before and after switching frames but it gives me an error.

    const downloadIcon: WebElement = await browser.executeScript('return document.querySelector("#viewer").shadowRoot.querySelector("#toolbar").shadowRoot.querySelector("#downloads").shadowRoot.querySelector("#download").shadowRoot.querySelector("#icon > iron-icon");');
    await downloadIcon.click();

Error:

    - Failed: javascript error: Cannot read property 'shadowRoot' of null
      (Session info: chrome=87.0.4280.66)
      (Driver info: chromedriver=87.0.4280.20 (c99e81631faa0b2a448e658c0dbd8311fb04ddbd-refs/branch-heads/4280@{#355}),platform=Windows NT 10.0.14393 x86_64)

Download icon for reference:

↧

using python selenium with edge could not use adobe flash player

November 29, 2020, 4:32 am

I have a problem using Python Selenium to automatically control Edge. I can't get the webdriver to allow Adobe Flash Player to run.

    options = EdgeOptions()
    options.use_chromium = True
    prefs = {
        "profile.default_content_setting_values.plugins": 1,
        "profile.content_settings.plugin_whitelist.adobe-flash-player": 1,
        "profile.content_settings.exceptions.plugins.*,*.per_resource.adobe-flash-player": 1,
        "PluginsAllowedForUrls": "http://test.alltobid.com/moni/gerenlogin.html",
    }
    options.add_experimental_option('prefs', prefs)
    WebDriver = Edge(executable_path='.\\msedgedriver.exe', options=options)

↧

Selenium WebDriver get text from input field

November 29, 2020, 4:36 am

I'm writing a test to assert the default text value within an <input> tag. However, it's not playing ball:

Assert.assertThat(webDriver.findElement(By.id("inputTag")).getText(), Matchers.is("2"));

↧
