I'm building a web crawler using Python 3.8. I want to turn the table below into a pandas DataFrame using Selenium, pandas, and bs4.
<table class="no_border_top" width="95%" align="center">
  <tbody>
    <tr>
      <th align="left" valign="top" rowspan="100" width="8%">1.2.12</th>
      <th align="left" colspan="4">Outras cotas de Fundos de Investimento</th>
    </tr>
    <tr>
      <td width="40%"><b>Fundo</b></td>
      <td width="22%"><b>CNPJ</b></td>
      <td width="15%"><b>Quantidade</b></td>
    </tr>
    <tr>
      <td>Itaú Soberano RF Simples LP FICFI</td>
      <td>06.175.696/0001-73</td>
      <td>247.719,87</td>
      <td>11.996.245,91</td>
    </tr>
    <tr>
      <td>Itaú TOP RF Referenciado DI FICFI</td>
      <td>05.902.521/0001-58</td>
      <td>77.085,90</td>
      <td>372.686,27</td>
    </tr>
  </tbody>
</table>
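To make the goal concrete, here is a minimal sketch of the conversion step on its own, assuming the table's HTML is already available as a string. It uses the standard library's html.parser instead of bs4 so that it runs without extra dependencies; the names TableRows, table_to_rows and parse_br_number are made up for this illustration, and the sample string is a shortened copy of the table above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_rows(html):
    """Return the table's rows as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def parse_br_number(text):
    """Convert a Brazilian-formatted number such as '247.719,87' to float."""
    return float(text.replace(".", "").replace(",", "."))


# Shortened copy of the table from this question.
sample = (
    '<table class="no_border_top"><tbody>'
    '<tr><th>1.2.12</th><th>Outras cotas de Fundos de Investimento</th> </tr>'
    '<tr><td><b>Fundo</b></td><td><b>CNPJ</b></td><td><b>Quantidade</b></td> </tr>'
    '<tr><td>Itaú Soberano RF Simples LP FICFI</td><td>06.175.696/0001-73</td>'
    '<td>247.719,87</td><td>11.996.245,91</td> </tr>'
    '<tr><td>Itaú TOP RF Referenciado DI FICFI</td><td>05.902.521/0001-58</td>'
    '<td>77.085,90</td><td>372.686,27</td></tr>'
    '</tbody></table>'
)

rows = table_to_rows(sample)
title_row, header_row, data_rows = rows[0], rows[1], rows[2:]
```

From there, pandas.DataFrame(data_rows) should give the frame I want. Note that the header row only has three labels while each data row has four cells, so a name for the fourth column would have to be supplied by hand.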
By the way, the page I'm trying to scrape is the one assigned to the url variable in the code below.
The Problem
If you are able to open the link (if it doesn't work, I'll edit this question and post some screenshots of the page's HTML), you'll see that the table I'm interested in is one of many tables inside an HTML document nested inside the page I'm scraping. All the tables in this nested document have the same class; each one is a <table class="no_border_top" width="95%" align="center">...</table>.
Selenium provides tools to locate elements by id, class, XPath, and so on. I've tried to grab this table using find_element_by_xpath() (passing the full XPath copied from Chrome DevTools) and find_element_by_partial_link_text(), but neither worked. This is the code I'm using:
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

def make_selenium_browser():
    # Headless Firefox instance
    options = Options()
    options.headless = True
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    browser = Firefox(options=options)
    return browser

url = 'https://fnet.bmfbovespa.com.br/fnet/publico/visualizarDocumento?id=111845&cvm=true'
browser = make_selenium_browser()
browser.get(url)

# not working
data = browser.find_element_by_xpath('/html/body/table[28]')

# this is also not working
browser.find_element_by_partial_link_text('Outras cotas de Fundos de Investimento')
The error
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 462, in find_element_by_partial_link_text
    return self.find_element(by=By.PARTIAL_LINK_TEXT, value=link_text)
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: Outras cotas de Fundos de Investimento
How can I make Selenium find the table I'm interested in?
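One direction I have not been able to verify against the live page: as far as I understand, the link-text locators only match <a> elements, so they cannot find text that sits in a <th>. A sketch of an alternative is below. It selects the table by the text of its header cell, and it assumes (this is a guess on my part) that the nested document is an <iframe>, which Selenium only searches after switching into it. The helper names table_xpath and find_table_html are hypothetical.

```python
def table_xpath(heading):
    """Build an XPath for the <table> whose <th> contains `heading`.

    Assumption: `heading` contains no double-quote character.
    """
    return f'//table[.//th[contains(normalize-space(.), "{heading}")]]'


def find_table_html(browser, heading):
    """Return the outerHTML of the first matching table, or None.

    Searches the top-level document first, then each <iframe> in turn.
    `browser` is expected to be a Selenium WebDriver instance.
    """
    xpath = table_xpath(heading)

    # "xpath" and "tag name" are the string values behind By.XPATH
    # and By.TAG_NAME, so no Selenium import is needed here.
    matches = browser.find_elements("xpath", xpath)
    if matches:
        return matches[0].get_attribute("outerHTML")

    for frame in browser.find_elements("tag name", "iframe"):
        browser.switch_to.frame(frame)
        try:
            matches = browser.find_elements("xpath", xpath)
            if matches:
                return matches[0].get_attribute("outerHTML")
        finally:
            # Always return to the top-level document before moving on.
            browser.switch_to.default_content()

    return None
```

With the browser from my code above, this would be called as find_table_html(browser, 'Outras cotas de Fundos de Investimento'), and the returned HTML string could then be parsed into rows. If the table is filled in by JavaScript after the page loads, an explicit wait would probably be needed as well; I have not checked whether that is the case here.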