Channel: Active questions tagged selenium - Stack Overflow

Scraping data from charts or tables created with Google Visualization tools, using Python and Selenium


I am trying to scrape data from the tables and charts on this website: https://www.portfoliovisualizer.com/fund-performance?s=y&symbol=MUB&symbols=VTEB&benchmark=VWITX&startDate=1%2F1%2F2015&endDate=1%2F31%2F2020
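
To reuse the same scraper for other funds or date ranges, the link can be rebuilt from its query parameters instead of being edited by hand. Below is a minimal sketch using the standard library's `urllib.parse.urlencode`. The meaning of each parameter (main fund, comparison fund, benchmark, start and end dates in M/D/YYYY) is only inferred from this one URL, and `fund_performance_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.portfoliovisualizer.com/fund-performance"


def fund_performance_url(symbol, compare, benchmark, start, end):
    """Build a fund-performance link with the same query parameters as the
    URL above. Parameter meanings are assumptions inferred from that URL."""
    params = {
        "s": "y",               # copied verbatim from the original link
        "symbol": symbol,       # assumed: main fund ticker
        "symbols": compare,     # assumed: comparison fund ticker
        "benchmark": benchmark,
        "startDate": start,     # M/D/YYYY; urlencode turns "/" into %2F
        "endDate": end,
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

Called as `fund_performance_url("MUB", "VTEB", "VWITX", "1/1/2015", "1/31/2020")`, it reproduces the link above exactly.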

The site contains both static tables, which I can parse without any problem using requests and BeautifulSoup, and some dynamically generated tables and charts that are created with Google Visualization tools. For the dynamically generated parts, I used the Selenium webdriver and was able to get the page source after the dynamic charts and tables have been generated, so I can see the values I am looking to pull. However, I do not know how to extract them, because they appear in the page source inside script blocks like the one below:

    <div id="chartDiv2" style="width: 900px; height: 500px;"></div>
    <script>
    function getChartData2() {
        var data2 = google.visualization.arrayToDataTable([
            ['Year', 'Vanguard Tax-Exempt Bond ETF', 'iShares National Muni Bond ETF', 'Vanguard Interm-Term Tx-Ex Inv'],
            ['2015', 0.02622989191078684, 0.027694140180976934, 0.020218079771240793],
            ['2016', 0.0017793103198122662, -0.0016512224730700353, 8.245768963E-4],
            ['2017', 0.04691417159814648, 0.04723004573804612, 0.04534581696422735],
            ['2018', 0.010468419206658197, 0.0092958557816043, 0.01252249494095059],
            ['2019', 0.07344514685363279, 0.07055205700158873, 0.06781218369439523],
            ['2020', 0.017556966753828895, 0.01659204635238365, 0.0158935038009671]
        ]);
        var formatter2 = new google.visualization.NumberFormat({ pattern: '0.00%' });
        formatter2.format(data2, 1);
        formatter2.format(data2, 2);
        formatter2.format(data2, 3);
        var chart2 = new google.visualization.ColumnChart(document.getElementById('chartDiv2'));
        var options2 = {
            title: 'Annual Returns',
            legend: { textStyle: { fontSize: 13 } },
            hAxis: { title: 'Year' },
            vAxis: { dummy: false, title: 'Annual Return', format: '0.0%', minValue: 0 },
            focusTarget: 'category'
        };
        return [chart2, data2, options2];
    }
    </script>

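
Because the values live inside `<script>` text rather than in table markup, searching for table tags cannot reach them. One way to pull them out is a regular expression over the Selenium page source combined with Python's `ast.literal_eval`, which accepts the single-quoted strings and plain numbers used in these array literals. This is a minimal sketch. It assumes that every table is created with a single-argument `var <name> = google.visualization.arrayToDataTable([...]);` call, as in the snippet above. `extract_data_tables` is a hypothetical helper name.

```python
import ast
import re

# Assumed pattern: "var <name> = google.visualization.arrayToDataTable([...]);"
# The non-greedy (.*?) stops at the first "]" followed by ");", so each call
# is captured separately; DOTALL lets a literal span several lines.
DATA_TABLE_RE = re.compile(
    r"var\s+(\w+)\s*=\s*google\.visualization\.arrayToDataTable\(\s*(\[.*?\])\s*\);",
    re.DOTALL,
)


def extract_data_tables(page_source):
    """Return {js_variable_name: [header_row, data_row, ...]} for every
    matching arrayToDataTable call in the page source."""
    tables = {}
    for name, literal in DATA_TABLE_RE.findall(page_source):
        # Single-quoted strings and numbers such as 8.245768963E-4 are also
        # valid Python literals. JavaScript null/true/false are not and
        # would make literal_eval raise ValueError.
        tables[name] = ast.literal_eval(literal)
    return tables
```

With Selenium this would be called as `extract_data_tables(driver.page_source)`. For the page above, `tables["data2"][0]` would be the header row (`'Year'` plus the three fund names), and each following row would hold a year and the three raw returns. A call with a second argument (for example `arrayToDataTable(data, true)`) does not fit the assumed pattern and would need a different expression.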

Any search of the soup with find or find_all stumbles at the outermost table (which seems to encompass the entire webpage). I am quite new to Python, so any help would be really appreciated. Right now, the only way I can think of doing this is to save the entire page source as a text file (which I have done), then parse it myself by searching for each place the Google Visualization tools are used and proceeding from there. That would be quite tedious, and probably not very robust when applied to another link.
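
An alternative that avoids parsing JavaScript text is to let the browser hand back the data. Google's `DataTable` class has a `toJSON()` method, and Selenium's `driver.execute_script` returns the value of a JavaScript `return` statement to Python. The sketch below assumes, based on the snippet above, that the page defines global functions such as `getChartData2()` that return `[chart, dataTable, options]`. `fetch_chart_table` and `datatable_json_to_rows` are hypothetical helper names. Calling the page's function rebuilds the table from the same literal. Per the snippet, it constructs a chart object but does not call `draw`.

```python
import json


def fetch_chart_table(driver, getter_name):
    """Return the DataTable built by a page function such as getChartData2.

    Assumes the page defines a global function with this name that returns
    [chart, dataTable, options], as in the snippet above.
    """
    if not getter_name.isidentifier():
        raise ValueError(f"unexpected function name: {getter_name!r}")
    # DataTable.toJSON() serialises the table to a string, which
    # execute_script passes back to Python.
    js = f"return {getter_name}()[1].toJSON();"
    return json.loads(driver.execute_script(js))


def datatable_json_to_rows(table):
    """Turn DataTable JSON ({"cols": [...], "rows": [...]}) into a header
    row followed by rows of raw cell values."""
    header = [col.get("label", "") for col in table["cols"]]
    rows = []
    for row in table["rows"]:
        # Each cell is null or an object whose "v" key holds the raw value
        # (an "f" key with formatted text may also be present).
        rows.append([cell.get("v") if cell else None for cell in row["c"]])
    return [header] + rows
```

Usage would look like `rows = datatable_json_to_rows(fetch_chart_table(driver, "getChartData2"))`. The page must have finished loading the Google Visualization library before the call. Otherwise the function throws a JavaScript error.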

Many thanks in advance for any suggestions.

