Channel: Active questions tagged selenium - Stack Overflow

Scraping a JavaScript website with Selenium where pages randomly fail to load across multiple browsers


I have a Python scraper built with Selenium for a dynamically loaded JavaScript website.
The scraper itself works fine, but pages sometimes fail to load with a 404 error.
The problem is that the public URL, which doesn't have the data I need, loads every time, while the JavaScript URL that does have the data sometimes won't load for a random period of time.
Even stranger, the same JavaScript URL loads in one browser but not in another, and vice versa.
I tried the webdrivers for Chrome, Firefox, Firefox Developer Edition, and Opera. Not a single one loads all pages every time.
The public link without the data I need looks like this: <https://www.sazka.cz/kurzove-sazky/fotbal/*League*/>.
The JavaScript link with the data I need looks like this: <https://rsb.sazka.cz/fotbal/*League*/>.
On average, out of around 30 links, about 8 fail to load, even though the same link at the same moment loads flawlessly in a different browser.
I searched the page source for clues but found nothing.
Can anyone help me figure out where the problem might be? Thank you.
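Since the failures appear transient (the same URL loads fine moments later in another browser), one common workaround is to retry a failed load with backoff before giving up. A minimal sketch; the helper name and the stand-alone retry logic are mine, not part of the question's code:

```python
import time

def load_with_retry(fetch, looks_failed, attempts=3, backoff=2.0):
    """Call fetch() up to `attempts` times, retrying while
    looks_failed(result) is truthy. Sleeps `backoff` seconds between
    attempts, doubling the delay each time."""
    delay = backoff
    result = None
    for attempt in range(attempts):
        result = fetch()
        if not looks_failed(result):
            return result
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= 2
    return result
```

With Selenium this could be wired up as `load_with_retry(lambda: (driver.get(single_url), driver.page_source)[1], lambda html: '404 - Page not found' in html)`, since `driver.get()` itself returns `None`.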

Edit: here is the part of my code that I think is relevant:

import random
import re
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome(executable_path='chromedriver',
                          service_args=['--ssl-protocol=any',
                                        '--ignore-ssl-errors=true'])
driver.maximize_window()

for single_url in urls:
    randomLoadTime = random.randint(400, 600) / 100
    time.sleep(randomLoadTime)
    driver.get(single_url)
    htmlSourceRedirectCheck = driver.page_source

    # Redirect check: skip URLs that came back as a 404 page
    if '404 - Page not found' in htmlSourceRedirectCheck:
        leagueFinal = re.findall('fotbal/(.*?)/', single_url)
        print(str(leagueFinal) + ' 404 - Page not found')
        continue

    # Wait until the odds headers become clickable
    try:
        loadedOddsCheck = WebDriverWait(driver, 25)
        loadedOddsCheck.until(EC.element_to_be_clickable(
            (By.XPATH, ".//h3[contains(@data-params, 'hideShowEvents')]")))
    except TimeoutException:
        pass

    # Expand any collapsed odds sections
    unloadedOdds = driver.find_elements_by_xpath(
        ".//h3[contains(@data-params, 'loadExpandEvents')]")
    for clicking in unloadedOdds:
        clicking.click()
        randomLoadTime2 = random.randint(50, 100) / 100
        time.sleep(randomLoadTime2)

    matchArr = []
    leaguer = single_url

    htmlSourceOrig = driver.page_source
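As an aside, the league extraction used in the 404 branch can be checked in isolation. `re.findall` returns a list, so taking the first element (when there is one) gives a cleaner log line; the helper name here is mine, for illustration only:

```python
import re

def league_from_url(url):
    """Extract the league segment that follows 'fotbal/' in a sazka.cz URL,
    or return None if the URL has no such segment."""
    matches = re.findall('fotbal/(.*?)/', url)
    return matches[0] if matches else None
```

For example, `league_from_url('https://rsb.sazka.cz/fotbal/anglie-premier-league/')` yields `'anglie-premier-league'`.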
