I am trying to scrape data (departure times, carrier, price, etc.) from a train search platform (https://www.thetrainline.com),
and I have a problem extracting attribute values. The HTML for each connection looks like the following, and I want to get a list of all carriers, i.e. I want to read the corresponding carrier (here "trenitalia") from the attribute "data-test-carrier-name".
<div class="_1moixrt _dtnn7w" tabindex="0"><span data-test-carrier-name="trenitalia">
For example, for the times I just gather the text of the elements by iteration (see the code below). For the carrier, however, I do not manage to gather the attribute values: I only get the carrier name for the first iteration/first connection, but not for the following connections.
dep_times = driver.find_elements_by_xpath('//div[@class="_1rxwtew "]')
dep_times_list = [x.text for x in dep_times]
# First approach: I get the attribute value, but only for the first connection
carrier1 = driver.find_elements_by_xpath('(//div[@class="_1moixrt _dtnn7w"])[1]/span[1]')
carrier1_list = [x.get_attribute("data-test-carrier-name") for x in carrier1]
Output: ['trenitalia']
# Second approach: I reach all connections, but I do not get the attribute value:
carrier1 = driver.find_elements_by_xpath('(//div[@class="_1moixrt _dtnn7w"])[1]/span[1]')
carrier1_list = [x.get_attribute("data-test-carrier-name") for x in carrier1]
Output: [None, None, None, None, None, None]
Can someone adjust my code to solve this issue? Thank you very much for helping!
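One possible direction, as a rough sketch rather than a confirmed fix: the positional index `[1]` in the first approach restricts the match to the first connection, and `get_attribute` returns `None` for any element that does not itself carry the attribute. Selecting every span that has `data-test-carrier-name` avoids both problems. The helper name `carrier_names` is hypothetical, and the sketch assumes the page markup matches the snippet above and that the results have finished loading.

```python
def carrier_names(driver):
    """Return the data-test-carrier-name value of every matching span.

    `driver` is assumed to be an already-initialised Selenium WebDriver
    with the search results page loaded.
    """
    # Match the spans that carry the attribute, not their parent divs:
    # get_attribute() returns None for elements lacking the attribute.
    # There is no positional index such as [1], so every connection
    # on the page is matched, not only the first one.
    # "xpath" is the string value of selenium's By.XPATH locator.
    spans = driver.find_elements("xpath", "//span[@data-test-carrier-name]")
    return [span.get_attribute("data-test-carrier-name") for span in spans]
```

With the existing `driver` object it could be called as `carrier_list = carrier_names(driver)`. If other parts of the page also use this attribute, the XPath may need to be narrowed, for example by prefixing the div class from the snippet above.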