Channel: Active questions tagged selenium - Stack Overflow

Node.js scraping with chrome-remote-interface


I have been trying to scrape a website protected by Distil Networks, where scraping with Selenium (with Python) always fails.

I did a few searches, and my conclusion is that the site can detect that you are using Selenium through some sort of JavaScript check. I then took a look at chrome-remote-interface, which looks like the thing I want, but then I got stuck.

What I would like to do is automate the following steps:

  1. Open a Chrome instance
  2. Navigate to a page
  3. Run some JavaScript
  4. Collect data and save to file
  5. Repeat steps 2 - 4
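
The steps above can be sketched roughly as follows. This is only a minimal sketch, not a tested scraper: the helper names (`navigateAndWait`, `evaluateInPage`, `scrapeAll`) and the output file are made up for illustration, and it assumes `client` is a connected chrome-remote-interface client and that Chrome was started with `--remote-debugging-port=9222` (step 1).

```javascript
const fs = require('fs');

// Step 2: navigate to `url` and resolve once the page's load event has fired.
async function navigateAndWait(client, url) {
  const {Page} = client;
  await Page.enable();
  // Subscribe before navigating so the load event cannot be missed.
  const loaded = Page.loadEventFired();
  await Page.navigate({url});
  await loaded;
}

// Step 3: run `expression` inside the page and return a plain JS value.
async function evaluateInPage(client, expression) {
  const {result, exceptionDetails} = await client.Runtime.evaluate({
    expression,
    returnByValue: true, // copy the value out instead of returning a remote handle
    awaitPromise: true,  // if the expression yields a promise, wait for it
  });
  if (exceptionDetails) {
    throw new Error(`Page script failed: ${exceptionDetails.text}`);
  }
  return result.value;
}

// Steps 2 to 5: visit every URL, collect the data, then save it all as JSON.
async function scrapeAll(client, urls, expression, outFile) {
  const rows = [];
  for (const url of urls) {
    await navigateAndWait(client, url);
    rows.push({url, data: await evaluateInPage(client, expression)});
  }
  fs.writeFileSync(outFile, JSON.stringify(rows, null, 2));
  return rows;
}

// Usage (assumes `npm install chrome-remote-interface`; URLs are placeholders):
//   const CDP = require('chrome-remote-interface');
//   const client = await CDP({host: '127.0.0.1', port: 9222});
//   await scrapeAll(client, ['https://example.com'], 'document.title', 'out.json');
//   await client.close();
```

Passing the client in as a parameter keeps the connection in one place, so the same functions can be reused for every page in the loop.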

I know that I can open an instance of Chrome for debugging with:

google-chrome --remote-debugging-port=9222

And I can open a console in Node with:

chrome-remote-interface -t 127.0.0.1 -p 9222 inspect -r

I can also run simple scripts like:

Page.navigate({url:"https://google.com"})
Runtime.evaluate({expression:"1+1"})

But I can't get at the DOM directly from Node.js the way I can in the Chrome Developer Tools console. Basically, what I want is to run scripts from Node the same way I can in the Chrome Developer Tools console.
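
For context on why the DOM is not directly available: `Runtime.evaluate` returns a description of a remote object, not a live DOM node, so the data has to be turned into plain values inside the page (with `returnByValue: true`) or fetched through the protocol's `DOM` domain. The sketch below shows both routes. The function names are hypothetical, the `{text, href}` shape is just an example, and `client` is again assumed to be a connected chrome-remote-interface client.

```javascript
// Build a page-side expression that maps every element matching `selector`
// to a plain {text, href} object, so the result can be serialized as JSON.
function buildExtractExpression(selector) {
  return `Array.from(document.querySelectorAll(${JSON.stringify(selector)}))
    .map((el) => ({text: el.textContent.trim(), href: el.href || null}))`;
}

// Route 1: do the DOM work inside the page and copy the result out by value.
async function extractElements(client, selector) {
  const {result, exceptionDetails} = await client.Runtime.evaluate({
    expression: buildExtractExpression(selector),
    returnByValue: true,
  });
  if (exceptionDetails) {
    throw new Error(`Page script failed: ${exceptionDetails.text}`);
  }
  return result.value;
}

// Route 2: query nodes through the DOM domain and fetch their markup.
async function outerHTMLOfAll(client, selector) {
  const {DOM} = client;
  const {root} = await DOM.getDocument();
  const {nodeIds} = await DOM.querySelectorAll({nodeId: root.nodeId, selector});
  const html = [];
  for (const nodeId of nodeIds) {
    const {outerHTML} = await DOM.getOuterHTML({nodeId});
    html.push(outerHTML);
  }
  return html;
}
```

Route 1 is closest to typing in the Developer Tools console, since the expression string is ordinary page JavaScript. Route 2 returns raw HTML that would still need parsing on the Node side.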

Also, there is not enough documentation on chrome-remote-interface for scraping. Are there any good links for that?

