Web scraping tool

Soldato
Joined
27 Dec 2005
Posts
17,036
Location
Bristol
I need to scrape a couple of webpages for data that's publicly accessible (tho it's behind a login, if that matters).

Both are just a list of profiles. The main page has a list of everyone's names, and their subsequent/individual page has further details they've opted to share. So I need a scraper that will follow each link and export the data to a spreadsheet. I've tried using a couple but they don't have the "follow link" functionality and so they only scrape the names.

Any recommendations? I can write PHP and can run something locally if that helps.
 
Man of Honour
Joined
19 Oct 2002
Posts
27,779
Location
Surrey
I would probably start by taking a look at Python with one of the web scraping packages such as Beautiful Soup.
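To give a rough idea of what that looks like for the "follow each link, then export" job, here's a minimal sketch. Everything site-specific is made up for illustration: the example.com URLs, the `a.profile` and `.location` selectors, and the `PAGES` dict that stands in for real HTTP requests so the example runs offline. On the real site, `fetch()` would do a GET with a logged-in session instead.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real site: URL -> HTML.
BASE = "https://example.com/members/"
PAGES = {
    BASE: """
        <ul>
          <li><a class="profile" href="alice">Alice</a></li>
          <li><a class="profile" href="bob">Bob</a></li>
        </ul>""",
    BASE + "alice": '<h1>Alice</h1><span class="location">Bristol</span>',
    BASE + "bob": '<h1>Bob</h1><span class="location">Surrey</span>',
}


def fetch(url):
    """Return the HTML for a URL (here a dict lookup, not a real request)."""
    return PAGES[url]


def profile_links(index_html, base_url):
    """Pull every profile link off the index page as an absolute URL."""
    soup = BeautifulSoup(index_html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.select("a.profile[href]")]


def parse_profile(html):
    """Extract the fields we care about; missing fields become empty strings."""
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find("h1")
    location = soup.select_one(".location")
    return {
        "name": name.get_text(strip=True) if name else "",
        "location": location.get_text(strip=True) if location else "",
    }


def scrape(index_url):
    """Follow each link on the index page and parse the page behind it."""
    links = profile_links(fetch(index_url), index_url)
    return [parse_profile(fetch(url)) for url in links]


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "location"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = scrape(BASE)
print(to_csv(rows))
```

The index parsing and the profile parsing are kept as separate functions so each can be adjusted once you've looked at the real markup in the browser's dev tools.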
 
Soldato
Joined
19 Oct 2008
Posts
5,733
A fiver? Where are these people? Maybe I can outsource all my freelance work to them.

Another option would be any popular language combined with Selenium. It drives a real browser to do the work, but I think you can run it in headless mode. It's easy to use, and you can use the browser's F12 tools to help find elements by class name, id, etc.
 
Associate
Joined
20 Nov 2016
Posts
729
Selenium automates web navigation, not parsing/scraping, unless the OP is after embedded attachments to download.

OP, as mentioned, Beautiful Soup is the way forward. Even for a newbie it's intuitive, and you can Google things as you build out what you need. There's plenty of help on Stack Overflow.
 