Comments
-
Froot 7540 7y
What are you using to scrape?
I've done it with Node.js + Request + Cheerio. Easy as pie.
-
I'm using Cheerio. But it's not the framework that's bothering me. It's the web scraping itself. I hate doing it.
-
github 9548 7y
I used Python requests or urllib2 with BeautifulSoup.
And you can earn a good amount if you are in college, enough for monthly side pocket money. Lots of startups pay for scraping. Once you are good at it, a job is often just a minute of changing a DOM selector in the parser, so you earn the same amount on every iteration for less effort. You can eventually end up building your own, more generic scraping framework that handles different types of websites...
-
github 9548 7y
And once, during an internship at a startup, I crawled the entire LinkedIn member and company directories and stored them on my local drive...
-
@beriba
Yay, first time I see anyone using Perl here :)
Having done scraping in both Python and Perl, I'll take Perl if I can choose.
It takes a third of the code and it runs 3 times faster.
-
zshh 3848 7y
I'm using Python with requests as far as it gets me. Sometimes I need to use Selenium with the PhantomJS headless browser if the website uses JS to dynamically update the HTML.
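For anyone wondering what "as far as it gets me" looks like in practice, here is a rough sketch of the static-first idea: parse the HTML you already fetched, and only reach for a headless browser when the elements you want are not in it. It uses only the standard library's `html.parser` as a stand-in for BeautifulSoup, and it does no fetching at all. The names `ClassTextExtractor`, `extract_static` and `scrape`, and the "no matches means JS-rendered" heuristic, are assumptions for illustration, not something from the comment above.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # > 0 while inside a matching element
        self.results = []    # one string per matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1              # nested tag inside a match
        elif self.target_class in classes:
            self.depth = 1               # entering a new match
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:              # leaving the matching element
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


def extract_static(html, target_class):
    """Return the text of all elements with target_class found in raw HTML."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


def scrape(html, target_class):
    """Try static parsing; flag the page for a headless browser if nothing matched.

    Assumption: an empty result means the content is injected by JS.
    """
    items = extract_static(html, target_class)
    return {"items": items, "needs_browser": not items}
```

Calling `scrape(html, "rant")` on a server-rendered page returns the matched text with `needs_browser` set to `False`. On a page that is just an empty `<div id="app">` plus a script tag, it returns no items and `needs_browser` set to `True`, which is the point where Selenium would take over.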
I hate web scraping.
web scraping fucking sucks