Comments
-
Kimmax109878yDo you have it sorted?
Scanned for buzzwords in the post, like YLYL?
And did you compare hashes and link double posts?
Sounds like a cool project to me. Many things to try out and optimize come to mind;
that's pretty cool. Look for a partnership with a data specialist and a digital artist.
-
donuts236728yGood luck viewing them all... I've had a scraper too for years, though it's run on demand....
-
gacbl6218yNow just re-upload everything you have back to 4chan and see if your scraper will pick it up and download :D
-
Knowing their content, I wouldn't even want to glance at that folder. There is at least a gig of some form of illegal or gruesome content. I would like to keep my sanity, thanks.
-
biskus10718yYou are correct, sir, ylyl was the first keyword I added. All media is saved with django, and all posts are connected to a thread which has some attributes like title, date etc. I'm currently building a web interface to sort through the data. Comparing hashes to identify duplicates is a great idea, thanks. @Kimmax
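A minimal sketch of the hash-comparison idea, not the scraper's actual code: it assumes the media sits as plain files under one directory, and the names `file_digest` and `find_duplicates` are made up for illustration. It only catches byte-identical files; a re-encoded or resized copy gets a different hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large webms are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(media_dir):
    """Map each digest to the files sharing it, keeping only groups of 2+."""
    groups = defaultdict(list)
    for path in sorted(Path(media_dir).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

With a database behind the scraper, the same digest could instead be stored per media row at download time, so duplicates show up with a simple group-by rather than a rescan of the disk.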
-
biskus10718y@redstonetehnik if you can find a proper name for this project with an available domain name, I will give you admin access :)
-
dom3mo688yEvery time I hear about 4chan, it's some cool stuff. But when I go onto it... very disturbing content.
-
Kimmax109878y@dom3mo you just have to have the ability to filter what you see :D
But there are other boards too, like /wsg/, that disallow the crazy stuff
Turns out my 4chan image scraper has been running for 6 months without interruptions. I now have 106k pictures and webms of highly questionable content on my hard drive. This is how Oppenheimer must have felt.