Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "depths of the internet"
-
My current task involves processing the commoncrawl web archive, and it's like a box of junk you buy at a flea market. You find so much useless stuff, broken stuff, stuff that makes you question people...
My latest find makes me wonder what lies out there if what I found was in plain sight. I found tens of thousands of websites that look like someone used markov chains to generate pron ads. Those websites exist in 10+ languages, use the same url-scheme, read like a dyslexic camgirl reading alphabet soup and are hosted on the same three ip-adresses. There is no javascript involved and some pages link to a variety of twitter accounts.
I queried a few commoncrawl files and amassed 4GB of this spam. Every time I look at it it gets weirder. There is an italian article about malware in there too.
Here's a text sample:
"Not from her bedroom, she her stream view and meet new experience. In hd india, because swimsuit still laws exist no interaction or frigthened and."1