5
lorentz
2d

we should archive these posts, I'm gonna miss the choice bullshit when this forum finally gives out

Comments
  • 4
    I've this:

    sqlite> select count (0) from rants;
    53068
    sqlite> select count (0) from comments;
    504678
    sqlite>

    This is literally everything there is to crawl on devRant. Well, maybe searching for single characters and the like could still turn up something. To get this, I had to make 300,000+ requests. I did it with aiohttp and let it run over a weekend in slow mode. I don't consider it very social to hit the site with a concurrency of 20.

    The rants and comments come to around 158 MB, fully parsed, in sqlite. I scraped the rant data from all the pages using C and inserted it into sqlite. Calculating how big the HTML is takes some time. It's around 13 GB, I think.

    This has crawled every profile, rant, and comment that is connected in any way, but it still does not cover the whole site. That means there's a lot of unseen stuff behind the search functionality. Sadly, the search functionality sucks. I could make it really awesome with current tech.

    WHOOPS: 37G devrant. Half is cache.
  • 2
    @retoor you're a gem
  • 2
    naw I would rather be forgotten, just as nature intended
  • 1
    @jestdotty I won't, my code is in the arctic vault.