
Are there any good tools for analyzing a big dataset of JSON files? I could normalize the dataset into a SQL database, but are there any secret weapons that would make life simpler?

  • 0
    Is there a set JSON schema?
  • 0
    @jonas-w it's already in the best form
  • 0
    @melezorus34 Hmm, what do you mean by a set JSON schema?
  • 1
    There is `jq`
  • 0
    @hjk101 Yeah, I'm already querying this stuff with some jq one-liners and some Python.
  • 2
    Your statement is so vague...

    What does big mean? Mega, Giga, Tera, Exa?

    What does analyze mean?

    If it's just a one-time operation like an aggregation, Python with multithreading should be fine.
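
    A minimal sketch of such a one-time aggregation, assuming each file holds a JSON array of objects. The `status` key and the function names are hypothetical. Threads mostly help with file I/O here; if parsing itself is the bottleneck, `ProcessPoolExecutor` from the same module may be the better fit.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_by_key(path, key="status"):
    """Count the values of `key` across the records in one JSON file."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # assumption: the file is a JSON array of objects
    return Counter(rec.get(key) for rec in records)


def aggregate(paths, key="status", workers=4):
    """Merge the per-file counts produced by a pool of worker threads."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda p: count_by_key(p, key), paths):
            total.update(partial)
    return total
```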


    Iterative JSON parsing is the way to go if the data doesn't fit in memory.

    If you need to parse the data more than once, consider creating an intermediate representation. Depending on the analysis, this might be, for example, a per-day representation of the values, which can then be summed up easily to other units of time (weekly, monthly, yearly, etc.).
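
    One way such an intermediate representation could look, as a rough sketch: collapse the raw records into daily totals once, then roll those up as needed. The `timestamp` and `value` field names are assumptions, not something from your data.

```python
from collections import defaultdict
from datetime import date


def to_daily(records):
    """Collapse raw records into one summed value per calendar day."""
    daily = defaultdict(float)
    for rec in records:
        # Hypothetical fields: an ISO 8601 "timestamp" and a numeric "value".
        day = date.fromisoformat(rec["timestamp"][:10])
        daily[day] += rec["value"]
    return dict(daily)


def roll_up(daily, unit):
    """Sum daily totals into weekly, monthly, or yearly buckets."""
    keys = {
        "weekly": lambda d: tuple(d.isocalendar())[:2],  # (ISO year, ISO week)
        "monthly": lambda d: (d.year, d.month),
        "yearly": lambda d: (d.year,),
    }
    buckets = defaultdict(float)
    for day, value in daily.items():
        buckets[keys[unit](day)] += value
    return dict(buckets)
```

    The daily totals are small enough to store and reload, so later analyses never have to touch the raw JSON again.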

    Newline-delimited JSON can simplify things greatly; a database might not be needed.
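
    To illustrate why newline-delimited JSON helps: each line is a complete document, so the file can be processed one record at a time with only the standard library. This is a sketch; the `value` field and the function names are hypothetical.

```python
import json


def iter_ndjson(lines):
    """Yield one parsed record per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def total(lines, field="value"):
    """Sum `field` over all records without holding them all in memory."""
    return sum(rec.get(field, 0) for rec in iter_ndjson(lines))
```

    An open file object can be passed directly as `lines`, since iterating over it yields one line at a time. If a file is currently one large JSON array, `jq -c '.[]'` writes each element on its own line.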
  • 1
    @IntrusionCM It is not *that* big. I think I will learn jq a bit more (I already wanted to do this a few times, maybe now is the time). If that doesn't work out for me, I'll look at ijson (thanks), as it seems better than anything else I tried. If that doesn't satisfy me either, I will probably do some SQL stuff.