1
j0n4s
2y

Are there some good tools to analyze a big dataset of JSON files? I mean, I could normalize the dataset into a SQL database, but are there some secret weapons to make life simpler?

Comments
  • 0
    Is there a set JSON schema?
  • 0
    @jonas-w it's already in the best form
  • 0
    @melezorus34 mhm, what do you mean by a set JSON schema?
  • 1
    There is `jq`
  • 0
    @hjk101 yeah, I'm already querying this stuff with some jq one-liners and some Python
  • 2
    Your statement is so vague...

    What does "big" mean? Megabytes, gigabytes, terabytes, exabytes?

    What does "analyze" mean?

    If it's just a one-time operation like aggregation, Python with multithreading should be fine.
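
    For example, something like this (a rough, untested sketch; the directory layout and the `value` field are invented, and I'd reach for processes rather than threads here, since JSON parsing is CPU-bound):

    ```python
    import json
    from concurrent.futures import ProcessPoolExecutor
    from pathlib import Path

    def file_total(path):
        # Parse one file and sum its "value" fields (field name assumed).
        with open(path) as f:
            records = json.load(f)
        return sum(r["value"] for r in records)

    if __name__ == "__main__":
        paths = list(Path("data").glob("*.json"))  # directory name assumed
        with ProcessPoolExecutor() as pool:
            # One task per file; partial sums are combined as they come in.
            total = sum(pool.map(file_total, paths))
        print(total)
    ```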

    https://pypi.org/project/ijson/

    Iterative JSON parsing is the way to go if the data doesn't fit in memory.
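
    A minimal ijson sketch, assuming the file is one big top-level JSON array (that's what the "item" prefix means) whose records have a numeric `value` field:

    ```python
    import ijson

    total = count = 0
    with open("big.json", "rb") as f:
        # Records are yielded one at a time; the full array is never in memory.
        for record in ijson.items(f, "item"):
            total += record.get("value", 0)  # field name assumed
            count += 1
    print(f"{count} records, total value {total}")
    ```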

    If you're going to parse more than once, it might be worth creating an intermediate representation. Depending on the analysis, this could be e.g. a per-day aggregation of values, which can then easily be summed up into larger units of time (weekly / monthly / yearly, etc.).
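
    A toy illustration of that idea (the dates and field names are made up):

    ```python
    from collections import defaultdict

    records = [
        {"date": "2021-06-01T12:00:00", "value": 3},
        {"date": "2021-06-01T15:30:00", "value": 4},
        {"date": "2021-07-02T09:00:00", "value": 5},
    ]

    # Intermediate representation: one summed value per day (key YYYY-MM-DD).
    daily = defaultdict(float)
    for r in records:
        daily[r["date"][:10]] += r["value"]

    # Rolling days up to months is then just another cheap grouping pass.
    monthly = defaultdict(float)
    for day, value in daily.items():
        monthly[day[:7]] += value

    print(dict(daily))    # {'2021-06-01': 7.0, '2021-07-02': 5.0}
    print(dict(monthly))  # {'2021-06': 7.0, '2021-07': 5.0}
    ```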

    Newline-delimited JSON can simplify things greatly; a database might not be needed.
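
    With one record per line, plain line iteration streams the whole dataset with constant memory and no extra libraries, e.g. (file name assumed):

    ```python
    import json

    count = 0
    with open("data.ndjson") as f:
        for line in f:
            if line.strip():  # skip blank lines
                record = json.loads(line)
                count += 1    # replace with real filtering / aggregation
    print(count, "records")
    ```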
  • 1
    @IntrusionCM it's not *that* big. I think I'll learn jq a bit more (I've already wanted to do that a few times; maybe now is the time). If that doesn't work out for me, I'll look at ijson (thanks), since it seems better than anything else I've tried, and if that still doesn't satisfy me, I'll probably do some SQL stuff.