3
vs15
6y

Query S3-based data and write the result back to S3.

Project Type
Project idea
Summary

Query S3-based data and write the result back to S3.

Description
The input file is over 50 GB, and the generated output is about the same size. The query to be executed is essentially a rule, such as adding a column based on some conditions or filtering the dataset. I tried AWS Athena, but the query timed out after 30 minutes.
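To make the kind of rule concrete, here is a minimal sketch of a streaming transform that filters rows and adds a derived column one row at a time, so memory use stays flat regardless of file size. It assumes CSV input; the column names (`amount`, `status`, `tier`), the threshold, and the helper names are hypothetical and not part of the original post. The source and destination are plain file-like objects, so reading from and writing to S3 is left out.

```python
import csv
import io


def apply_rule(row):
    """Hypothetical rule: derive a 'tier' column from the 'amount' column."""
    amount = float(row["amount"])
    row["tier"] = "high" if amount >= 100 else "low"
    return row


def keep(row):
    """Hypothetical filter: drop rows whose status is 'cancelled'."""
    return row["status"] != "cancelled"


def transform(src, dst):
    """Stream CSV rows from src to dst, holding only one row in memory."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + ["tier"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    written = 0
    for row in reader:
        if keep(row):
            writer.writerow(apply_rule(row))
            written += 1
    return written  # number of rows written to dst
```

Because each row is handled independently, the same function could be run on separate slices of the input without changing its logic.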
Tech Stack
AWS Athena
Current Team Size
1
URL
Comments
  • 0
    what the hell is s3
  • 0
    @calmyourtities It's an AWS service for storing data, similar to FTP but with different concepts involved.
  • 0
    That's basically what Snowflake is doing.
  • 0
    @AndSoWeCode yes, it does show this... but how do I achieve this on my end? :)
  • 1
    @varunsinghal well there's a reason why big companies do this with large teams of superstars.
    Sure, you can put your data in S3 and write a program that queries it the way you want. That will work.
    But if you want ease of use, scalability, data integrity, then there's a lot of work to be done.
  • 0
    @AndSoWeCode thanks :)
    Can you point me toward this career path? I really want to explore it, or at least read about it.
  • 0
    @varunsinghal you need knowledge about data organization. Start with the most obvious: RDBMS and their scalability (sharding), as well as NoSQL databases like the usual MongoDB and CouchDB. Then move on to graph DBs, distributed computing (Spark), distributed file systems (Hadoop HDFS), a bit of DevOps (Docker, Kubernetes), and serverless computing (AWS Lambda).

    One needs to understand what ACID compliance is, why it's needed, where it's needed, where it can be traded off, and who can benefit from relaxing it.
    An important skill is the ability to develop concurrent computing processes efficiently....

    Basically you need a ton of knowledge and experience, and even more time, to be able to do this well on your own.

    The people that develop these systems - I talked to them. They aren't your average hipster college graduate. These are people with doctorate degrees and decades of experience, and/or people who can compute the Fibonacci sequence using regular expressions.
  • 1
    Don't get me wrong, the idea you proposed is great! It has value, and such a product would benefit from a lot of attention.

    The problem is that implementing it, although completely feasible, is extremely hard and will cost you a lot of time and money anyway, since you just won't be able to do it well on your own.
  • 0
    @AndSoWeCode i just needed a vision behind this...to understand what we are dealing with. thanks :)
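As a rough illustration of the "write a program that queries it" route and of the concurrent processing mentioned in the thread, the sketch below applies a rule to independent partitions of the data in parallel. Everything here is an assumption for illustration: the partition names, the per-partition rule, and the in-memory dict standing in for a bucket of objects. Threads are used because fetching objects is typically I/O-bound; CPU-heavy rules would need processes or a distributed engine such as Spark.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(lines):
    """Hypothetical per-partition rule: drop empty lines, append each line's length."""
    return [f"{line},{len(line)}" for line in lines if line]


def run_partitioned(partitions, max_workers=4):
    """Apply process_partition to every partition independently, in parallel.

    `partitions` maps a partition name to its list of lines. In a real setup
    each entry would be one stored object (an assumption made here).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the keys.
        results = pool.map(process_partition, partitions.values())
        return dict(zip(partitions.keys(), results))
```

Splitting one 50 GB file into many smaller partitions is what lets each piece finish well within a time limit, at the cost of having to manage the pieces yourself.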