Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "airflow"
-
Who was the shithead cumface dumbass idiot thinking that notebooks should have their cooling units at the bottom. How the fuck are they supposed to get air when sitting on my lap?10
-
To all the data engineers in here: WTF is going on in your field?
I've worked closely with a dozen data engineers in the last 5 years (and talked to friends and internet strangers about this and get similiar responses), mine if them seem to know how to use a computer!
They don't understand git, ORMs, best practices, how to use a terminal, DAGs (important for using modern ETL scheduling tools like airflow and prefext), etc
Guys with 10 years of experience on their resume and they can't wrap a model into a flask app with 1 endpoint. They'll reference local files on their machine in w jupyter notebook and are shocked it won't work on other computers!17 -
My GPU blocks the airflow from the lower front intake fan to the CPU, so I wanted to have a fan in the 5.25" drive bay directly targeting the CPU.
While that bay fits a 140mm fan nicely, there was no mounting point. I ended up making four fan struts out of the metal covers for the 5.25" inserts, the ones that you wiggle out. Drilled holes into the case, a bit of foam above and below the fan to seal the larger gaps, and done.
The trick is ofc that the 5.25" case covers are meshed and hence act both as air intake and dust filter. The CPU runs a few K cooler under load.14 -
I bought a computer awhile back off Kijiji (Canada's craigslist) for a really good price. Today, decided I was going to upgrade the ram since I got a sick deal on some corsair vengence 8gb sticks online...
And just before installing it, I realize the fucker decided to use low profile RAM in his build for a reason: he (for some fucking reason) decided to route the airflow for the system by placing the cooling fan directly over the first 2 memory slots.
Guess who's 5 minute memory upgrade just turned into an hour of re-routing all the airflow in the PC and having to redo all the fan wiring.
I shouldn't complain, I mean I got this computer a couple years back for like $400, but still, wtf man...4 -
!rant
Rant from my previous work as a consultant Data Engineer (wish I had known this site back then).
During my stay at the place, we have a big client whose contact with us was an incompetent stressful fellow.
I single-handedly build a humongous automated data pipeline using Airflow. I am very proud of my baby as my first massive project and check it obsessively for every possible flaw, especially when writing down documentation for the poor soul that would take my place.
Luckily for me, everything is working as intended, until of course on my last day of work, shit hits the fan, and everything breaks down.
After a moment of initial panic: it was Thursday morning, we had a Machine Learning model to run over the weekend, predictions to make and reports to write and a very lovely next week deadline, I calm down.
"I won't be dealing with this shit anymore, starting from 18:00 PM and anyway Fear Is The Mind Killer."
Quite sure that it couldn't have been my code, I start looking at various logs when the culprit was clear. The B(ig) S(tupid) C(lient) changed the whole schema of the data he was feeding to us.
I call him: he has no idea of what was done to the data. Hell, at first he doesn't seem to remember what the deal with schema, data, and SQL is (the guy was supposed to be a big shot in the IT department). It turns out he hired one of our competitors to do his side of the collection pipeline. He tries to get mad at me, but everything he throws bounces back to him. I am calm yet ruthless pointing out how every major hiccup had been his fault and that I could quickly reach to his board of directors explaining why their Machine Learning model was late.
Result: he apologizes, extends our deadline, and I get a round of applause from other juniors who would have to deal with me had I failed.
Never am I happier to not work as an underpaid cannon fodder apprentice in a shitty consultant firm.
Luckily for me, everything is working as intended, until of course on my last day of work, shit hits the fan, and everything breaks down.
After a moment of initial panic: it was Thursday morning, we had a Machine Learning model to run over the weekend, predictions to make and reports to write and a very lovely next week deadline, I calm down.
"I won't be dealing with this shit anymore, starting from 18:00 PM and anyway Fear Is The Mind Killer."
Quite sure that it couldn't have been my code, I start looking at various logs when the culprit was clear. The B(ig) S(tupid) C(lient) changed the whole schema of the data he was feeding to us.
I call him: he has no idea of what was done to the data. Hell, at first he doesn't seem to remember what the deal with schema, data, and SQL is (the guy was supposed to be a big shot in the IT department). It turns out he hired one of our competitors to do his side of the collection pipeline. He tries to get mad at me, but everything he throws bounces back to him. I am calm yet ruthless pointing out how every major hiccup had been his fault and that I could quickly reach to his board of directors explaining why their Machine Learning model was late.
Result: he apologizes, extends our deadline, and I get a round of applause from other juniors who would have to deal with me had I failed.
Never am I happier to not work as an underpaid cannon fodder apprentice in a shitty consultant firm. -
Airflow… airflow… I hate you so freaking much, you are a bloated piece of software that packages wayyy too much and increase the complexity of any solution we built on top of you. Plus unit-testing and integration tests are wayyy too difficult with you. But man… your recent UI changes are a massive welcome.
A year ago I was tasked with either upgrading airflow to version 2 or to migrate the code to another tool. Naively I thought “well might as well upgrade to avoid a rewrite”. Little did I know that the reason airflow didn’t scale out well for us, was due to people over the years not having a grasp on airflow primitives like “pools”, “workers” and “operators”. Ended up refactoring the entire codebase for both infra and DAGs anyhow AND upgrading the beast AND lower cost by a factor of 2 (from $100-$150 daily to $50-$70 daily).
But seriously feels like I could’ve solved the scheduling issue with literally any message queue+decent library (like celery or Faust) and I’d have half the headache.3 -
I started working with Luigi, the workflow manager for a data-science project. Any input on how to test the Tasks or how does it compare to Airflow?
-
Making distributed scheduler that queue and run tasks on containers or other executors in future and also pulls new tasks from defined git repositories.
Tasks are added based on simple yaml configuration.
Need that for my side projects that gather data from multiple sources from time to time.
k8s looks to heavy for that and airflow can’t be configured like I want it to be so I started writing my own on Monday.
Nearly finished poc version.2 -
I really want to switch my career from being a Full-Stack python/javascript developer to be a Data Engineer.
I've already worked with relational and non-relational databases, troubleshooted a couple of Airflow DAGs, deployed production-ready python code but now I feel kinda lost, every course I start on the Data engineering topic feels really useless since I feel like I've already worked with that technology/library, but I'm still afraid of start taking interviews.
Any good book/course or resource that I should look in?
BTW first rant in a couple of years, this brings me memories1 -
Working on a feature which heavies relies on a data pipeline. I noticed it is a couple of lambda functions calling each other ( Fuck you to the guy who made it). The best way to get sanity back is build a proper etl pipeline. Any suggestions for building a etl in python with reliability.
Options already considered
1. Celery tasks - Worked well but no overview of the single task progress across celery tasks
2. Airflow - Gives good overview but the docs make less sense than a 10 yr talking. Mostly because they introduced a new syntax and not everything has migrated fully yet. Also no support for reusing dags2 -
Does someone know how to set up plugins for airflow?
In particular do you know how I set the PYTHONPATH env var for the docker container to Import my custom airflow plugin?