Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "prod fail"
-
I just love it when our clients decide to make a clone of live production server..then put it immediately online..and don't tell anyone about this.. and then start bitchin how data gets doubled all of a sudden..
Yeah, no shit sherlock.. you have two prod servers for 'hot swapping' and some services may only be running on one at a time.. You even have a manual on how to switch primary to secondary (turn off services on primary first, then turn them on on secondary and all)..or in case primary actually dies, just turn on services on secondary and you're good to go, right?
So how do ya think cloning the one with running services and putting the clone immediately online will work out?! 🤔
God, I thought it was common sense to not do that..but here I am, bitchin about how people fail to RTFM.. :/ or use brain..fuck..4 -
Half way through a 2 hour deployment, and a fucking test fails.
Yes, it takes 2+hours to run the tests.
Rerun in test environment: pass
Rerun in prod: pass
Rerun changed test in prod: fail..
Why, why you got to hate me for?
I love it when production is the place where config gets changed.rant tests are good multi environment changes tests can go fuck off right now tests save lives inconsistencies hurt5 -
There is a company providing a very speciffic service. And it has a core application for that svc, supported by a core app team.
That company also has other services, which are derivations of the core one. So every svc depends on core.
Now that we're clear on that... I was working in a team of one of the subservices. We very strongly depended on core. In fact, our svc was useless if integration w/ core broke down.
The core team had an annoying habbit. They refused to version their webservices and they LOVED to push api updates w/o any warnings. Our prod, test, other envs used to fail bcz of core api changes quite often. Mgmt, IT head was aware of the problem and customers' complaints as well.
So as a result, once core api changes we're all in a panic mode: all prior priorities are lowered and revival of prod is to be our main focus. Core api is not docummented, the changes are not clear, so we have to reverse engineer the shit out of it. We manage to patch our prod up w/ hotfixes, but now we have tech debt. While working on the debt, core api changed again, in test env. Mgmt pushes debt back and reallocates us to hotfix test. Hotfix is 80% done when another core api breaks. Now mgmt asks us to drop wtv we're working on and fix that new break. By the time we're to deploy the hotfix, another api breaks in another env. The mgmt..... You get the picture :)
2 years go by, nothing has changed so far.6 -
Merry Christmas to all the dev ranters out there. My thoughts go out to those on call. May prod never fail and your phone never ring.
-
https://devops.com/dear-staging-wer...
What the F#*@&$# %#@$!?!?!
This person has decided to skip using staging, because it doesn't correctly reflect prod!
If that's your problem, than why don't you try to fix it? Create a DB with fake data, make one based on anonymised customer data, or even do it on non-anonymised data (with permission of course), but fix the staging env so that it reflects prod!
This is a devops site (it's literally the name!), and instead of teaching you how to make staging exactly like prod, they tell you to do what caused the creation of the staging->prod system IN THE FIRST PLACE!!!
There's all these stuff like Vagrant that are literally designed to help you as a dev mimic prod, and you just throw it all out!?
"With feature flags, I can safely test in production without fear of breaking something or negatively affecting the customer experience."
Famous last words.12 -
The mobile application my company is developing is beginning to fail in a prod environment because the third party tool we purchased to sync our 3 databases in the background isn't working as expected, so I have been assigned the task of rewriting the entire application. I chose to do it in react-native/redux which I have never heard of until two weeks ago, and I have never enjoyed programming so much in my life. Shit just clicks and works the first time more often than not. Android Studio had me banging my head against my desk daily. Kudos to these technologies 👌1
-
Finally made my node production server stable enough that I could focus on writing tests*. I start by setting up docker, mocking cognito, preparing the database and everything. Reading up on Node test suites and following a short tut to set up my first unit test. Didn't go smoothly, but it's local and there are no deadlines so who cares. 4 days later, first assert.equal(1+1, 2) passes and I'm happy.
I start writing all sorts of tests, installing everything required into "devDependancies," and getting the joy of having some tests pass on first try with all asserts set up, feels good!
I decide to make a small update to production, so I add a test, run and see it fail, implement the feature, re-run and, it passes!
I push the feature to develop, test it, and it works as intended. Merge that to master and subsequently to one of my ec2 production servers**, and lo and behold, production server is on a bootloop claiming it "Cannot find module `graphql`". But how? I didn't change any production dependencies, and my package lock json is committed so wth?
I google the issue, but can't find anything relevant. The only thing that I could guess was that some dependencies (including graphql) were referenced*** in both, prod and dev, and were omitted when installed on a prod NODE_ENV, but googling that specific issue yielded no results, and I would have thought npm would be clever enough to see that and would always install those dependencies (spoiler: it didn't for me).
With reduced production capacity (having one server down) I decided to npm uninstall all dev dependencies anyway and see what happens. Aaaaand it works.....
So now I have a working production server, but broken local tests, and I'm not sure why npm is behaving like this...
* Yes I see the irony.
** No staging because $$$, also this is a personal project.
*** I am not directly referencing the same thing twice, it's probably a subdependency somewhere.2 -
At Rackspace there are lights on the walls that go off for things like ddos attacks, fire alarm, etc. The being a code rainbow. Meaning "evacuate the building".
Every time we deployed to prod I always joked one day that it would fail so spectacularly that it would cause a code rainbow.3 -
Losing faith in Netflix and their awesome open source projects.
Had a hard time trying to install Security Monkey : poor quality quickstart Ubuntu-only, almost no documentation, same instructions for latest (aka dev) and stable (aka prod) version, no depencies list ... oh and the UI display well only on Chrome ..
Then you surrender and just want to check the dockerized version they provide : it doesn't work neither (build fail or back end process just shut down) !!
I'm done ... -
Had a definite week from hell... a bunch of prod issues that only I could fix (that's a whole other rant for another day!)... a piece of code totally kicking my ass for days... a hosting environment that was unstable seemingly every time I needed to do something in it (and that killer piece of code could ONLY be properly tested there, naturally!)... a service that my app depends on flaking out with no indication what the problem was and another team responsible for it that is based off-shore so aren't responsive when I need them to be... a metric shit-ton of procedural bullshit dropped on my head... an immense amount of stress due to the lead-up to a prod rollout next month that absolutely CANNOT fail without huge ramifications for the business but not enough help to ensure it gets done.
But, with all that said, I DID manage to get that killer piece of code working late on Friday after slamming my head against the wall for over a week on it (and ultimately re-writing it from the ground-up on Thursday and Friday)... so, the week of hell ended on a high note at least, which is always a Very Good Thing(tm)!2 -
This is an actual transcript...
Since it's way too long for the normal 5000 characters, hence splitting it up...
Infra Guy: mr Dev, could you please give some rational for update of jjb?
Dev: sparse checkout support is missing
Infra Guy: is this support mandatory to achive whatever you trying to do?
Dev: yes
Infra Guy: u trying to get set of specific folder for set of specific components?
Dev: yes
Infra Guy: bash script with cp or mv will not work for you?
Dev: no
Infra Guy: ?
Dev: when you have already present functionality why reinvent the wheel
Dev: jenkins has support for it
Dev: the jjb is the bottle neck
Infra Guy: getting this functionality onto our infra would have some implications
Dev: why should I write bash script if jenkins allows me to do that
Dev: what implications ??
Infra Guy: will you commit to solve all the issues caused by new jjb?
Dev: you show me the implications first
Infra Guy: like a year ago i have tried to get new jjb <commit_url>
Infra Guy: no, the implications is a grey area
Infra Guy: i cant show all of them and they may hit like in week or eve month
Dev: then why was it not tackled
Dev: and why was it kept like that
Infra Guy: few jobs got broken on something
Dev: it will crop up some time later
Dev: if jobs get broken because of syntax
Dev: then jobs can be fixed
Dev: is it not ???
Infra Guy: ofc
Infra Guy: its just a question who will fix them
Dev: follow the syntax and follow the guidelines
Dev: put up a test server and try and lets see
Dev: you have a dev server
Dev: why not try on that one and see what all jobs fails
Dev: and why they fail
Dev: rather than saying it will fail and who will fix
Dev: let them fail and then lets find why
Dev: I manually define a job
Dev: I get it done
Infra Guy: i dont think we have test server which have the same workload and same attention as our prod
Dev: unless you test how would you know ??
Dev: and just saying that it broke one with a version hence I wont do it
Infra Guy: and im not sure if thats fair for us to deal with implication of upgrading of the major components just cause bash script is not good enough for u
Dev: its pretty bad
Infra Guy: i do agree
Infra TL Guy: Dev, what Infra Guy is saying is that its not possible to upgrade without downtime
Infra Guy: no
Dev: how long a downtime are we looking at ??
Infra Guy: im saying that after this upgrade we will have deal with consequences for long time
Infra Guy-2: No this is not testing the upgrade is the huge effort as we dont have dev resources to handle each job to run
Dev: if your jjb compiles all the yaml without error
Dev: I am not sure what consequences are we talking of
Infra Guy: so you think there will be no consequences, right?
Dev: unless you take the plunge will you know ??
Dev: you have a dev server running at port 9000
Infra Guy: this servers runs nothing
Dev: that is good
Dev: there you can take the risk
Infra Guy: and the fack we have managed to put something onto api doesnt mean it works
Dev: what API ?
Infra Guy: jenkins api
Infra Guy: hmmm
Dev: what have you put on Jenkins API ??
Infra Guy: (
Dev: jjb is a CLI
Infra Guy: ((
Dev: is what I understand
Dev: not a Jenkins API
Infra Guy: (((
Dev: (((((
Infra Guy: jjb build xmls and push them onto api
Infra Guy: and its doent matter
Dev: so you mean to say upgrading a CLI is goig to upgrade your core jenkisn API
Dev: give me a break
Infra Guy: the matter is that even if have managed to build something and put it onto api
Infra Guy: doesnt mean it will work
Dev: the API consumes the xml file and creates a job
Infra Guy: right
Dev: if it confirms to the options which it understands
Dev: then everything will work
Dev: I am actually not getting your point Infra Guy
Infra Guy: i do agree mr Dev
Dev: we are beating around the bush
Infra Guy: just want to be sure that if this upgrade will break something
Infra Guy: we will have a person who will fix it
Dev: that is what CICD is supposed to let me know with valid reasons
Dev: why can't that upgrade be done
Infra Guy: it can be done
Infra Guy: i even have commit in place3 -
// Rant 1
---
Im literally laughing and crying rn
I tried to deploy a backend on aws Fargate for the first time. Never used Fargate until now
After several days of brainwreck of trial and error
After Fucking around to find out
After Multiple failures to deploy the backend app on AWS Fargate
After Multiple times of deleting the whole infrastructure and redoing everything again
After trying to create the infrastructure through terraform, where 60% of it has worked but the remaining parts have failed
After then scraping off terraform and doing everything manually via AWS ui dashboard because im that much desperate now and just want to see my fucking backend work on aws and i dont care how it will be done anymore
I have finally deployed the backend, successfully
I am yet unsure of what the fuck is going on. I followed an article. Basically i deployed the backend using:
- RDS
- ECS
- ECR
- VPC
- ALB
You may wonder am i fucking retarded to fail this hard for just deploying a backend to aws?
No. Its much deeper than you think. I deployed it on a real world production ready app way.
- VPC with 2 public and 2 private subnets. Private subnets used only for RDS. Public for ALB.
- Everything is very well done and secure. 3 security groups: 1 for ALB (port 80), 1 for Fargate (port 8080, the one the backend is running on), 1 for RDS postgres (port 5432). Each one stacked on top and chained
- custom domain name + SSL certificate so i can have a clean version of the fully working backend such as https://api.shitstain.com
- custom ECS cluster
- custom target groups
- task definitions
Etc.
Right now im unsure how all of this is glued together. I have no idea why this works and why my backend is secure and reachable. Well i do know to some extent but not everything.
To know everything, I'll now ask some dumbass questions:
1. What is ECS used for?
2. What is a task definition and why do i need it?
3. What does Fargate do exactly? As far as i understood its a on-demand use of a backend. Almost like serverless backend? Like i get billed only when the backend is used by someone?
4. What is a target group and why do i need it?
5. Ive read somewhere theres a difference between using Fargate and... ECS (or is it something else)? Whats the difference?
Everything else i understand well enough.
In the meantime I'll now start analyzing researching and understanding deeply what happened here and why this works. I'll also turn all of this in terraform. I'll also build a custom gitlab CI/CD to automate all of this shit and deploy to fargate prod app
// Rant 2
---
Im pissing and shitting a lot today. I piss so much and i only drink coffee. But the bigger problem is i can barely manage to hold my piss. It feels like i need to piss asap or im gonna piss myself. I used to be able to easily hold it for hours now i can barely do it for seconds. While i was sleeping with my gf @retoor i woke up by pissing on myself on her bed right next to her! the heavy warmness of my piss woke me up. It was so embarrassing. But she was hardcore sleeping and didnt notice. I immediately got out of bed to take a shower like a walking dead. I thought i was dreaming. I was half conscious and could barely see only to find out it wasnt a dream and i really did piss on myself in her bed! What the fuck! Whats next, to uncontrollably shit on her bed while sleeping?! Hopefully i didnt get some infection. I feel healthy. But maybe all of this is one giant dream im having and all of u are not real9