Search - "deploy prod"
-
I worked with a good dev at one of my previous jobs, but one of his faults was that he was a bit scattered and would sometimes forget things.
The story goes that one day we had this massive bug on our web app and we had a large portion of our dev team trying to figure it out. We thought we narrowed down the issue to a very specific part of the code, but something weird happened. No matter how often we looked at the piece of code where we all knew the problem had to be, no one could see any problem with it. And there wasn't anything close to an explanation for the issue we were seeing in production.
We spent hours going through this. It was driving everyone crazy. All of a sudden, my co-worker (the one referenced above) gasps “oh shit.” And we’re all like, what’s up? He proceeds to tell us that he thinks he might have been testing a line of code on one of our prod servers and left it in there by accident and never committed it into the actual codebase. Just to explain this - we had a great deploy process at this company, but every so often a dev would need to test something quickly on a prod machine, so we’d allow it as long as they did it and removed it quickly. It was meant for a select few tasks that required a prod server and was only ever supposed to be a single line to test something. Bad practice, but it was fine because everyone had been extremely careful with it.
Until this guy came along. After he said he thought he might have left a line change in the code on a prod server, we had to manually go in to 12 web servers and check. Eventually, we found the one that had the change and finally, the issue at hand made sense. We never thought for a second that the committed code in the git repo that we were looking at would be inaccurate.
Needless to say, he was never allowed to touch code on a prod server ever again.
-
One of our clients deploys their own server app. So this happened after a prod deployment. (4am)
*Cellphone rings while sleeping*
Client : we need you on the conference call now. URGENT!
*Gets on conference call*
*Client explain the problem*
*Explaining to the client that the problem is in their side (https connection not working, either network or certificate problem)*
*Client doesn't believe it and pushes me for a fix that I have no control on*
*4 hours later in a heated conversation*
Client : ok problem is on our side. We used our SSL certificate from staging with production and thought it would work.
Me :
-
My team handles infrastructure deployment and automation in the cloud for our company, so we don't exactly develop applications ourselves, but we're responsible for building deployment pipelines, provisioning cloud resources, automating their deployments, etc.
I've ranted about this before, but it fits the weekly rant so I'll do it again.
Someone deployed an autoscaling application into our production AWS account, but they set the maximum instance count to 300. The account limit was less than that. So, of course, their application gets stuck and starts scaling out infinitely. Two hundred new servers spun up in an hour before hitting the limit and then throwing errors all over the place. They send me a ticket and I login to AWS to investigate. Not only have they broken their own application, but they've also made it impossible to deploy anything else into prod. Every other autoscaling group is now unable to scale out at all. We had to submit an emergency limit increase request to AWS, spent thousands of dollars on those stupidly-large instances, and yelled at the dev team responsible. Two weeks later, THEY INCREASED THE MAX COUNT TO 500 AND IT HAPPENED AGAIN!
And the whole thing happened because a database filled up the hard drive, so it would spin up a new server, whose hard drive would be full already and thus spin up a new server, and so on into infinity.
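(For what it's worth, a guard like the sketch below is the kind of check that would have refused both the 300 and the 500. It's a hypothetical TypeScript snippet using the AWS SDK for JavaScript v3; the group name and the hard-coded limit are made up, not values from the real account.)

```ts
// cap-asg.ts: refuse to raise an ASG's MaxSize above a known account instance limit
// (ACCOUNT_INSTANCE_LIMIT and the group name are hypothetical placeholders)
import { AutoScalingClient, UpdateAutoScalingGroupCommand } from "@aws-sdk/client-auto-scaling";

const ACCOUNT_INSTANCE_LIMIT = 250; // the real account limit was just "less than 300"

async function setMaxSize(groupName: string, requestedMax: number): Promise<void> {
  if (requestedMax > ACCOUNT_INSTANCE_LIMIT) {
    // this is the check the 300-instance (and later 500-instance) config skipped
    throw new Error(
      `MaxSize ${requestedMax} exceeds the account instance limit of ${ACCOUNT_INSTANCE_LIMIT}`
    );
  }
  const client = new AutoScalingClient({});
  await client.send(
    new UpdateAutoScalingGroupCommand({ AutoScalingGroupName: groupName, MaxSize: requestedMax })
  );
}

setMaxSize("runaway-app-asg", 300).catch((err) => {
  console.error(err.message);
  process.exit(1); // fail the deploy instead of filling the account
});
```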
That's probably the only WTF moment that resulted in me actually saying "WTF?!" out loud to the person responsible, but I've had others. One dev team had their code logging to a location they couldn't access, so we got daily requests for two weeks to download and email log files to them. Another dev team refused to believe their server was crashing due to their bad code even after we showed them the logs that demonstrated their application had a massive memory leak. Another team arbitrarily decided that they were going to deploy their code at 4 AM on a Saturday and they wanted a member of my team to be available in case something went wrong. We aren't 24/7 support. We aren't even weekend support. Or any support, technically. Another team told us we had one day to do three weeks' worth of work to deploy their application because they had set a hard deadline and then didn't tell us about it until the day before. We gave them a flat "No" for that request.
I could probably keep going, but you get the gist of it.
-
PM (on slack): "we’re about to deploy to production".
Me: "ok"
… I keep on working on a task / remain available for any post deployment issues …
PM (5 minutes later on slack): "deployment broke production! We need to handle this NOW!"
My dev colleague has already called it a day, but I’m still online
Me: "ok I don’t have access to prod, can you describe what’s going on? I can’t reproduce on any other environment"
PM: …
10 minutes go by
Me: "anybody there?"
PM: …
45 minutes later, I realize PM is offline
The following day:
PM: "ok we got prod running again" (turns out it was client’s fault for not updating a config we as devs can’t access)
PM: "but we’re REALLY UPSET! You guys need to be available to intervene for any issues following deployment to production! At least one of you should be available!"
Me: "but, but…" 🫠14 -
The riskiest dev choice...
How about "The riskiest thing you've done as a dev"? I have a great entry for that, and I suppose it was my choice to build the feature after all.
I was working on an instance of a small MMO at a game company I worked for. The MMO boasted multiple servers, each of them a vastly different take on the base game. We could use, extend, or outright replace anything we wanted to, leading to everything from Zelda to pokemon to an RP haven to a top-down futuristic counterstrike. The server in this particular instance was a fantasy RPG, and I was building it a new leveling and experience system with most of the trimmings. (Talents, feats/perks, etc. were in a future update.)
A bit of background, first: the game's dev setup did not have the now-standard dev/staging/prod servers; everything ran on prod, devs worked on prod, players connected and played on prod, etc. Worse yet, there was no backup system implemented -- or not really. The CTO was really the only person with sufficient access. The techy CEO did as well, but he rarely dealt with anything technical except server hardware, occasionally. And usually just to troll/punish us devs (as in "Oops ! I pulled the cat5 ! ;)"). Neither of them were the most reliable of people, either. The CTO would occasionally remote in and make backups of each server -- we assumed whenever he happened to think of it -- and would also occasionally do it when asked, but it could take him a week, sometimes even up to a month to get around to it. So the backups were only really useful for retreiving lost code and assets, not so much for player data.
The lack of reliable backups and the lack of proper testing grounds (among the plethora of other issues at the company) made for an absolutely terrible dev setup, but that's just how it was, and that's what we dealt with. We were game devs, after all. Terrible or not, we got to make games! What more could you ask for!? It was amazing and terrible and wonderful and the worst thing ever, all at the same time. (and no, I'm not sharing the company name, but it isn't EA or Nexon, surprisingly 😅)
Anyway, back to the story! My new leveling system also needed to migrate players' existing data, so... you can see where this is going.
I did as much testing and inspection of my code as I could, copied it from a personal dev script to the server's xp system, ... and debated if I really wanted to click [Apply]. Every time I considered it, I went back to check another part or do yet more testing. I ended up taking like 40 minutes to finally click it.
And when I did... that was the scariest button press of my life. And the scariest three seconds' wait afterwards. That one click could have ruined every single player's account, permanently lost us players ...
After applying it, I immediately checked my character to see if she was broken, checked the account data for corruption or botched flags, checked for broken interactions with the other systems....
Everything ended up working out perfectly, and the players loved all of the new features. They had no idea what went into building them, and certainly had no idea of what went into applying them, or what could have gone wrong -- which is probably a good thing.
Looking back, that entire environment was so fragile, it's a wonder things didn't go horribly wrong all the time. Really, they almost never did. Apocalypses did happen, but were exceedingly rare, and were usually fixed quickly. I guess we were all super careful simply because everything was so fragile? Or the decent devs were, at least. We never trusted the lessers with access 😅 at least on the main servers where it mattered. Some of the smaller servers... well, we never really cared about those.
But I'm honestly more surprised to realize I've never had nightmares of that button click. It was certainly terrifying enough.
But yay! Complete system overhaul and migration of stored and realtime player data! on prod! With no issues! And lots of happy players! Woooooo!
Thinking back on it makes me happy 😊
-
Story #1: So I took a month of parental leave. And was planning to extend it a little longer to deal with my final exams. I was planning to spend lots of quality time with my wife and newborn son. Little did I know... It turns out that out of the 5 OoO weeks I was looking forward to, I actually had 3 at most. The rest I spent working remotely, as I had been pressed into deploying a brand new and poorly tested feature to PROD 2 days before my paternity leave. So I spent 2 weeks debugging things in PROD. Remotely. Needless to say that did suck.
Story #2: After story #1 I've learnt my lesson. This summer I took 3 weeks annual leave to renovate my apartment. I asked to not to be disturbed unless there's an emergency. And an emergency it was. One of our app users had a planned hi-load batch job lasting for 2-3 months. Hundreds of thousands of items had to be created and processed. It turns out the _processing_ algo had some flaws and was acting out. I was called out and asked to assist. I knew this sort of debugging is going to take a lot of my time so this time I put my conditions on the table: I will assist but I'll extend my leave by 1.5 the time I spend working now. They took the deal. Instead of 3 weeks I had 5 weeks of vacation!
I don't care that much about my salary. I prefer to exchange it for my time off hence I didn't ask for compensations.
Bottom line: NEVER EVER underestimate or undersell your time and effort. You are a valuable asset and if the team/client needs you on your day off -- make it count. Your time off is YOUR time. Never forget it.
-
any hardcore peeps like this? :P
gotta commit b4 I do anything..
gotta push b4 I do anything..
gotta deploy to prod b4 I do anything..
God! kill me already 😭😭😭😭
-
Current work project is microservices architecture out of 4 - 8 components.
It is fully automated with Infrastructure as Code. I just change some code somewhere and git push,
and that automatically invokes GitLab CI, terraform, ansible and kubernetes helm charts.
It auto-checks itself with unit and integration tests in an auto-redeployed staging env, then saves the tested results to the docker registry and asks for a one-button verification click to be released to prod.
I just go for a drink or some food while all of this is happening.
And I am proud that all the infrastructure, backend and frontend I made on my own.
I don't need to remember how to deploy it. It is all automated.
-
Highlights from my week:
Prod access: Needed it for my last four tickets; just got it approved this week. No longer need it (urgently, anyway). During setup, sysops didn’t sync accounts, and didn’t know how. Left me to figure out the urls on my own. MFA not working.
Work phone: Discovered its MFA is tied to another coworker’s prod credentials. Security just made it work for both instead of fixing it.
My merchant communication ticket: I discovered sysops typo’d my cronjob so my feature hasn’t run since its release, and therefore never alerted merchants. They didn’t want to fix it outside of a standard release. Some yelling convinced them to do it anyway.
AWS ticket: wow I seriously don’t give a crap. Most boring ticket I have ever worked on. Also, the AWS guy said the project might not even be possible, so. Weee, great use of my time.
“Tiny, easy-peasy ticket”: Sounds easy (change a link based on record type). Impossible to test locally, or even view; requires environments I can’t access or deploy to. Specs don’t cover the record type, nor support creating them. Found and patched it anyway.
Completed work: Four of my tickets (two high-priority) have been sitting in code review for over a month now.
Prod release: Release team #2 didn’t release and didn’t bother telling anyone; Release team #1 tried releasing tickets that relied upon it. Good times were had.
QA: Begs for service status page; VP of engineering scoffs at it and says its practically impossible to build. I volunteered. QA cheered; VP ignored me.
Retro: Oops! Scrum master didn’t show up.
Coworker demo: dogshit code that works 1 out of 15 times; didn’t consider UX or user preferences. Today is code-freeze too, so it’s getting released like this. (Feature is using an AI service to rearrange menu options by usage and time of day…)
Micromanager response: “The UX doesn’t matter; our consumers want AI-driven models, and we can say we have delivered on that. It works, and that’s what matters. Good job on delivering!”
Yep.
So, how’s your week going?
-
New country, new company, new team, new projects.
I'm supposed to be the TL of a team working on a React project.
A guy in his late 40s celebrates himself as "the senior", he basically just finished watching a youtube thing, React 101 crash course or similar. The other two juniors who did only Wordpress so far venerate him like a god.
The code, of course, is one of the finest pieces of crap I ever had the pleasure of dealing with in my life: naturally a bunch of JQuery plugins for everything, no tests, no state management, side effects everywhere, shared state and globals like hell, everything written in ES3/ES5 style, no types, no docs, build and deploy totally manual, deep props drilling at every level... and not to mention the console.log() shipped in prod.
First day, already headache.
Full rewrite start tomorrow.
Hiring real devs as well.
-
Deadline : Friday evening
Testing : There is only prod
Deploy : Friday evening
Why : Boss says customer wants it
Me : . . .
-
There is a company providing a very specific service. And it has a core application for that svc, supported by a core app team.
That company also has other services, which are derivations of the core one. So every svc depends on core.
Now that we're clear on that... I was working in a team of one of the subservices. We very strongly depended on core. In fact, our svc was useless if integration w/ core broke down.
The core team had an annoying habit. They refused to version their webservices and they LOVED to push api updates w/o any warnings. Our prod, test and other envs used to fail bcz of core api changes quite often. Mgmt and the IT head were aware of the problem and the customers' complaints as well.
So as a result, once the core api changes we're all in panic mode: all prior priorities are lowered and revival of prod becomes our main focus. The core api is not documented, the changes are not clear, so we have to reverse engineer the shit out of it. We manage to patch our prod up w/ hotfixes, but now we have tech debt. While working on the debt, the core api changed again, in the test env. Mgmt pushes the debt back and reallocates us to hotfix test. The hotfix is 80% done when another core api breaks. Now mgmt asks us to drop whatever we're working on and fix that new break. By the time we're to deploy the hotfix, another api breaks in another env. The mgmt..... You get the picture :)
2 years go by, nothing has changed so far.
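(For anyone wondering what "version their webservices" would have meant in practice: breaking changes go under a new path so old consumers keep working. A minimal illustrative sketch with Express in TypeScript; the routes are invented and nothing says the core team ran Node:)

```ts
// versioned-api.ts: path-based API versioning; the endpoints are hypothetical
import express from "express";

const app = express();

const v1 = express.Router();
v1.get("/orders", (_req, res) => {
  // old response shape: dependent services keep relying on this
  res.json({ orders: [], total: 0 });
});

const v2 = express.Router();
v2.get("/orders", (_req, res) => {
  // new response shape: breaking changes live here instead of silently replacing v1
  res.json({ data: { items: [] }, meta: { count: 0 } });
});

app.use("/api/v1", v1);
app.use("/api/v2", v2);

app.listen(8080, () => console.log("core API listening on :8080"));
```
-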
4:55: Everything looks good in prod.
4:56: Deploy new feature after all is well in Dev.
4:57: Prod goes to shit!
4:58: Call wife to tell her I'm not coming home at 5.
4:59: Prod looks fine w/o anyone doing anything.
5:00: Leave work.
5:25: Get yelled at by wife for leaving at 5 after telling her I couldn't.
-
I'm having an existential crisis with this client.
We are spending millions of $s every year to make sure the product's performance is perfect. We are testing various scenarios, fine-tuning PLABs: the environment, application, middleware, infra,... And then we provide our recommendations to the client: "To handle load of XX parallel users focusing on YY, yy and Zy APIs, use <THIS> configuration".
And what the client does?
- take our recommendations and measure the wind speed outside
- if speed is <20m/s and milk hasn't gone bad yet, add 2x more instances of API X
- otherwise add 3xX, 1xY and give more CPUs to Z
- split the setup in half and deploy in 2 completely separate load-balanced prod environments.
- <do other "tweaking">
- bomb our team with questions "why do we have slow RTs?", "why did the env crash?", "why do we have all those errors?", "why has this been overlooked in PLABs?!?"
If you're improvising despite our recommendations, wtf are we doing here???
One day I will crack. Hopefully, not sometime soon.
-
This was some time ago. A Legendary bug appeared. It worked in the dev environment, but not in the test and production environment.
It had been a week since I was working on the issue. I couldn't pinpoint the problem. We CANNOT change the code that was already there, so we needed to override the code that was written. As I was going at it, something happened.
---
Manager: "Hey, it's working now. What did you do?"
Me: *Very confused because I know I was nowhere close to finding the real source of the problem* Oh, it is? Let me check.
Also me: *Goes and check on the test and prod environment and indeed, it's already working*
Also me to the power of three: *Contemplates on life, the meaning of it, of why I am here, who's going to throw out the trash later, asking myself whether my buddies and I will be drinking tonight, only to realize that I am still on the phone with my manager*
Me again: "Oh wow, it's working."
Manager: "Great job. What were the changes in the code?"
Me: "All I did was put console logs and pushed the changes to test and prod if they were producing the same log results."
Manager: "So there were no changes whatsoever, is that what you mean?"
Me: "Yep. I've no idea why it just suddenly worked."
Manager: "Well, as long as it's working! Just remove those logs and deploy them again to the test and prod environment and add 'Test and prod fix' to the commit comment."
Me: "But what if the problem comes up again? I mean technically we haven't resolved the issue. The only change I made were like 20 lines of console logs! "
Manager: "It's working, isn't it? If it becomes a problem, we'll work it out later."
---
I did as I was told, and Lo and Behold, the problem never occurred again.
Was the system playing a joke on me? The system probably felt sorry for me and thought, "Look at this poor fucker, having such a hard time on a problem he can't even comprehend. That idiotic programmer had so many sleepless nights and yet still couldn't find the solution. Guess I gotta do my job and fix it for him. I'm the only one doing the work around here. Pathetic Homo sapiens!"
Don't get me wrong, I'm glad that it's over but..
What the fuck happened?
-
I'm preparing a major deployment & what do I see?
SOMEONE MANUALLY EDITED PROD AGAIN
IMO they deserve to be hit on their hands with a stick every time they even think of fucking with it.
-
TL;DR Dear boss, firstly, you always get someone to review anything important done by a fucking intern.
Secondly, you do not give access to your fucking client's production server to an intern.
Thirdly, you don't ask your fucking intern to test the intern's work that has not been reviewed by anyone directly on your client's fucking production server.
Last week, the boss and one of the lead devs (the only guy with some serious knowledge about systems and networking) decided to give me (an intern who barely has any work experience) the task of fixing or finding an alternate solution to allowing their support team access to their client machines. Currently they used a reverse SSH tunnel and an intermediary VH but for some reason, that was very unreliable in terms of availability. I suggested using OpenVPN and explained how it would work. Seemed to be a far better idea and they accepted. After several days of working through documentations and guides and everything, I figured out how OpenVPN works and managed to deploy a TEST server and successfully test remote access using two VMs. On seeing my tests, the boss told me that he wanted to test it on the client network. I agreed. Today he comes to me and he tells me to prepare testing for tomorrow and that the client technician is going to give me access to one of their boxes. And then he adds, "It's a working prod server. We'll see if we can make it work on that" and left. I gaped at him for a while and asked another dev guy in the room if what I heard was right. He confirmed. Turns out, the lead dev and the boss's son (who also works here) had had a huge argument since morning on the same issue and finally the dev guy had washed it off his hands and declared that if anything goes wrong from testing it on production, it's entirely the boss's own fault. That's when the boss stepped in and approached me. I ran back to his office and began to explain why prod servers don't top the list of things you can fuck around with. But he simply silenced me saying, "What can go wrong?" and added, "You shouldn't stay still. You should keep moving". Okay, like firstly what the fuck and secondly, what the fuck?.
Even though OpenVPN client is not the scariest thing to install, tomorrow's going to be fun.
-
So last week I really fucked up
I had this new implementation that was supposed to integrate smoothly with the rest of the service. It depended on a serialized model made by a data scientist. I test it locally and in the QA environment: no problem.
So, Friday, 4pm, I decide to deploy to production. I check once from the app: the service throws an error. Panic attack, my chief is at my desk, we try to understand what went wrong. I make calls with cURL: no problem. Everything seems fine. I recheck from the app again: no problem.
We decide to leave it in prod, as the feature works. I go get some beers with the guys to celebrate the deploy.
Fast-forward the next morning, 11am, my phone ring: it's a colleague of my chief. "Please check Slack, a client is trying to use the feature, it's broken"
FUUUUUUUUUUUUCK!!!
Panic attack again. I go to the computer, check the errors: two types of errors. One I can fix, the other from a missing package on the machine that the data guy used.
Needless to say, I had a fairly good weekend.
Lessons learned:
- make sure Dev, QA and Prod are exactly the same (use Ansible or Container)
- never deploy on a Friday afternoon if you don't have a quick way to revert
-
Just needing somewhere to let some steam off
Tl;dr: perfectly fine commandline system is replaced by bad ui system because it has a ui.
For a while now we have had a development k8s cluster for the dev team. Using helm as the composing framework, everything worked perfectly via the console. Being able to quickly test new code for existing apps, and even deploy new (and even third party) apps, on a similar-to-production system was a breeze.
Introducing Rancher
We are now required to commit every helm configuration change to a git repository and merge to master (master is used on dev and prod) before even being able to test the configuration change, as the package is not created until after the merge is completed.
Rolling out new tags now also requires a VCS change as you have to point to the docker image version within a file.
As we now have this awesome new system, the ops didn't see a reason to give us access to kubectl. So the dev team is stuck with a ui, but this should give the dev team more flexibility and independence, and more people from the team can roll releases.
Back to reality: since the new system we have hogged more time from ops than we had in a while, everyone needs to learn a new unintuitive tool, and the funny thing is, only a few people can actually accept VCS changes as they impact dev and prod. So the entire reason this was done, making it reachable to more people, is out the window.
-
Rolled out a new application I built almost entirely by myself 2 days ago... But my dev group is understaffed and has a project manager who is literally the most clueless person I have ever met, so as a result, we don't have a functional/useful dev/test/prod framework and no standards for how to deploy apps. So my past 2 days consisted of fixing bugs in the live system that could probably have been caught if I had the time and resources to get everything thoroughly tested. It's stable now, but damn our management for being generally idiots. Our motto appears to be "Fuck it, we'll do it live"
-
I hate dev politics...
PM: Hey there is a weird error happening when I upload this file on production, but it works on our test environments.
Me: After looking at this error, I don't find any issues with the code, but this variable is set when the application is first loaded; I bet it wasn't loaded correctly during our last deployment and we just need to reload the application.
Senior Dev: We need to output all of the errors and figure out where this error is coming from. Dump out all the errors on everything in production!!
Me: That's dumb... the code works on test... it's not the code.. it's the application.
Senior dev: %$*^$>&÷^> $
Me: Hey I have an idea! If test works... I can go ahead and deploy last week's changes to prod and dump those errors you were talking about!!
Senior Dev: OK
Me: *runs Jenkins job the deploys the new code and restarts the application*
PM: YAY you fixed it!!
Senior Dev: Did you dump out those errors like I said?
Me: Nope didn't touch a thing... I just deployed my irrelevant changes to that error and reloaded the application.
-
Tl;Dr I'm one of the few in my area who sees that sftping as the prod service account shouldn't be a deployment process. And the ONLY ONE THAT CARES THAT THIS IS GONNA BREAK A BUNCH OF SHIT AT SOME POINT.
The non tl;dr:
For a whole year I've been trying to convince my area that sshing as the production service account is not the proper way to deploy and/or develop batch code. My area (my team and 3 sister teams) has no concept of using version control for our various Unix components (shell scripts and configuration files) that are CRITICAL for our team's ongoing success. Most develop in a "prodqa like" system and the remainder straight in production. Those that develop in prodqa have no "test" deployment, so they ssh files straight to actual production. Our area has no concept of continuous integration and automated build checking. There are no "test cases", no "systems testing" or "regression testing". No gate checks for changing production are enforced. There is a standing "approved" deployment process from the enterprise (my company is waaaaay bigger than my area) but no one uses it. In fact I don't know anyone in my area who knows HOW to deploy using the official deployment method. Yes, there is privileged access management on the service account. Yes, the managers get notified every time someone accesses the privileged production account. The managers don't see fixing this as a priority. In fact I think I've only talked to ONE other person in my area who truly understands how terrible it is that we have full production change access on a daily basis. I've brought this up so many times and so many times nothing has been done; I've tried to get it changed yet nothing has happened, and I'm just SO FUCKING SICK that no one sees how big of a deal this is. I mean, overall I love the area I work in, I love the people, yet this one glaring deficiency causes me so much fucking stress cause it's so fucking simple to fix.
We even have a newer enterprise deployment method leveraging a product called "Urban Code Deploy" (UCD) to deploy a git repository. JUST FUCKING GIT WITH THE PROGRAM!!!!..... IT WAS RELEASED FUCKING 12 YEARS AGO......
Please..... Please..... I just want my otherwise normally awesome team to understand the importance and benefits of version control and approved/revertable deployments.
-
I'm still studying computer science/programming, I still have one year to do in order to graduate (Master). I am in a work study program so I'm working for a company half of the time, and I'm studying the other half. It is important to mention that I am the only web developer of the company
When I arrived in the company 9 months ago, I was given a Vue project which had been developed by a trainee a few weeks before my arrival and I was asked to correct a few things, it was mostly about css. Then, I was ask to add a few functionalities, nothing really hard to code, and we were supposed to test the solution in a staging environment, and if everything was ok, deploy it to prod.
However, the more I did what I was asked, the more functionalities I had to implement, until I reached a point where I had to modify the API, create new routes, etc. I'm not complaining about that, that's my job and I like it. But the solution was supposed to be ready when I arrived, it was also supposed to be tested and deployed.
The problem is, the person making these demands (let's call him guy X) is not from the IT service; he's a future user of the website on the admin side. The demands kept going and going and going because, according to him, the solution was not in a good enough state to be deployed, it missed too many (un)necessary features. It kept going for a few months.
The best is yet to come though : guy X was obviously a superior, and HIS superior started putting pressure on me through mails, saying the app was already supposed to be in production and he was implying that I wasn't working fast enough. Luckily, my IT supervisor was aware of what was going on and knew I obviously wasn't to blame.
In the end, the solution was eagerly deployed to production, didn't go through the staging environment and was opened to the users. Now, guy X receives complaints because none of what I did was tested (it was by me, but I wasn't going to test every single little thing because I didn't have time). Some users couldn't connect or use this or that feature and I am literally drowning in mails, all from guy X, asking me to correct things because users are blocked and it's time-consuming for him to manually do some of the things the website was supposed to do.
We are here now just because things have been done in a rush, I'm still working on it and trying to fix prod problems and it's pissing me off because we HAVE a staging environment that was supposed to prevent me from working against the clock.
On a final note, what's funny is that the code I'm modifying, the pre-existing one needs to be refactored because bits and pieces are repeated sometimes 5 times where it should have been externalized and imported from another file. But I don't know when and if I will ever be able to do that.
I could have given more context but it's 4am and I'm kinda tired, sorry if I'm not clear or anything. That's my first rant -
A loooong time ago...
I started my first serious job as a developer. I was young and enthusiastic, and a bit of a greenhorn. First time working in a business, working with a team full of experienced full-blown ultra-seniors who were waiting to teach me everything about software engineering.
Kind of.
Besides one senior, who was also the team lead, there were two other devs. One of them was very experienced and a pretty nice guy; I could ask him anytime and he would sit down with me and give me advice. I've learned a lot from him.
Fast forward three months (yes, three months).
I was no longer a complete greenhorn and people started to give me serious tasks. I had some experience in doing deployments and stuff from my previous job as a sysadmin, so I was soon known as the "deployment guy", setting up deployments for our projects the right way and monitoring as well as executing them. But as it should be in every good team, we had to share our knowledge so one of us could be on vacation or something and another colleague would be able to do the task as well.
So now we come to the other teammate. The one I was not talking about till now. And that for a reason.
He was very nice too and had a couple of years as a dev on his CV, but...yeah...like...
When I switched some production systems to Linux, he had to learn something about Linux. Every time he encountered an error message he turned around and asked me how to fix it. Even. For. The. Simplest. Error. He. Could. Google. Up.
I mean okay, when one's new to a system it's not that easy, but when you have an error message which prints out THE SOLUTION FOR THE ERROR and he asks me how to fix it...excuse me?
This happened over 30 times.
A. Week.
Later on I had to introduce him to the deployment workflow for a project, so he could eventually deploy the staging environment and the production environment by himself.
I introduced him. Not for 10 minutes. I explained the whole workflow and the main techniques and tools to him for like two hours. Every now and then I stopped and asked him if he had any questions. He hadn't! Wonderful!
Haha. Oh no.
So he had to do his first production deployment. I sat by his side to monitor everything. He did well. One or two questions but he did well.
The same when he did his second prod deploy. Everything's fine.
And then. It. Frikkin. Begins.
I was working on the project, did some changes to the code. Okay, deploy it to dev, time for testing.
Hm.
Error checking out git. Okay, awkward. Got to investigate...
On the dev server, some files had been changed. Strange. The repo was all up to date. But these changes seemed newer because they fixed at least one bug I was working on.
This doubles the strangeness.
I went over to my colleague's desk.
I asked him about any recent changes to the codebase.
"Yeah, there was a bug you were working on right? But the ticket was open like two days so I thought I'll fix it"
What the Heck dude, this bug was not critical at all and I had other tasks which were more important. Okay, but what about the changed files?
"Oh yeah, I could not remember the exact deployment steps (hint from the author: I wrote them down into our internal Wiki, he wrote them done by hisself when introducing him and after all it's two frikkin commands), so I uploaded them via FTP"
"Uhm... that's not how we do it buddy. We have to follow the procedure to avoid..."
"The boss said it was fine so I uploaded the changes directly to the production servers. It's so much easier via FTP and not this deployment crap, sorry to say that"
You. Did. What?
I could not resist and asked the boss about this. But it had no effect at all; the colleague's father was a long-time best-buddy-schmuddy-friend of the boss.
So in the end I sat there reverting, committing and deploying.
Yep
It's soooo much harder this deployment crap.
Years later, a long time after I quit the job and moved to another company, I got to know that the colleague is now responsible for technical project management.
Hm.
Project Management.
Karma's a bitch, right? -
https://devops.com/dear-staging-wer...
What the F#*@&$# %#@$!?!?!
This person has decided to skip using staging, because it doesn't correctly reflect prod!
If that's your problem, then why don't you try to fix it? Create a DB with fake data, make one based on anonymised customer data, or even do it on non-anonymised data (with permission of course), but fix the staging env so that it reflects prod!
This is a devops site (it's literally the name!), and instead of teaching you how to make staging exactly like prod, they tell you to do what caused the creation of the staging->prod system IN THE FIRST PLACE!!!
There's all this stuff like Vagrant that is literally designed to help you as a dev mimic prod, and you just throw it all out!?
"With feature flags, I can safely test in production without fear of breaking something or negatively affecting the customer experience."
Famous last words.
-
So I made an update to my React Native app. I changed the UI of a couple of screens, added a few animations here and there, refactored how my graphQL resolvers work in the backend (no breaking changes), changed how data gets loaded into the database, etc.
It worked in dev so I figured hey, let's deploy it. Today is (was, because it's now 3am, but more on that later) a national holiday so no one goes to work, so no one will use my app, so I have an entire day to deploy.
I started at 15:00 (because I woke up at 13:00 lol). I tested the update once again in dev and proceeded to deploy it to prod. I run ./gradlew assembleRelease and it starts complaining that react-native-gesture-handler is not installed. Ugh, rm -rf node_modules && yarn install. It worked. But now gradlew crashes and logs don't tell me anything. Google tells me to change a bunch of gradle settings but none of them work. Fast forward 5h, it's around 20:00 and I isolated the issue to, again, react-native-gesture-handler. They updated from 2.2.4 to 2.3.0 which didn't fucking compile. 2 more hours passed (now 22:00) and I got v2.3.1 working, which fixed the problem in 2.3.0 but made my app crash on startup. YOUR FUCKING LIBRARY GETS 250K WEEKLY DOWNLOADS AND YOU DONT EVEN BOTHER CHECKING IF IT COMPILES IN PROD ON ANDROID?! WHAT THE FUCK software-mansion?
After I solved that, my app didn't crash. Now it threw an error "Type errors: Network Request Failed" every time I fetched my legacy REST API (older parts use REST and newer use graphQL; I'll refactor that in the next update).
-
Until today, I had assumed deploying stuff to prod would NOT be one of my responsibilities in this company. Apparently that's not the case.
Had to deploy my code and pray it didn't break anything. Why is this a big deal at all?
Well you see, there is no repository. At all. No git, no svn, not even duplicate folders. No tests, no pipeline. Just a bunch of CPanels.
Had to manually copy files and folders from the development site to the production site and partially copy a database. "Just drag and drop" were the instructions I was given.
As if using CakePHP2, PHP5 and having to parse fucking Excel files wasn't bad enough, now I have to deal with one of the worst ways to deploy code.
Fuck it, I'm switching on the looking-for-job flag on linkedin.
-
Lessions I learned so far from my first big node/npm project with tons of users:
1) If you didn't build something for a while, expect 3 hours of resolving version conflicts for every two weeks since the last build.
2) Even if the tests pass, run the containers on your own machine and make sure that the app doesn't randomly crash before deploying
3) Even if the app seemed to work on your own machine, run the tests again in an environment mimicking prod at most 15 minutes before replacing the running containers.
4) Even if all else indicates that the app will work, only ever deploy if you expect to be available within the 4 hours following a deployment.
5) Don't use shrinkwrap for anything other than locking every version down completely. A partial shrinkwrap will produce bugs that are dependent on the exact hour you built the app _and_ the shrinkwrap file, and therefore no one will ever have seen them other than you.
6) Avoid gyp, and generally try not to interface too much with anything that doesn't run on node. If parts of your solution use very different toolchains, your problems will be approximately proportional to the amount of code. And you'd be surprised just how much code you're running. (otherwise it's more logarithmic because the more code the less likely a new assumption is unique)
7) Do not update webpack or its plugins or anything they might call unless you absolutely need to
8) Containers are cool but the alpine ones are pretty much useless if you have even just one gyp module.
9) There's always another cache. To save yourself a lot of pain, include the build time in every file the browser can download (or in its name), and compare these to a fresh build while debugging to confirm the bug is actually present in the code you're reading (see the sketch after this list)
+1) Although it may look like it, SQLite is far from a simple solution because the code and the bindings aren't maintained. In fact, it'll probably be more time-consuming than using a proper database.
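(A minimal sketch of what lesson 9 can look like with webpack 5; this is not the project's actual config, and __BUILD_TIME__ is a made-up global:)

```ts
// webpack.config.ts: stamp every bundle with a content hash and the build time
import webpack from "webpack";
import type { Configuration } from "webpack";

const buildTime = new Date().toISOString();

const config: Configuration = {
  output: {
    // content-hashed names mean a stale cache can't silently serve an old build
    filename: "[name].[contenthash].js",
  },
  plugins: [
    // the bundle can log __BUILD_TIME__ so you can compare it to a fresh build while debugging
    new webpack.DefinePlugin({
      __BUILD_TIME__: JSON.stringify(buildTime),
    }),
  ],
};

export default config;
```
-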
Due to covid, managers decided to fire 10% but could not negotiate a schedule increase with internal IT. With no promotions or hikes, a few of the full-stack devs we have leave.
Now I am working with 2 data engineers doing cloud Java microservices work while they learn. Their first delivery was applauded by their manager, who is under pressure to retain them.
As the architect, I review their code. No unit tests, print statements all around, shoddy exception handling, variable naming issues. We have Sonar by default in our build. They ignore the report. I ask them about it. Seems the manager told them he is getting a contract person from another team on a part-time basis to do/fix it. I share my confusion.
Manager calls me up and checks if we can put it in the tech debt backlog and deploy to prod!!!
-
Me: So we're deploying this today on prod.
Junior: Can you record the deployment steps for us so that we can deploy whenever you're unavailable?
Me: Seriously?
I liked their enthusiasm though.
Maybe it's just too early as these chaps don't even know the basic commands right now.
What do y'all have on this?
-
This project is just a complete clusterfuck... But nvm. We had to integrate a third party service pushing data into our system. Btw the service wasn't even working correctly. But that is just the tip of the iceberg. It's Friday around lunch time. Message appears: "What is the status of the integration?" Yeah, haven't started working on it. Last info was that the service is not stable. I doubt that this will be done this week. Next message from PO: "We will all push hard to get this done today and deploy to prod." Why? Because these dumbasses said to the customer this would be deployed EOD. And by "we" you mean the devs once again doing overtime. Has this shit stopped? No. Like for the last two weeks it's like we promised the customer xyz to be deployed tomorrow. Not a single dev was asked how long it takes to add this.
-
did on my last project:
1. Using QA env as dev env
2. Deploy not completely tested stuff (90% tested) to production
3. Run with errors in prod
4. Manual fix in prod
5. Git versioning
-
There is a serious possibility that our team will need to deploy into prod on a Friday because of a regulatory deadline and 3rd parties not being ready.
God help us -
Every time we have a release, "Release Engineering" stops building our test environments from master. I've been preaching Continuous Deployment for months and here we are with the most broken system. The build actually builds all test environments AND prod from the same branch because "it's easier" for "Release Engineering". So now I have to wait for the code freeze on release and hope RE doesn't fuck up and deploy an untested branch to prod... AND hope we're given enough time to test and debug the next release since we can't right now...
-
Today, I'm making a revolutionary change to our code base. I'm finally deprecating a script that lived in this goddamn repo for way too long. There's about 10 copies of the same script in 10 different directories. The script copies all the code into a "dev" folder and then runs "sed s/prod/dev/g" on all the files, and then overwrites a bunch of it with some files suffixed with ".dev". Finally, after fighting the so-called architects, the devs and everyone else that seems to have gotten used to the pain this cursed dumpster fire script has caused us, my merge request is now open and ready to go to get rid of this insanity. Now we won't have to deal with as many "surprises" that happen every goddamn time we deploy to production, overwriting all our hard work by accident, and relieve some of my OCD of having the same script in 10 different places in the repository.
-
Does anyone who works on a bunch of local NPM modules wanna describe their workflow for local dev vs deploy?
I've got mine but it feels a little trashy. It's basically one npm script to link all the local modules for dev and another which will npm install them in prod - is there a better way without adding more build tools?
-
Ops teams that are "assuring" stability of production by being the only ones who can click to deploy to Prod add no value.
-
Need a workaround to start the application. Need to activate/deactivate a few workarounds at different points in the flow to test the code in development. Need different workarounds to test the code in QA. Maybe another workaround for prod? Need a workaround to check in the code to CI/CD. Need a workaround to deploy the code.
-
-
I'm always amazed at how people tend to prefer a certain pain instead of an uncertain relief... Batshit crazy...
One cousin talking about his abusive relationship: "Could be worse.... At least he doesn't beat the children"
A colleague talking about his failed marriage: "It's not so bad, we just have to avoid each other."
And you'll find the same shit with management. "Prod is only DoS'd twice a day...", "Every deploy is hell as shit hits the fan, but after a week it's back to the usual"...
Crazy how afraid of change people are.
-
Wish me luck.
Deploying Blazor app for the first time in prod in about a month.
Did tests. Current infrastructure can hold around 70 concurrent sessions with no problems (probably more, 70 was the limit in my browser).
I tested each session with a 70,000-line table. (Yep, the whole 70,000 lines for each session, with a virtual scroll.)
Shit is fast. Too fast even. I'm waiting for the other shoe to drop, but so far in simulated tests it's amazing.
Let's talk in 2 months, AFTER the prod deploy.
-
Accidentally reverted the repository, wiping all the non-versioned custom patches in a production client! I thought it was my computer’s terminal. Luckily we had a backup from the last deploy. Later we customized the bash prompt in all prod servers to avoid confusion like this.
-
// Rant 1
---
Im literally laughing and crying rn
I tried to deploy a backend on aws Fargate for the first time. Never used Fargate until now
After several days of brainwreck of trial and error
After Fucking around to find out
After Multiple failures to deploy the backend app on AWS Fargate
After Multiple times of deleting the whole infrastructure and redoing everything again
After trying to create the infrastructure through terraform, where 60% of it has worked but the remaining parts have failed
After then scraping off terraform and doing everything manually via AWS ui dashboard because im that much desperate now and just want to see my fucking backend work on aws and i dont care how it will be done anymore
I have finally deployed the backend, successfully
I'm still unsure of what the fuck is going on. I followed an article. Basically I deployed the backend using:
- RDS
- ECS
- ECR
- VPC
- ALB
You may wonder am i fucking retarded to fail this hard for just deploying a backend to aws?
No. Its much deeper than you think. I deployed it on a real world production ready app way.
- VPC with 2 public and 2 private subnets. Private subnets used only for RDS. Public for ALB.
- Everything is very well done and secure. 3 security groups: 1 for ALB (port 80), 1 for Fargate (port 8080, the one the backend is running on), 1 for RDS postgres (port 5432). Each one stacked on top and chained
- custom domain name + SSL certificate so i can have a clean version of the fully working backend such as https://api.shitstain.com
- custom ECS cluster
- custom target groups
- task definitions
Etc.
Right now I'm unsure how all of this is glued together. I have no idea why this works and why my backend is secure and reachable. Well, I do know to some extent, but not everything.
To know everything, I'll now ask some dumbass questions:
1. What is ECS used for?
2. What is a task definition and why do i need it?
3. What does Fargate do exactly? As far as i understood its a on-demand use of a backend. Almost like serverless backend? Like i get billed only when the backend is used by someone?
4. What is a target group and why do i need it?
5. I've read somewhere there's a difference between using Fargate and... ECS (or is it something else)? What's the difference?
Everything else i understand well enough.
In the meantime I'll now start analyzing, researching and understanding deeply what happened here and why this works. I'll also turn all of this into terraform. And I'll build a custom GitLab CI/CD to automate all of this shit and deploy to the Fargate prod app.
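(Partial answers to my own dumbass questions, for what it's worth: ECS is the orchestrator that schedules and keeps containers running; Fargate is the serverless launch type for ECS, billed for the vCPU/memory a task uses while it runs instead of for servers you manage; a task definition is the blueprint for a task (image, CPU/memory, port mappings); and a target group is the set of endpoints the ALB routes to and health-checks. The same architecture could be sketched with the AWS CDK in TypeScript; this is purely illustrative, not the Terraform + console setup I actually used, and every name, size and image reference below is made up:)

```ts
// stack.ts: hedged CDK sketch of the setup described above: a VPC with public/private
// subnets, Postgres in the private subnets, and a Fargate service behind a public ALB.
import { App, Stack, aws_ec2 as ec2, aws_ecs as ecs, aws_ecs_patterns as ecsPatterns, aws_rds as rds } from "aws-cdk-lib";

const app = new App();
const stack = new Stack(app, "BackendStack");

// 2 AZs -> 2 public + 2 private subnets, as in the setup above
const vpc = new ec2.Vpc(stack, "Vpc", { maxAzs: 2 });

// Postgres lives only in the private subnets
const db = new rds.DatabaseInstance(stack, "Db", {
  engine: rds.DatabaseInstanceEngine.postgres({
    version: rds.PostgresEngineVersion.VER_15,
  }),
  vpc,
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
});

const cluster = new ecs.Cluster(stack, "Cluster", { vpc });

// one construct wires up the task definition, Fargate service, public ALB,
// target group and the security group between ALB and service
const service = new ecsPatterns.ApplicationLoadBalancedFargateService(stack, "Api", {
  cluster,
  cpu: 256,
  memoryLimitMiB: 512,
  desiredCount: 1,
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry("example/backend:latest"), // in the real setup this would come from ECR
    containerPort: 8080,
  },
  publicLoadBalancer: true,
});

// allow the service to reach Postgres on 5432, and nothing else
db.connections.allowDefaultPortFrom(service.service);
```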
// Rant 2
---
Im pissing and shitting a lot today. I piss so much and i only drink coffee. But the bigger problem is i can barely manage to hold my piss. It feels like i need to piss asap or im gonna piss myself. I used to be able to easily hold it for hours now i can barely do it for seconds. While i was sleeping with my gf @retoor i woke up by pissing on myself on her bed right next to her! the heavy warmness of my piss woke me up. It was so embarrassing. But she was hardcore sleeping and didnt notice. I immediately got out of bed to take a shower like a walking dead. I thought i was dreaming. I was half conscious and could barely see only to find out it wasnt a dream and i really did piss on myself in her bed! What the fuck! Whats next, to uncontrollably shit on her bed while sleeping?! Hopefully i didnt get some infection. I feel healthy. But maybe all of this is one giant dream im having and all of u are not real
-
What's the fucking purpose of our company's dev, test and prod envs? Dev always has only a single instance. Sometimes clustered services run as a cluster on test, producing headaches because the clustering behaviour couldn't be seen on a single instance, and prod lacks all the nice deployment tools of dev/test. Fuck thinking you could go dev, then test, then prod without any major reconfiguration and headaches. And all because storage is RETARDEDLY expensive, because they backup EVERYTHING with ridiculous overkill. That results in headaches when requesting new servers. Took an old workstation from the shelves and made it my VM slave so at least I could reliably deploy to test... Fuck this process
-
Deploys to Production.
Runtime error.
Open Development server and run in Production setting.
Still runtime error.
Fixes Error.
Error fixed on development.
while (hoursWasted < 3) {
Deploy.
Not working on Prod.
Try other fix.
Still not working, but works perfectly in dev machine.
What the fuck
}
Rage
Go take a walk
Realized I might have deployed to the wrong server
Glanced at deployment path
Realized it's at the wrong server
Reconfigure and Deploy
It works.
Fuck.
-
I just thought of an elegant solution to a problem in prod. Basically I need to run 2 different versions of an app.
Just deploy it to UAT and point it at the prod db for the data.
I guess if we were following a microservices architecture we could deploy both versions to prod, but we don't...