Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "don't test in prod"
-
Someone called me an incapable, arrogant bitch today because I didn't let them test in prod. We had words.
The monologue from Full Metal Jacket went through my head at one point, and I almost didn't mute my phone in time before I said it rather than just thinking it. I'm not sure I would still have a job if I did, but I kind of wish I didn't mute my phone.13 -
Me passing time on the weekend
Random call from unknown number
Turns out it's the manager
M: hey , how is your weekend going ...
Me: nothing much ... Whatsup ?
M : yeah well , we wanted to push some minor adhoc fixes as some clients wanted it urgently
The Devops folks need developer support . Can you pitch in and monitor
Me : I'm not aware of what changes are going , i don't think i can provide support
M : don't worry it's minor changes , it's already tested in pre prod , you just need to be on call for 30 mins
Me : ugh okay .. guess 1 hr won't hurt
M: thanks 👍🏽
Me: *logs in
*Notices the last merged PR
+ 400 lines , implemented by junior dev and merged by manager
*Wait , how is this a *minor* release...
*Release got triggered already and the CI CD pipeline is in progress
*5 mins later
*Pipeline fails , devops sends email - test coverage below 50%
Manager immediately pitches in ...
M: hey , i see test coverage is down , can you increase it ?
Me: and how do u suppose I do that ?
M : well it's simple just write UTC for the missing lines ... Will it take time ?
Me : * ah shit here we go again
Yeah it will take time , there are around 400 lines , I am not aware of this component all together
Can you ask junior dev to pitch in and write the UTC for this
*Actually junior dev is out on a vacation with his girlfriend
M : well he's out for the weekend , but
as a senior dev , i expect you to have holistic understanding of the codebase and not give excuses ,
this is a priority fix which client are demanding we need this released ASAP
Me : * wait wat ?
---
I ended up being online for next 3 hours figuring out the code change and bumping up the UTC 🤦🏾9 -
Have a couple I want to air today.
First was at my first gig as a dev, 4-5 months out of school. I was the only dev at a startup where the owner was a computer illiterate psycopath with serious temper tantrums. We're talking slamming doors, shouting at you while you are on the phone with customers, the works...
Anyways, what happened was that we needed to do an update in our database to correct some data on a few order lines regarding a specific product. Guess who forgot the fucking where-clause... Did I mention this boss was a cheap ass, dollar stupid, penny wise asshole that refused to have anything but the cheapest hosting? No backups, no test/dev/staging environment, no local copies... Yeah, live devving in prod, fucking all customers with a missing semi-colon (or where clause).
Amazingly, his sheer incompetance saved my ass, because even if I explained it, he didn't get it, and just wanted it fixed as best we could.
The second time was at a different company where we were delivering managed network services for a few municipalities. I was working netops at that time, mostly Cisco branded stuff, from Voice-over-IP and wifi to switches and some routing.
One day I was rolling out a new wireless network, and had to add the VLAN to the core switch on the correct port. VLAN's, for those who don't know, are virtual networks you can use to run several separated networks on the same cable.
To add a VLAN on a Cisco switch one uses the command:
switchport access vlan add XYZ
My mistake was omitting the 'add', which Cisco switches happily accept without warning. That command however can be quite disruptive as it replaces all of the excisting VLAN's with the new one.
Not a big deal on a distribution switch supplying an office floor or something, but on a fucking core switch in the datacenter this meant 20K user had no internet, no access to the applications in the DS, no access to Active Directory etc. Oh and my remote access to that switch also went down the drain...
Luckily a colleague of mine was on site with a console cable and access to config backups. Shit was over within 15 minutes. My boss at that time was thankfully a pragmatic guy who just responded "Well, at least you won't make that mistake again" when we debriefed him after the dust settled. -
This was some time ago. A Legendary bug appeared. It worked in the dev environment, but not in the test and production environment.
It had been a week since I was working on the issue. I couldn't pinpoint the problem. We CANNOT change the code that was already there, so we needed to override the code that was written. As I was going at it, something happened.
---
Manager: "Hey, it's working now. What did you do?"
Me: *Very confused because I know I was nowhere close to finding the real source of the problem* Oh, it is? Let me check.
Also me: *Goes and check on the test and prod environment and indeed, it's already working*
Also me to the power of three: *Contemplates on life, the meaning of it, of why I am here, who's going to throw out the trash later, asking myself whether my buddies and I will be drinking tonight, only to realize that I am still on the phone with my manager*
Me again: "Oh wow, it's working."
Manager: "Great job. What were the changes in the code?"
Me: "All I did was put console logs and pushed the changes to test and prod if they were producing the same log results."
Manager: "So there were no changes whatsoever, is that what you mean?"
Me: "Yep. I've no idea why it just suddenly worked."
Manager: "Well, as long as it's working! Just remove those logs and deploy them again to the test and prod environment and add 'Test and prod fix' to the commit comment."
Me: "But what if the problem comes up again? I mean technically we haven't resolved the issue. The only change I made were like 20 lines of console logs! "
Manager: "It's working, isn't it? If it becomes a problem, we'll work it out later."
---
I did as I was told, and Lo and Behold, the problem never occurred again.
Was the system playing a joke on me? The system probably felt sorry for me and thought, "Look at this poor fucker, having such a hard time on a problem he can't even comprehend. That idiotic programmer had so many sleepless nights and yet still couldn't find the solution. Guess I gotta do my job and fix it for him. I'm the only one doing the work around here. Pathetic Homo sapiens!"
Don't get me wrong, I'm glad that it's over but..
What the fuck happened?5 -
TL;DR Dear boss, firstly, you always get someone to review anything important done by a fucking intern.
Secondly, you do not give access to your fucking client's production server to an intern.
Thirdly, you don't ask your fucking intern to test the intern's work that has not been reviewed by anyone directly on your client's fucking production server.
Last week, the boss and one of the lead devs (the only guy with some serious knowledge about systems and networking) decided to give me (an intern who barely has any work experience) the task of fixing or finding an alternate solution to allowing their support team access to their client machines. Currently they used a reverse SSH tunnel and an intermediary VH but for some reason, that was very unreliable in terms of availability. I suggested using OpenVPN and explained how it would work. Seemed to be a far better idea and they accepted. After several days of working through documentations and guides and everything, I figured out how OpenVPN works and managed to deploy a TEST server and successfully test remote access using two VMs. On seeing my tests, the boss told me that he wanted to test it on the client network. I agreed. Today he comes to me and he tells me to prepare testing for tomorrow and that the client technician is going to give me access to one of their boxes. And then he adds, "It's a working prod server. We'll see if we can make it work on that" and left. I gaped at him for a while and asked another dev guy in the room if what I heard was right. He confirmed. Turns out, the lead dev and the boss's son (who also works here) had had a huge argument since morning on the same issue and finally the dev guy had washed it off his hands and declared that if anything goes wrong from testing it on production, it's entirely the boss's own fault. That's when the boss stepped in and approached me. I ran back to his office and began to explain why prod servers don't top the list of things you can fuck around with. But he simply silenced me saying, "What can go wrong?" and added, "You shouldn't stay still. You should keep moving". Okay, like firstly what the fuck and secondly, what the fuck?.
Even though OpenVPN client is not the scariest thing to install, tomorrow's going to be fun.4 -
So last week I really fucked up
I had this new implementation that was supposedly to be integrating smoothly into the rest of the service. It depended on a serialized model made by a data scientist. I test it in local, in QA environment: no problem.
So, Friday, 4pm, I decide to deploy to production. I check once from the app: the service throw an error. Panic attack, my chief is at my desk, we triy to understand what went wrong. I make calls with cUrls: no problem. Everything seems fine. I recheck from the app again: no problem.
We dedice to let it in prod, as the feature work. I go get some beers with the guys, to celebrate the deploy.
Fast-forward the next morning, 11am, my phone ring: it's a colleague of my chief. "Please check Slack, a client is trying to use the feature, it's broken"
FUUUUUUUUUUUUCK!!!
Panic attack again. I go to the computer, check the errors: two types of errors. One I can fix, the other from a missing package on the machine that the data guy used.
Needless to say, I had a fairly good weekend.
Lessons learned:
- make sure Dev, QA and Prod are exactly the same (use Ansible or Container)
- never deploy on a Friday afternoon if you don't have a quick way to revert1 -
Dear IT troll: I am not some idiot user. I FUCKING WRITE SOFTWARE! I actually CREATE CAPABILITY! I don't create "content", I'm not some fucking suit that pumps out PowerPoint/Excel/Email all day long. I don't need to be handed a consumers screwdriver, hammer, and wrench set. I need to be able to set up the technological equivalent of MY OWN FUCKING FORGE AND ANVIL! Do you get it? Do you understand me? Give me administrator access and go the fuck away. While you're at it, please quarantine this pile of silicon onto a limited access network if it makes you feel better. My development system doesn't need to connect to the wealth of bullshit in your precious little dumbed down corporate Wiki-wannabe Sharepoint system. Keep my creative space away from Test and Prod networks while you're at it. Just give me the goddamed tools I need to do my work and fuck off!8
-
The panic when I overhear my colleague in a meeting "sure I can test it in prod but we normally don't do that..."5
-
I hate dev politics...
PM: Hey there is a weird error happening when I upload this file on production, but it works on our test environments.
Me: After looking at this error, I don't find any issues with the code, but this variable is set when the application is first loaded, I bet it wasn't loaded correctly our last deployment and we just need to reload the application.
Senior Dev: We need to output all of the errors and figure out where this error is coming from. Dump out all the errors on everything in production!!
Me: That's dumb... the code works on test... it's not the code.. it's the application.
Senior dev: %$*^$>&÷^> $
Me: Hey I have an idea! If test works... I can go ahead and deploy last week's changes to prod and dump those errors you were talking about!!
Senior Dev: OK
Me: *runs Jenkins job the deploys the new code and restarts the application*
PM: YAY you fixed it!!
Senior Dev: Did you sump put those errors like I said.
Me: Nope didn't touch a thing... I just deployed my irrelevant changes to that error and reloaded the application.2 -
Rolled out a new application I built almost entirely by myself 2 days ago... But my dev group is understaffed and has a project manager who is literally the most clueless person I have ever met, so as a result, we don't have a functional/useful dev/test/prod framework and no standards for how to deploy apps. So my past 2 days were comprised of fixing bugs in the live system that could probably have been caught if I had the time and resources to get everything thoroughly tested. It's stable now, but damn our management for being generally idiots. Our motto appears to be "Fuck it, we'll do it live"1
-
https://devops.com/dear-staging-wer...
What the F#*@&$# %#@$!?!?!
This person has decided to skip using staging, because it doesn't correctly reflect prod!
If that's your problem, than why don't you try to fix it? Create a DB with fake data, make one based on anonymised customer data, or even do it on non-anonymised data (with permission of course), but fix the staging env so that it reflects prod!
This is a devops site (it's literally the name!), and instead of teaching you how to make staging exactly like prod, they tell you to do what caused the creation of the staging->prod system IN THE FIRST PLACE!!!
There's all these stuff like Vagrant that are literally designed to help you as a dev mimic prod, and you just throw it all out!?
"With feature flags, I can safely test in production without fear of breaking something or negatively affecting the customer experience."
Famous last words.12 -
Tl;Dr Im the one of the few in my area that sees sftping as the prod service account shouldn't be a deployment process. And the ONLY ONE THAT CARES THAT THIS IS GONNA BREAK A BUNCH OF SHIT AT SOME POINT.
The non tl;dr:
For a whole year I've been trying to convince my area that sshing as the production service account is not the proper way to deploy and/or develop batch code. My area (my team and 3 sister teams) have no concept of using version control for our various Unix components (shell scripts and configuration files) that our CRITICAL for our teams ongoing success. Most develop in a "prodqa like" system and the remainder straight in production. Those that develop straight in prodqa have no "test" deployment so when they ssh files straight to actual production. Our area has no concept of continuous integration and automated build checking. There is no "test cases", no "systems testing" or "regression testing". No gate checks for changing production are enforced. There is a standing "approved" deployment process by the enterprise (my company is Whyyyyyyyyyy bigger than my area ) but no one uses it. In fact idk anyone in my area who knows HOW to deploy using the official deployment method. Yes, there is privileged access management on the service account. Yes the managers gets notified everytime someone accesses the privileged production account. The managers don't see fixing this as a priority. In fact I think I've only talk to ONE other person in my area who truly understands how terrible it is that we have full production change access on a daily basis. Ive brought this up so many times and so many times nothing has been done and I've tried to get it changed yet nothing has happened and I'm just SO FUCKING SICK that no one sees how big of a deal this. I mean, overall I live the area I work in, I love the people, yet this one glaring deficiency causes me so much fucking stress cause it's so fucking simple to fix.
We even have an newer enterprise deployment. Method leveraging a product called "urban code deploy" (ucd) to deploy a git repository. JUST FUCKING GIT WITH THE PROGRAM!!!!..... IT WAS RELEASED FUCKING 12 YEARS AGO......
Please..... Please..... I just want my otherwise normally awesome team to understand the importance and benefits of version control and approved/revertable deployments2 -
I'm still studying computer science/programming, I still have one year to do in order to graduate (Master). I am in a work study program so I'm working for a company half of the time, and I'm studying the other half. It is important to mention that I am the only web developer of the company
When I arrived in the company 9 months ago, I was given a Vue project which had been developed by a trainee a few weeks before my arrival and I was asked to correct a few things, it was mostly about css. Then, I was ask to add a few functionalities, nothing really hard to code, and we were supposed to test the solution in a staging environment, and if everything was ok, deploy it to prod.
However, the more I did what I was asked, the more functionalities I had to implement, until I reached a point where I had to modify the API, create new routes, etc. I'm not complaining about that, that's my job and I like it. But the solution was supposed to be ready when I arrived, it was also supposed to be tested and deployed.
The problem is, the person emitting these demands (let's call him guy X) is not from the IT service, it's a future user of the website in the admin side. The demands kept going and going and going because, according to him, the solution was not in a good enough state to be deployed, it missed too many (un)necessary features. It kept going for a few months.
The best is yet to come though : guy X was obviously a superior, and HIS superior started putting pressure on me through mails, saying the app was already supposed to be in production and he was implying that I wasn't working fast enough. Luckily, my IT supervisor was aware of what was going on and knew I obviously wasn't to blame.
In the end, the solution was eagerly deployed in production, didn't go through the staging environment and was opened to the users. Now, guy X receives complaints because none of what I did was tested (it was by me, but I wasn't going to test every single little thing because I didn't have time). Some users couldn't connect or use this or that feature and I am literally drowning in mails, all from guy X, asking me to correct things because users are blocked and it's time consuming for him to do some of the things the website was doing manually.
We are here now just because things have been done in a rush, I'm still working on it and trying to fix prod problems and it's pissing me off because we HAVE a staging environment that was supposed to prevent me from working against the clock.
On a final note, what's funny is that the code I'm modifying, the pre-existing one needs to be refactored because bits and pieces are repeated sometimes 5 times where it should have been externalized and imported from another file. But I don't know when and if I will ever be able to do that.
I could have given more context but it's 4am and I'm kinda tired, sorry if I'm not clear or anything. That's my first rant -
I've now worked on both monolithic solutions and microapps/microservices. I gotta say I'm not sold on the new approach. There's so much overhead! You don't have to know your way around one solution -- no, now you need to know your way around 100 solutions. Debugging? Yeah, good luck with that. You don't have to provision one environment for dev, test, staging, and prod. No, now you need 100 environments per... environment. Now, you need a dedicated fulltime devops person. Now devs can check in breaking changes because their code compiles fine in that one tiny microapp. The extra costs go on and on and on. I get the theoretical benefits but holy crap you pay for it dearly. Going back to monolithic is so satisfying. You just address the bug or new feature head on without the ceremony and complexity. You know you're not crapping on other people's day (compilation-wise) because the entire solution compiles.
...and yeah, I'm getting old. So get off the lawn! ;)2 -
I am forced to work with a client's notoriously slow SOAP api. Slow in this case is 1.5-2s per request.
The api is structured rather... creatively... at the same time. So we have to bombard it with thousands of requests to build our data base with historical SOAP data. Also the data sometimes is a couple of hours late, giving a flat line (all values at 0) until retroactively fixing the output for the same requests.
So to fill one dev data base with a year's worth of historical data (nice to have when testing a dashboard application) we hammer the api with ~20k requests (~1 million if we want to be thorough).
Best thing about that: There is no staging/test api and the prod api seems not to handle lots of requests at the same time very well...
Latest thought: Maybe we could put a varnish cache in front of the SOAP for testing. Better have wrong data, than nothing at all and we don't kill the prod clients every time we ramp up a new instance.
Also that would dramatically decrease the 4.2 hours of data pumping to about 7 minutes after the first run. -
While addressing a Senior Dev's (SD) query from another team.
SD: why is this field mandatory? Can't it be just optional? Any other work around?
Me: Is your code changes already pushed in Devo? In that case, we provide a value which will work since you are not concerned about it.
SD: Yes. It's pushed till production. And, I want to test changes in Prod.
Me: (shared some codes) and explained that this feature for testing is only available in Devo.
SD: I know that. (Shared me a ticket) I want this field to be optional. That's it.
Me: (read the entire ticket. Didn't find anything related their) Told him, I will discuss with team. And meanwhile, for Devo, you can use this value.
Next morning, I accidentally came over some other ticket raised by him only which had the correct doubts regarding request to support this field in production
Now, I don't know why did he share a wrong ticket with me.
And, how will it even help him if that field was even optional.
THAT JUST WONT WORK IN PRODUCTION.
I will discuss with my team and see what can happen. -
Long post, TLDR: Given a large team building large enterprise apps with many parts (mini-projects/processes), how do you reduce the bus-factor and the # of Brent's (Phoenix Project)?
# The detailed version #
We have a lot of people making changes, building in new processes to support new flows or changes in the requirements and data.
But we also have to support these except when it gets into Production there is little information to quickly understand:
- how it works
- what it does/supposed to do
- what the inputs and dependencies are
So often times, if there's an issue, I have to reverse engineer whatever logic I can find out of a huge mess.
I guess the saying goes: the only people that know how it works is whoever wrote it and God.
I'm a senior dev but i spend a lot of time digging thru source code and PROD issues to figure out why ... is broken and how to maybe fix it.
I think in Agile there's supposed to be artifacts during development but never seen em.
Personally whenever i work on a new project, I write down notes and create design diagrams so i can confirm things and have easy to use references while working.
I don't think anyone else does that. And afterwards, I don't have anywhere to put it/share it. There is no central repo for this stuff other than our Wiki but for the most part, is like a dumping ground. You have to dig for information and hoping there's something useful.
And when people leave, information is lost forever and well... we hire a lot of monkeys... so again I feel a lot of times i m trying to recover information from a corrupted hard drive...
The only way real information is transferred is thru word of mouth, special knowledge transfer sessions.
Ideally I would like anything that goes into PROD to have design docs as well as usage instructions in order for anyone to be able to quickly pick it up as needed but I'm not sure if that's realistic.
Even unit tests don't seem to help much as they just test specific functions but don't give much detail about how a whole process is supposed to work.9 -
- Change this, change that too, oh and that too.
* Ok, will do ( however unlikely it's going to be finished correctly, as you didn't consult me before or listen to me about the impact this might have )
- ( just stfu and do it )
Sometime at or after important cutoff:
- Hey this doesn't show up, is that right
* Yeah, you wanted too many things changed at once + fix it and notice a stupid error like an && switched with a ||, or other models that don't know about change x yet
/repeat3 -
Why do some employers make such a distinction between learning the tools at university and learning the same tools at the workplace?
Are they backward or old? Don't they know modern, high-quality universities have modern environments that are in fact real life?
Environments with acc-test-prod-dev with gitlab, ci/cd in Scrum teams and the works? Heck, at my uni we even worked at real companies, did internships there for months!
Come on.. to me this 'the tools you learned in school isn't the same experience as real life experience'. Right, these guys must be on some conservative backward model because there is in fact no difference.
I have worked both during my uni internship at a real company (in teams too) as well as irl at real companies and there is no difference, it's the same thing.
I don't care if I've learned to experience git + ReactJS etc during an internship through uni or at a workplace. It's all bureaucracy.10