Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "test in prod"
-
I worked with a good dev at one of my previous jobs, but one of his faults was that he was a bit scattered and would sometimes forget things.
The story goes that one day we had this massive bug on our web app and we had a large portion of our dev team trying to figure it out. We thought we narrowed down the issue to a very specific part of the code, but something weird happened. No matter how often we looked at the piece of code where we all knew the problem had to be, no one could see any problem with it. And there want anything close to explaining how we could be seeing the issue we were in production.
We spent hours going through this. It was driving everyone crazy. All of a sudden, my co-worker (one referenced above) gasps “oh shit.” And we’re all like, what’s up? He proceeds to tell us that he thinks he might have been testing a line of code on one of our prod servers and left it in there by accident and never committed it into the actual codebase. Just to explain this - we had a great deploy process at this company but every so often a dev would need to test something quickly on a prod machine so we’d allow it as long as they did it and removed it quickly. It was meant for being for a select few tasks that required a prod server and was just going to be a single line to test something. Bad practice, but was fine because everyone had been extremely careful with it.
Until this guy came along. After he said he thought he might have left a line change in the code on a prod server, we had to manually go in to 12 web servers and check. Eventually, we found the one that had the change and finally, the issue at hand made sense. We never thought for a second that the committed code in the git repo that we were looking at would be inaccurate.
Needless to say, he was never allowed to touch code on a prod server ever again.8 -
Before an interview prepare a list of questions for them, they expect it!
My list to give inspiration:
Describe your company culture? - if the response is buzzword heavy, avoid.
What’s the oldest technology still in use? - all companies have legacy systems but some are worse than others
Describe your agile process? - a few companies I’ve interviewed with said they are agile but it’s actually kanban
Are developers involved with customers?- if they trust you to talk to customers you can infer trust to do your job ( I’m sure others will disagree)
Describe your development environment?- do they have such a thing as dev, test and prod?
These are the only ones I can remember but should give others a bit of inspiration I hope 😄9 -
Someone called me an incapable, arrogant bitch today because I didn't let them test in prod. We had words.
The monologue from Full Metal Jacket went through my head at one point, and I almost didn't mute my phone in time before I said it rather than just thinking it. I'm not sure I would still have a job if I did, but I kind of wish I didn't mute my phone.13 -
Hello! I am Johnny Knoxville and tonight on Jackass...we test in prod with no back out plan in place...and then I will glue Bam's hand to his dick while he's sleeping.6
-
Week 278: Most rage-inducing work experience — I’ve got a list saved! At least from the current circle of hell. I might post a few more under this tag later…
TicketA: Do this in locations a-e.
TicketB: Do this in locations e-h.
TicketC: Do this in locations i-k.
Root: There’s actually a-x, but okay. They’re all done.
Product: You didn’t address location e in ticket B! We can’t trust you to do your tickets right. Did you even test this?
Root: Did you check TicketA? It’s in TicketA.
Product guy: It was called out in TicketB! How did you miss it?!
Product guy: (Refuses to respond or speak to me, quite literally ever again.)
Product guy to everyone in private: Don’t trust Root. Don’t give her any tickets.
Product manager to boss: Root doesn’t complete her tickets! We can’t trust her. Don’t give her our tickets.
Product manager to TC: We can’t trust Root. Don’t give her our tickets.
TC: Nobody can trust you! Not even the execs! You need to rebuild your reputation.
Root: Asks coworker a simple question.
Root: Asks again.
Root: nudges them.
Root: Asks again.
Coworker: I’ll respond before tomorrow. (And doesn’t.)
Root: Asks again.
Root: Fine. I’ll figure it out in my own.
TC: Stop making it sound like you don’t have any support from the team!
Root: Asks four people about <feature> they all built.
Everyone: idk
Root: Okay, I’ll figure it out on my own.
TC: Stop making it sound like you don’t have any support from the team!
Root: Mentions multiple meetings to discuss ticket with <Person>.
TC: You called <Person> stupid and useless in front of the whole team! Go apologize!
Root: Tells TC something. Asks a simple question.
Root: Tells TC the same thing. Asks again.
TC: (No response for days.)
TC: Tells me the exact same thing publicly like it’s a revelation and I’m stupid for not knowing.
TC: You don’t communicate well!
Root: Asks who the end user of my ticket is.
Root: Asks Boss.
Root: Asks TC.
Root: Fine, I’ll build it for both.
Root: Asks again in PR.
TC: Derides; doesn’t answer.
Root: Asks again, clearly, with explanation.
TC: Copypastes the derision, still doesn’t answer.
Root: Asks boss.
Boss: Doesn’t answer.
Boss: You need to work on your communication skills.
Root: Mentions asking question about blocker to <Person> and not hearing back. Mentions following up later.
<Person>: Gets offended. Refuses to respond for weeks thereafter.
Root: Hey boss, there’s a ticket for a minor prod issue. Is that higher priority than my current ticket?
Root: Hey, should I switch tickets?
Root: Hey?
Root: … Okay, I’ll just keep on my current one.
Boss: You need to work on your priorities.
Everyone: (Endless circlejerking and drama and tattling)6 -
Me passing time on the weekend
Random call from unknown number
Turns out it's the manager
M: hey , how is your weekend going ...
Me: nothing much ... Whatsup ?
M : yeah well , we wanted to push some minor adhoc fixes as some clients wanted it urgently
The Devops folks need developer support . Can you pitch in and monitor
Me : I'm not aware of what changes are going , i don't think i can provide support
M : don't worry it's minor changes , it's already tested in pre prod , you just need to be on call for 30 mins
Me : ugh okay .. guess 1 hr won't hurt
M: thanks 👍🏽
Me: *logs in
*Notices the last merged PR
+ 400 lines , implemented by junior dev and merged by manager
*Wait , how is this a *minor* release...
*Release got triggered already and the CI CD pipeline is in progress
*5 mins later
*Pipeline fails , devops sends email - test coverage below 50%
Manager immediately pitches in ...
M: hey , i see test coverage is down , can you increase it ?
Me: and how do u suppose I do that ?
M : well it's simple just write UTC for the missing lines ... Will it take time ?
Me : * ah shit here we go again
Yeah it will take time , there are around 400 lines , I am not aware of this component all together
Can you ask junior dev to pitch in and write the UTC for this
*Actually junior dev is out on a vacation with his girlfriend
M : well he's out for the weekend , but
as a senior dev , i expect you to have holistic understanding of the codebase and not give excuses ,
this is a priority fix which client are demanding we need this released ASAP
Me : * wait wat ?
---
I ended up being online for next 3 hours figuring out the code change and bumping up the UTC 🤦🏾9 -
That feeling when your client connection is more stable than the connection of a fucking game server... Incompetent pieces of shit!!! BEING ABLE TO PUT A COUPLE OF SPRITES DOESN'T MAKE YOU A FUCKING SYSADMIN!!!
Oh and I sent those very incompetent fucks a mail earlier, because my mailers are blocking their servers as per my mailers' security policy. A rant from the old box - their mail servers self-identify a fucking .local!!! Those incompetent shitheads didn't even properly change the values from test into those from prod!! So I sent them an email telling them exactly how they should fix it, as I am running the same MTA on my mailers (Postfix), at some point had to fix my mailers against the exact same issue as well, and clearly noticed in-game that they have deliverability problems (they explicitly mention to unblock their domain). Guess why?! Because their server's shitty configuration triggers fucking security mechanisms that are built against rogue mailers that attempt to spoof themselves as an internal mailer, with that fucking .local! And they STILL DIDN'T CHANGE IT!!!! Your fucking domain has no issues whatsoever, it's your goddamn fucking mail servers that YOU ASOBIMO FUCKERS SHOULD JUST FIX ALREADY!!! MOTHERFUCKERS!!!!!rant hire a fucking sysadmin already incompetent pieces of shit piece of shit game dev doesn't make you a sysadmin2 -
When migrating from MySQL server to MariaDB and having a query start returning a completely different result set then what was expected purely because MariaDB corrected a bug with sub selects being sorted.
It took several days to identify all that was needed on that sub select was “limit 1” to get that thing to return the correct data, felt like an idiot for only having to do 7 character commit 😆4 -
On the first day of Christmas, the bossman gave to me: The fact that my new computer purchase order needs to be OKed by the CEO and I need to continue working on a 2014 Mac Mini (i5-4260U, 8 Gig RAM, GPU shot by an ESD on the case long ago) for the next year.
On the second day of Christmas, my family gave to me... a good reason to get shitfaced
On the third day of Christmas, getting shitfaced gave to me: A hangover and some urgent plastic welding job that had to be done with a soldering iron. FML, I've had a headache before breathing in pure hydro-cyano-whatthefuckyougetwhenyoumeltplastics
On the fourth day of Christmas, my team gave to me: A legacy, age-old Rails 2 project that was written by an intern and never reviewed, went to prod in 2014 and can't be changed anymore, but needs to be changed after the fact that it has zero test coverage and needs 100 % now to prevent issues and costly manual testing.
On the fifth day of Christmas, devrant gave to me: The Idea that making fun of Christmas songs to get over the sheer amount of dicks that working over the twelve days of Christmas sucks.
To be continued...2 -
TLDR - you shouldn't expect common sense from idiots who have access to databases.
I joined a startup recently. I know startups are not known for their stable architecture, but this was next level stuff.
There is one prod mongodb server.
The db has 300 collections.
200 of those 300 collections are backups/test collections.
25 collections are used to store LOGS!! They decided to store millions of logs in a nosql db because setting up a mysql server requires effort, why do that when you've already set up mongodb. Lol 😂
Each field is indexed separately in the log.
1 collection is of 2 tb and has more than 1 billion records.
Out of the 1 billion records, 1 million records are required, the rest are obsolete. Each field has an index. Apparently the asshole DBA never knew there's something called capped collection or partial indexes.
Trying to get approval to clean up the db since 3 months, but fucking bureaucracy. Extremely high server costs plus every week the db goes down since some idiot runs a query on this mammoth collection. There's one single set of credentials for everything. Everyone from applications to interns use the same creds.
And the asshole DBA left, making me in charge of handling this shit now. I am trying to fix this but am stuck to get approval from business management. Devs like these make me feel sad that they have zero respect for their work and inability to listen to people trying to improve the system.
Going to leave this place really soon. No point in working somewhere where you are expected to show up for 8 hours, irrespective of whether you even switch on your laptop.
Wish me luck folks.3 -
Highlights from my week:
Prod access: Needed it for my last four tickets; just got it approved this week. No longer need it (urgently, anyway). During setup, sysops didn’t sync accounts, and didn’t know how. Left me to figure out the urls on my own. MFA not working.
Work phone: Discovered its MFA is tied to another coworker’s prod credentials. Security just made it work for both instead of fixing it.
My merchant communication ticket: I discovered sysops typo’d my cronjob so my feature hasn’t run since its release, and therefore never alerted merchants. They didn’t want to fix it outside of a standard release. Some yelling convinced them to do it anyway.
AWS ticket: wow I seriously don’t give a crap. Most boring ticket I have ever worked on. Also, the AWS guy said the project might not even be possible, so. Weee, great use of my time.
“Tiny, easy-peasy ticket”: Sounds easy (change a link based on record type). Impossible to test locally, or even view; requires environments I can’t access or deploy to. Specs don’t cover the record type, nor support creating them. Found and patched it anyway.
Completed work: Four of my tickets (two high-priority) have been sitting in code review for over a month now.
Prod release: Release team #2 didn’t release and didn’t bother telling anyone; Release team #1 tried releasing tickets that relied upon it. Good times were had.
QA: Begs for service status page; VP of engineering scoffs at it and says its practically impossible to build. I volunteered. QA cheered; VP ignored me.
Retro: Oops! Scrum master didn’t show up.
Coworker demo: dogshit code that works 1 out of 15 times; didn’t consider UX or user preferences. Today is code-freeze too, so it’s getting released like this. (Feature is using an AI service to rearrange menu options by usage and time of day…)
Micromanager response: “The UX doesn’t matter; our consumers want AI-driven models, and we can say we have delivered on that. It works, and that’s what matters. Good job on delivering!”
Yep.
So, how’s your week going?2 -
I really, really, fucking god damn it REALLY need to move a legacy project from the grave yard server and get it in git, and then build a dev environment for it, so I can stop making incredibly volatile changes direct to PROD (backend, frontend and DB all at once and then test it while it’s live and being used, but fuck me if I can be bothered digging through a 10GB code base and attempting to make it work in a multi-environment setup when it’s going to be a long trip down the error logs until it works again 😱🔫2
-
Long time ago, back in a day of Microsoft Office 95 and 97, I was contracted to integrate a simple API for a payment service provider.
They've sent me the spec, I read it, it was simple enough: 1. payment OK, 2. payment FAILED. Few hours later the test environment was up and happy crediting and debiting fake accounts. Then came the push to prod.
I worked with two other guys, we shut down the servers, made a backup, connected new provider. All looked perfectly fine. First customers were paying, first shops were sending their products... Until two days later it turned out the money isn't coming through even though all we are getting from the API is "1" after "1"! I shut it off. We had 7 conference calls, 2 meetings, 3 days of trying and failing. Finally, by a mere luck, I found out what's what.
You see, Microsoft, when you invent your own file format, it's really nice to make it consistent between versions... So that the punctuation made in Microsoft Word 97 that was supposed to start from "0" didn't start from "1" when you open the file in Microsoft Word 95.
Also, if you're a moron who edits documentation in Microsoft Word, at least export it to a fucking PDF before sending out. Please. -
When will I fuckin learn that
a) customers lie
b) customers are sloppy
c) customers are wrong
d) customers do not do their work (properly)
e) customers want us to do their (dirty) work
f) possibly all of the freakinly above?! + khm....
They will fuckin aaaalwaaaays say sth is not working after the update..
And I will alwaaaays assume I fucked up something..even if I didn't touch that part of the code/data..
And almost aaaaalways it turns out that the bug they complain about is how the system worked (or didn't work) before the update and/or some fuckup from their side..
Anyhow, I rushed over, grabbed the files went testing in dev..wtf, output is different, mine is ok, theirs is..wtf is that shit?!
Transfer newly built dll to test..same shit as on prod..wtf?! How?!
I assumed they have thing A correctly linked to thing B.. ofc thing A was linked to thing C in their case and in another case (our test) to correct thing B..
I got chillies when grabbing files, that
I should have tripple checked that they didn't fuck up something on the link part, but I just assumed they know what they were doing & that they checked they linked correct files with correct content already, before being pissy that the update fucked up things.. riiiight!! :/
I wanted to find solutions to this fuckup asap so I disregarded my gut feeling..yet again!! Fuuuck!
I've spent too much time trying to find ways to fix a bug that wasn't even a real bug to begin with.. :/
Fuuuuuck!!
So yeah, always treat the customers like they are 3yrs old & have no clue what they are doing & check exactly wtf they were indeed trying to do..it will save you time & nerves..
And note to self: reread this shit daily!! And imprint it in your brain that everything is not always your fault!!11 -
Half way through a 2 hour deployment, and a fucking test fails.
Yes, it takes 2+hours to run the tests.
Rerun in test environment: pass
Rerun in prod: pass
Rerun changed test in prod: fail..
Why, why you got to hate me for?
I love it when production is the place where config gets changed.rant tests are good multi environment changes tests can go fuck off right now tests save lives inconsistencies hurt5 -
Typos kill, kids! And deploying to production.
Instead of "for item in items" in my script, I accidentally did "for items in items". Thus, an exponential loop has been entering things into the database for the past few hours before I found the place to fix it.
By the way, this runs on cron every minute. So there are processes still running exponentially right now, possibly 180+.
Yeah, I'm setting up a a test server instead now.11 -
There is a company providing a very speciffic service. And it has a core application for that svc, supported by a core app team.
That company also has other services, which are derivations of the core one. So every svc depends on core.
Now that we're clear on that... I was working in a team of one of the subservices. We very strongly depended on core. In fact, our svc was useless if integration w/ core broke down.
The core team had an annoying habbit. They refused to version their webservices and they LOVED to push api updates w/o any warnings. Our prod, test, other envs used to fail bcz of core api changes quite often. Mgmt, IT head was aware of the problem and customers' complaints as well.
So as a result, once core api changes we're all in a panic mode: all prior priorities are lowered and revival of prod is to be our main focus. Core api is not docummented, the changes are not clear, so we have to reverse engineer the shit out of it. We manage to patch our prod up w/ hotfixes, but now we have tech debt. While working on the debt, core api changed again, in test env. Mgmt pushes debt back and reallocates us to hotfix test. Hotfix is 80% done when another core api breaks. Now mgmt asks us to drop wtv we're working on and fix that new break. By the time we're to deploy the hotfix, another api breaks in another env. The mgmt..... You get the picture :)
2 years go by, nothing has changed so far.6 -
I've been staffed on a old ongoing project, first day.
0. Compatibility has to be guaranteed down till IE9... ppf.
1. Front end made in XHTML+JS(jQuery)... bah, ok.
2. XHTML+JS is actually generated by PHP5.4, not a line is actually statically served... beh, funny, ok.
3. PHP files are the output of an XSLT transform of a bunch of XMLs... meh, seriously? Oooook.
4. XMLs are the product of the serialisation of a truck of stateful JavaEE6 DTOs populated magically (undocumented) with data coming from a SQL DB... WTF mode!!!
5. Session logics lives within PHP-land at point 2, front end makes ajax calls here that propagates to another WS out of our control that triggers -somehow- (undocumented) our Java backend at point 4 to generate new XMLs and then reach front end again. Kill me now.
Boss: look... it's too slow for the client, it's too heavy on our servers: fix it. Ah, and we sold 85% test coverage by October. You're the man for the job. (I'm a Node.js fullstacker and right now there's not even a testing scaffold, ofc).
Me: prod is on Linux or Windows?
Boss: RHEL7.
Me: rm -rf / as root. Done.
Boss: I know I know...
Me: ...
I think time has come...6 -
Have a couple I want to air today.
First was at my first gig as a dev, 4-5 months out of school. I was the only dev at a startup where the owner was a computer illiterate psycopath with serious temper tantrums. We're talking slamming doors, shouting at you while you are on the phone with customers, the works...
Anyways, what happened was that we needed to do an update in our database to correct some data on a few order lines regarding a specific product. Guess who forgot the fucking where-clause... Did I mention this boss was a cheap ass, dollar stupid, penny wise asshole that refused to have anything but the cheapest hosting? No backups, no test/dev/staging environment, no local copies... Yeah, live devving in prod, fucking all customers with a missing semi-colon (or where clause).
Amazingly, his sheer incompetance saved my ass, because even if I explained it, he didn't get it, and just wanted it fixed as best we could.
The second time was at a different company where we were delivering managed network services for a few municipalities. I was working netops at that time, mostly Cisco branded stuff, from Voice-over-IP and wifi to switches and some routing.
One day I was rolling out a new wireless network, and had to add the VLAN to the core switch on the correct port. VLAN's, for those who don't know, are virtual networks you can use to run several separated networks on the same cable.
To add a VLAN on a Cisco switch one uses the command:
switchport access vlan add XYZ
My mistake was omitting the 'add', which Cisco switches happily accept without warning. That command however can be quite disruptive as it replaces all of the excisting VLAN's with the new one.
Not a big deal on a distribution switch supplying an office floor or something, but on a fucking core switch in the datacenter this meant 20K user had no internet, no access to the applications in the DS, no access to Active Directory etc. Oh and my remote access to that switch also went down the drain...
Luckily a colleague of mine was on site with a console cable and access to config backups. Shit was over within 15 minutes. My boss at that time was thankfully a pragmatic guy who just responded "Well, at least you won't make that mistake again" when we debriefed him after the dust settled. -
I used to work on a production management team, whose job was, among other things, safeguarding access to production. Dev teams would send us requests all the time to, "run a quick SQL script."
Invariably, the SQL would include, "SELECT * FROM db_config."
We would push the tickets back, and the devs would call us, enraged. I learned pretty quickly that they didn't have any real interest in dev, test, or staging environments, and just wanted to do everything in prod, and see if it works.
But they would give up their protests pretty fast when I offered to let them speak to a manager when they were upset I wouldn't run their SQL.2 -
I once ran a batch import job to stress test with much more data than usual in staging... so I thought.
I sshd into prod by accident. History search can be a bastard at times...
The moment I realized what I was doing was crazy. I was shivering and thought I get fired if anything went wrong.
fortunately I just duplicated the original data for the test, and the system was built to ignore unnecessary updates... so the data was correct and nothing went wrong.
Not an active stupid dev choice but still something I will remember for a while. -
This was some time ago. A Legendary bug appeared. It worked in the dev environment, but not in the test and production environment.
It had been a week since I was working on the issue. I couldn't pinpoint the problem. We CANNOT change the code that was already there, so we needed to override the code that was written. As I was going at it, something happened.
---
Manager: "Hey, it's working now. What did you do?"
Me: *Very confused because I know I was nowhere close to finding the real source of the problem* Oh, it is? Let me check.
Also me: *Goes and check on the test and prod environment and indeed, it's already working*
Also me to the power of three: *Contemplates on life, the meaning of it, of why I am here, who's going to throw out the trash later, asking myself whether my buddies and I will be drinking tonight, only to realize that I am still on the phone with my manager*
Me again: "Oh wow, it's working."
Manager: "Great job. What were the changes in the code?"
Me: "All I did was put console logs and pushed the changes to test and prod if they were producing the same log results."
Manager: "So there were no changes whatsoever, is that what you mean?"
Me: "Yep. I've no idea why it just suddenly worked."
Manager: "Well, as long as it's working! Just remove those logs and deploy them again to the test and prod environment and add 'Test and prod fix' to the commit comment."
Me: "But what if the problem comes up again? I mean technically we haven't resolved the issue. The only change I made were like 20 lines of console logs! "
Manager: "It's working, isn't it? If it becomes a problem, we'll work it out later."
---
I did as I was told, and Lo and Behold, the problem never occurred again.
Was the system playing a joke on me? The system probably felt sorry for me and thought, "Look at this poor fucker, having such a hard time on a problem he can't even comprehend. That idiotic programmer had so many sleepless nights and yet still couldn't find the solution. Guess I gotta do my job and fix it for him. I'm the only one doing the work around here. Pathetic Homo sapiens!"
Don't get me wrong, I'm glad that it's over but..
What the fuck happened?5 -
So... Heard back from a recruiter today. Lovely lass.
I’d passed over a submission for her tech demo.
The brief was basically just to create a small simple module that calculates shit, nae effort.
But, when the recruiter had me on the phone she said “I know it’s a silly small module but try and run it up like you would a production ready app”.
The job spec and recruiter were keen on me demonstrating TDD, not specific on js version, final runtime, etc. The job was a senior spec at a higher salary range. So it warranted some effort, and demonstrating more than a simple module.
“Okay, cool, nae bother, let’s crack on.”
The feedback in the response from the dev today:
“He’s over-engineered tests, build...”
SUCK MY LEFT TESTICLE YOU FUCKWIT.
Talk to your recruiters, not me.
The feedback included a phrase I never hope to hear from a developer I work with:
“Tests are good but...” 😞
It was a standard 98% test suite from an RGR cycle, no more or less than I’d expect in prod.
The rest of the feedback was misguided or plain wrong. It was useful to see because I know now when they say they have “high standards” they mean: we listen to the dude who put the factory pattern in a JS brief.
Oh shit also: “someone’s done chmod 777” was in there as a sarcastic comment in the feedback. It was his fucking unarchive tool 😞
My response was brief and polite: “cheers for the consideration, all the best, James”
It’s honestly not worth warning them. Or, asking why they’d criticise something they’d asked me to do.
If you want a shitty js module, ask for a shitty js module and no more.4 -
So after 6 months of asking for production API token we've finally received it. It got physically delivered by a courier, passed as a text file on a CD. We didn't have a CD drive. Now we do. Because security. Only it turned out to be encrypted with our old public key so they had to redo the whole process. With our current public key. That they couldn't just download, because security, and demanded it to be passed in the fucking same way first. Luckily our hardware guy anticipated this and the CD drives he got can burn as well. So another two weeks passed and finally we got a visit from the courier again. But wait! The file was signed by two people and the signatures weren't trusted, both fingerprints I had to verify by phone, because security, and one of them was on vacation... until today when they finally called back and I could overwrite that fucking token and push to staging environment before the final push to prod.
Only for some reason I couldn't commit. Because the production token was exactly the same as the fucking test token so there was *nothing to commit!*
BECAUSE FUCKING SECURITY!5 -
I made a Product in my free time (after work hours)
it's a SAAS thats supposed to be an add-on to apps and websites
Added it to my own apps (what better than Test in prod) and over months fixed its pitfalls n ngl, even im impressed by its core tech and resilience
But thats kinda it -.- Ik I should make a landing page and launch it etc but I lost the will the day the "core tech" was 99% perfected
Im a Product guy not a businessman T__T
It's the weirdest mental block ive had in a while ffs.8 -
Developing a notification API, sends emails to subscribers, email API can take only 100 IDs at once, so partitioned the email list and send mails in blocks of 100.
Forgot to reset the list after every block, so each new partition got appended to the existing list and kept going on.
Ran it against a test DB, which was recently refreshed with near-prod data !!! Thousands of emails went out of the app server in one shot and everybody receiving numerous duplicate emails. Especially the ones in the very first partition.
Got an incident raised by the CEO himself reg the flurry of emails. But, things were out of our hands, quite literally. All emails are queued up in the exchange server.
Called up the exchange server team, purged the queued emails. No other emails were sent/received during this whole episode.
Thanks to Iterables.partition in the present day.3 -
ughh, studying about various software project management methodologies and lifecycles is so boring.
every different model looks similar, saying :
you got an idea?
- check weather its needed and if its practically / financially possible
- get investment and resources,
- design,develop, test, release
- repeat .
why name them waterfall or spiral or rad or agile or shit?
and we know how project go in reality: "fuck its 2 days to release and 5 features left? push to prod, make breaking features, leave the tests and release"7 -
So last week I really fucked up
I had this new implementation that was supposedly to be integrating smoothly into the rest of the service. It depended on a serialized model made by a data scientist. I test it in local, in QA environment: no problem.
So, Friday, 4pm, I decide to deploy to production. I check once from the app: the service throw an error. Panic attack, my chief is at my desk, we triy to understand what went wrong. I make calls with cUrls: no problem. Everything seems fine. I recheck from the app again: no problem.
We dedice to let it in prod, as the feature work. I go get some beers with the guys, to celebrate the deploy.
Fast-forward the next morning, 11am, my phone ring: it's a colleague of my chief. "Please check Slack, a client is trying to use the feature, it's broken"
FUUUUUUUUUUUUCK!!!
Panic attack again. I go to the computer, check the errors: two types of errors. One I can fix, the other from a missing package on the machine that the data guy used.
Needless to say, I had a fairly good weekend.
Lessons learned:
- make sure Dev, QA and Prod are exactly the same (use Ansible or Container)
- never deploy on a Friday afternoon if you don't have a quick way to revert1 -
TL;DR Dear boss, firstly, you always get someone to review anything important done by a fucking intern.
Secondly, you do not give access to your fucking client's production server to an intern.
Thirdly, you don't ask your fucking intern to test the intern's work that has not been reviewed by anyone directly on your client's fucking production server.
Last week, the boss and one of the lead devs (the only guy with some serious knowledge about systems and networking) decided to give me (an intern who barely has any work experience) the task of fixing or finding an alternate solution to allowing their support team access to their client machines. Currently they used a reverse SSH tunnel and an intermediary VH but for some reason, that was very unreliable in terms of availability. I suggested using OpenVPN and explained how it would work. Seemed to be a far better idea and they accepted. After several days of working through documentations and guides and everything, I figured out how OpenVPN works and managed to deploy a TEST server and successfully test remote access using two VMs. On seeing my tests, the boss told me that he wanted to test it on the client network. I agreed. Today he comes to me and he tells me to prepare testing for tomorrow and that the client technician is going to give me access to one of their boxes. And then he adds, "It's a working prod server. We'll see if we can make it work on that" and left. I gaped at him for a while and asked another dev guy in the room if what I heard was right. He confirmed. Turns out, the lead dev and the boss's son (who also works here) had had a huge argument since morning on the same issue and finally the dev guy had washed it off his hands and declared that if anything goes wrong from testing it on production, it's entirely the boss's own fault. That's when the boss stepped in and approached me. I ran back to his office and began to explain why prod servers don't top the list of things you can fuck around with. But he simply silenced me saying, "What can go wrong?" and added, "You shouldn't stay still. You should keep moving". Okay, like firstly what the fuck and secondly, what the fuck?.
Even though OpenVPN client is not the scariest thing to install, tomorrow's going to be fun.4 -
I didn’t turn down a dev freelance project when the client decided against going with best practices because the solution I offered was a well-established design pattern but created a need for a financial management change she didn’t like. I stupidly built what she asked for. It worked fine in the 3rd party vendor test environment but failed on production. After hours of analysis of code to ensure no changes happened to my source during test->prod deployment, and the vendor denying they had config differences between them, and the client refusing to pay, all I could do was abandon the project.2
-
Dear IT troll: I am not some idiot user. I FUCKING WRITE SOFTWARE! I actually CREATE CAPABILITY! I don't create "content", I'm not some fucking suit that pumps out PowerPoint/Excel/Email all day long. I don't need to be handed a consumers screwdriver, hammer, and wrench set. I need to be able to set up the technological equivalent of MY OWN FUCKING FORGE AND ANVIL! Do you get it? Do you understand me? Give me administrator access and go the fuck away. While you're at it, please quarantine this pile of silicon onto a limited access network if it makes you feel better. My development system doesn't need to connect to the wealth of bullshit in your precious little dumbed down corporate Wiki-wannabe Sharepoint system. Keep my creative space away from Test and Prod networks while you're at it. Just give me the goddamed tools I need to do my work and fuck off!8
-
Just needing somewhere to let some steam off
Tl;dr: perfectly fine commandline system is replaced by bad ui system because it has a ui.
For a while now we have had a development k8s cluster for the dev team. Using helm as composing framework everything worked perfectly via the console. Being able to quickly test new code to existing apps, and even deploy new (and even third party apps) on a simar-to-production system was a breeze.
Introducing Rancher
We are now required to commit every helm configuration change to a git repository and merge to master (master is used on dev and prod) before even being able to test the the configuration change, as the package is not created until after the merge is completed.
Rolling out new tags now also requires a VCS change as you have to point to the docker image version within a file.
As we now have this awesome new system, the ops didn't see a reason to give us access to kubectl. So the dev team is stuck with a ui, but this should give the dev team more flexibility and independence, and more people from the team can roll releases.
Back to reality: since the new system we have hogged more time from ops than we have done in a while, everyone needs to learn a new unintuitive tool, and the funny thing, only a few people can actually accept VCS changes as it impacts dev and prod. So the entire reason this was done, so it is reachable to more people, is out the window.3 -
Departure: 13.00 Train: test Destination: A1 Delay 18888min
Now it is 15:43 and it is still on...
Finaly poland joins the testing in the prod club!
Station is Wroclaw if anybody is interested.2 -
Rolled out a new application I built almost entirely by myself 2 days ago... But my dev group is understaffed and has a project manager who is literally the most clueless person I have ever met, so as a result, we don't have a functional/useful dev/test/prod framework and no standards for how to deploy apps. So my past 2 days were comprised of fixing bugs in the live system that could probably have been caught if I had the time and resources to get everything thoroughly tested. It's stable now, but damn our management for being generally idiots. Our motto appears to be "Fuck it, we'll do it live"1
-
TL;DR: When picking vendors to outsource work to, vet them really well.
Backstory:
Got a large redesign project that involves rebuilding a website's main navigation (accessibility reasons).
Project is too big just for our dev team to handle with our workload so we got to bring a 3rd party vendor to help us. We do this often so no big deal.
But, this time the twist was Senior Management already had retained hours with a dev shop so they want us to use them for project. Okay...
It begins:
Have our scope / discovery meeting about the changes and our expected DevOps workflow.
Devs work Local and push changes to our Github, that kicks off the build and we test on Dev, then it goes to Staging for more testing & PM review. Once ready we can push to prod, or whenever needed. All is agreed, everyone was happy.
Emailed the vendors' project manager to ask for their devs Github accounts so we can add them to the project. Got no reply for 3 days.
4th day, I get back "Who sets up the Github accounts?"
fuck me. they've never used Github before but in our scope meeting 4 days ago you said Github was fine...??
Whatever, fuck it. I'll make the accounts and add them.
Added 4 devs to the repo and setup new branch. 40min later get an email that they can't setup dev environment now, the dev doesn't know how to setup our CMS locally, "not working for some reason."
So, they ask for permission to develop on our STAGING server.. "because it's already setup"... they want to actively dev on our staging where we get PM/Senior Management approvals?
We have dev, staging, production instances and you want to dev in staging, not dev?... nay nay good sir.
This is whom senior management wants us to use, already paid for via retainer no less. They are a major dev shop and they're useless...
😢😭
Cant wait for today's progress checkup meeting. 😐😐
/rant1 -
A couple of years ago I was working on a fairly large system with a complex (by necessity) access control architecture.
As is usually the case with those projects, it's awkward for developers to repro bugs that have to do with a user's accesses in production when we are not allowed to replicate production data in test, let alone locally.
We had a bug where I ended up making myself a new row in the production database for a thing I could have access to without affecting real data to repro it safely. I identified the bug so I could repro it in dev/test and removed the row and ensured everything worked normally, whew scary.
Have you ever walked into the office one day, and everyone is hunched over in a semicircle around one person's workstation, before one turns around to look at you and says - after a pause - "... ltlian?.."
Turns out I had basically "poisoned the well" with my dummy entity in a way where production now threw 500 for everyone BUT me who had transitive access to this post-non-entity. Due to the scope of the system, it had taken about a day for this to gradually propagate in terms of caching and eventual consistencies; new entities coming in was expected, but not that they disappear.
Luckily I had a decent track record for this to be a one-off. I sometimes think about how I would explain testing in prod and making it faceplant before going home for the day, other than "I assumed it would be fine". I would fire me.3 -
The panic when I overhear my colleague in a meeting "sure I can test it in prod but we normally don't do that..."5
-
ideal sprint fallacy.
total days 10 , total hours(excluding breaks ) 8 hrs per day= 80 hrs per dev
code freeze day = day 8, testing+ fixing days : 8,9,10. release day : day 10
so ideal dev time = 7days/56 hr
meetings= - 1hr per day => 49 hrs per dev
- 1 day for planning i.e d1 . so dev time left . 6 days 42 hrs.
-----------
all good planning. now here comes the messups
1. last release took some time. so planning could not happen on d1. all devs are waiting. . devtime = 5 days 35 hrs.
2. during planning:
mgr: hey devx what's the status on task 1?
d: i integrated mock apis. if server has made the apis, i will test them .
mgr : server says the apis are done. whats your guestimate for the task completion?
d : max 1-2 hrs?
m : cool. i assign you 4 hrs for this. now what about task 2?
d : task told to me is done and working . however sub mgr mentioned that a new screen will be added. so that will take time
m : no we probably won't be taking the screen. what's your giestimate?
d : a few more testing on existing features. maybe 1-2 hrs ?
m: cool
another 4 hrs for u. what about task 3?
d : <same story>
m : cool. another 4 hrs for u. so a total of 12 hrs out of 35 hrs? you must be relaxed this sprint.
d : yeah i guess.
m cool.
-------
timelines.
d1: wasted i previous sprint
d2 : sprint planning
d3 : 3+ hrs of meetings, apis for task 1 weren't available sub manager randomly decided that yes we can add another screen but didn't discussed. updates on all 3 tasks : no change in status
d4 : same story. dev apis starts failing so testing comes to halt.
d5 : apis for task1 available . task 3 got additional improvement points from mgr out of random. some prod issue happens which takes 4+ hrs. update on tasks : some more work done on task 3, task 1 and 2 remains same.
d6 : task1 apis are different from mocks. additionally 2 apis start breaking and its come to know thatgrs did not explain the task properly. finally after another 3+ hrs of discussion , we come to some conclusions and resolutions
d7 : prod issue again comes. 4+ hrs goes into it . task 2 and 3 are discussed for new screen additiona that can easily take 2+ days to be created . we agree tot ake 1 and drop 2nd task's changes i finish task 2 new screens in 6 hrs , hoping that finally everything will be fine.
d8 : prod issue again comes, and changes are requested in task 2 and 3
day 9 build finally goes to tester
day 10 first few bugs come with approval for some tasks
day 11(day 1 of new sprint) final build with fixes is shared. new bugs (unrelated to tasks. basically new features disguised as bugs) are raised . we reject and release the build.
day 2 sprint planning
mgr : hey dev x, u had only 12 hrs of work in your plate. why did the build got delayed?
🥲🫡5 -
I hate having to deal with our IT service desk. Every time it takes enormous energy to get to the right people and make them understand that no, you are not an idiot, but you actually have a technical issue.
Sure thing they do have a few competent nice folks there too I've gotten to know over time and they indeed have to deal with a ton of dumb non-tech savvy idiots on a daily basis. However, if my job title mentions "software" and "engineer" they should at least assume I'm an idiot in tech. Or something. Every single time I need to open a ticket, even for the simplest "add x to env y", I need to quadruple check that the subject line is moron-friendly because otherwise they would take every chance to respond "nah we can't do that", "that's not us", or "sry that's not allowed". And then I would need to respond, "yes you do:) your slightly more competent colleague just did this for us 2 weeks ago".
Now you might imagine this is on even another level when the problem is complex.
One of our internal apps has been failing because one of the internal APIs managed by a service desk team responds a 500 status code randomly but only when called with a specific internal account managed by another service desk team.
(when I say "managed by", that doesn't mean they maintain it, it just mean they are the only ones who would have access to change something)
Yesterday I spent over a fucking hour writing a super precise essay detailing the issue, proving a million times it's not on our end and that they need to fix it. Now here is an insight to what beautiful "IT service" our service desk provides:
1) ticket gets assigned to a "Connectivity Engineer" lady
2) few hours later she responds and asks me to give her the app and environment IDs and grant her access to those
(naturally everything in my email was ignored including these two IDs)
3) since the app needs to be in prod for the issue, I make a copy isolating the failing part and grant her access to the original "for reference" and the copy to play with
4) few hours later I get an email from the env that some guy called P made changes to the actual app, no changes to the copy
(maybe they immediately fixed the app even though I asked them to only touch the copy)
I also check the env and the live app had been shared with another 2 people giving them editing rights:)
5) another few hours pass and the lady responds that she had been chatting with P (no mention of who tf that guy is) and that P has a suggestion that might work and I should test it, "please see screen shot" for details:
These motherfuckers sent me a fucking screenshot of the env config file where "P has edited a few parameters" that might help. The screenshot had a 16 line part of the config json with a bunch of IDs and Base64 params which HE EDITED LOCALLY.
Again, because I needed a few iterations to realise what I've just witnessed:
These idiots modified some things in the main app (not the copy) for hours. Then came to the conclusion that the config needs some IDs and params updated. They downloaded the config json. Edited it locally. Did not fucking upload it back to the main or test app. Did not test it live. Did not CC in or direct the guy with changes to me. Did not send me the modified config file. Did not even paste the new IDs into the email. But TOOK A FUCKING SCREENSHOT OF THE MODIFIED FILE AND SENT THAT SHIT TO ME. And then had the audacity to ask me to test it when they had access to it and that's literally their fucking job.
I had to compare the fucking screenshot to the live config file and manually type in the changes.
And no, it still doesn't work. And Now I have to get back to them showing it still fails the same way but I just can't deal with these people. Fuck. Was hoping by the time I write it all down it'd be better, and it does feel a bit better, but I still need to get this app fixed. And I can only do it through these... monkeys. I just can't. Talking to these people drains my life energy... I'm just sad. -
I hate dev politics...
PM: Hey there is a weird error happening when I upload this file on production, but it works on our test environments.
Me: After looking at this error, I don't find any issues with the code, but this variable is set when the application is first loaded, I bet it wasn't loaded correctly our last deployment and we just need to reload the application.
Senior Dev: We need to output all of the errors and figure out where this error is coming from. Dump out all the errors on everything in production!!
Me: That's dumb... the code works on test... it's not the code.. it's the application.
Senior dev: %$*^$>&÷^> $
Me: Hey I have an idea! If test works... I can go ahead and deploy last week's changes to prod and dump those errors you were talking about!!
Senior Dev: OK
Me: *runs Jenkins job the deploys the new code and restarts the application*
PM: YAY you fixed it!!
Senior Dev: Did you sump put those errors like I said.
Me: Nope didn't touch a thing... I just deployed my irrelevant changes to that error and reloaded the application.2 -
I've now worked on both monolithic solutions and microapps/microservices. I gotta say I'm not sold on the new approach. There's so much overhead! You don't have to know your way around one solution -- no, now you need to know your way around 100 solutions. Debugging? Yeah, good luck with that. You don't have to provision one environment for dev, test, staging, and prod. No, now you need 100 environments per... environment. Now, you need a dedicated fulltime devops person. Now devs can check in breaking changes because their code compiles fine in that one tiny microapp. The extra costs go on and on and on. I get the theoretical benefits but holy crap you pay for it dearly. Going back to monolithic is so satisfying. You just address the bug or new feature head on without the ceremony and complexity. You know you're not crapping on other people's day (compilation-wise) because the entire solution compiles.
...and yeah, I'm getting old. So get off the lawn! ;)2 -
Started a new job as junior developer. One of my first task was to sent a simple notification on an event in out product. Write the code, test that it works, push to devops.
Code compiles, tests pass, it’s deployed to internal test env. Check that my notification works in the test env. No problem.
It’s deployed to the customers test environment. It works and customer accepts it for prod.
We release to prod and of course it fails. Seems to be a simple string.Format that fails for god knows why. After 3h of debugging on prod without success we decide to roll it back.
Today we decided to try it on a backup of the prod db since one of the strings was taken from the db. Still working. No matter what data I input when trying it locally it still wont reproduce the issue we saw on prod.
Fuck this6 -
Tl;Dr Im the one of the few in my area that sees sftping as the prod service account shouldn't be a deployment process. And the ONLY ONE THAT CARES THAT THIS IS GONNA BREAK A BUNCH OF SHIT AT SOME POINT.
The non tl;dr:
For a whole year I've been trying to convince my area that sshing as the production service account is not the proper way to deploy and/or develop batch code. My area (my team and 3 sister teams) have no concept of using version control for our various Unix components (shell scripts and configuration files) that our CRITICAL for our teams ongoing success. Most develop in a "prodqa like" system and the remainder straight in production. Those that develop straight in prodqa have no "test" deployment so when they ssh files straight to actual production. Our area has no concept of continuous integration and automated build checking. There is no "test cases", no "systems testing" or "regression testing". No gate checks for changing production are enforced. There is a standing "approved" deployment process by the enterprise (my company is Whyyyyyyyyyy bigger than my area ) but no one uses it. In fact idk anyone in my area who knows HOW to deploy using the official deployment method. Yes, there is privileged access management on the service account. Yes the managers gets notified everytime someone accesses the privileged production account. The managers don't see fixing this as a priority. In fact I think I've only talk to ONE other person in my area who truly understands how terrible it is that we have full production change access on a daily basis. Ive brought this up so many times and so many times nothing has been done and I've tried to get it changed yet nothing has happened and I'm just SO FUCKING SICK that no one sees how big of a deal this. I mean, overall I live the area I work in, I love the people, yet this one glaring deficiency causes me so much fucking stress cause it's so fucking simple to fix.
We even have an newer enterprise deployment. Method leveraging a product called "urban code deploy" (ucd) to deploy a git repository. JUST FUCKING GIT WITH THE PROGRAM!!!!..... IT WAS RELEASED FUCKING 12 YEARS AGO......
Please..... Please..... I just want my otherwise normally awesome team to understand the importance and benefits of version control and approved/revertable deployments2 -
I'm still studying computer science/programming, I still have one year to do in order to graduate (Master). I am in a work study program so I'm working for a company half of the time, and I'm studying the other half. It is important to mention that I am the only web developer of the company
When I arrived in the company 9 months ago, I was given a Vue project which had been developed by a trainee a few weeks before my arrival and I was asked to correct a few things, it was mostly about css. Then, I was ask to add a few functionalities, nothing really hard to code, and we were supposed to test the solution in a staging environment, and if everything was ok, deploy it to prod.
However, the more I did what I was asked, the more functionalities I had to implement, until I reached a point where I had to modify the API, create new routes, etc. I'm not complaining about that, that's my job and I like it. But the solution was supposed to be ready when I arrived, it was also supposed to be tested and deployed.
The problem is, the person emitting these demands (let's call him guy X) is not from the IT service, it's a future user of the website in the admin side. The demands kept going and going and going because, according to him, the solution was not in a good enough state to be deployed, it missed too many (un)necessary features. It kept going for a few months.
The best is yet to come though : guy X was obviously a superior, and HIS superior started putting pressure on me through mails, saying the app was already supposed to be in production and he was implying that I wasn't working fast enough. Luckily, my IT supervisor was aware of what was going on and knew I obviously wasn't to blame.
In the end, the solution was eagerly deployed in production, didn't go through the staging environment and was opened to the users. Now, guy X receives complaints because none of what I did was tested (it was by me, but I wasn't going to test every single little thing because I didn't have time). Some users couldn't connect or use this or that feature and I am literally drowning in mails, all from guy X, asking me to correct things because users are blocked and it's time consuming for him to do some of the things the website was doing manually.
We are here now just because things have been done in a rush, I'm still working on it and trying to fix prod problems and it's pissing me off because we HAVE a staging environment that was supposed to prevent me from working against the clock.
On a final note, what's funny is that the code I'm modifying, the pre-existing one needs to be refactored because bits and pieces are repeated sometimes 5 times where it should have been externalized and imported from another file. But I don't know when and if I will ever be able to do that.
I could have given more context but it's 4am and I'm kinda tired, sorry if I'm not clear or anything. That's my first rant -
https://devops.com/dear-staging-wer...
What the F#*@&$# %#@$!?!?!
This person has decided to skip using staging, because it doesn't correctly reflect prod!
If that's your problem, than why don't you try to fix it? Create a DB with fake data, make one based on anonymised customer data, or even do it on non-anonymised data (with permission of course), but fix the staging env so that it reflects prod!
This is a devops site (it's literally the name!), and instead of teaching you how to make staging exactly like prod, they tell you to do what caused the creation of the staging->prod system IN THE FIRST PLACE!!!
There's all these stuff like Vagrant that are literally designed to help you as a dev mimic prod, and you just throw it all out!?
"With feature flags, I can safely test in production without fear of breaking something or negatively affecting the customer experience."
Famous last words.12 -
Ever have one of those moments where you're running a service you built to update about a decade worth of police records, realize about halfway through that you fucked the loop and you're copying data from the first record onto every other record, and then just really wish that you had checked things better in test before running this on the prod server?
I'm sure the only reason I'm still here is because the audit log contained the original values and I'm good at pulling data out of it.1 -
I really need to introduce unit tests.
Btw the module is meant for internal use and the readme is more for eventual collaborators than the general public -
Finally made my node production server stable enough that I could focus on writing tests*. I start by setting up docker, mocking cognito, preparing the database and everything. Reading up on Node test suites and following a short tut to set up my first unit test. Didn't go smoothly, but it's local and there are no deadlines so who cares. 4 days later, first assert.equal(1+1, 2) passes and I'm happy.
I start writing all sorts of tests, installing everything required into "devDependancies," and getting the joy of having some tests pass on first try with all asserts set up, feels good!
I decide to make a small update to production, so I add a test, run and see it fail, implement the feature, re-run and, it passes!
I push the feature to develop, test it, and it works as intended. Merge that to master and subsequently to one of my ec2 production servers**, and lo and behold, production server is on a bootloop claiming it "Cannot find module `graphql`". But how? I didn't change any production dependencies, and my package lock json is committed so wth?
I google the issue, but can't find anything relevant. The only thing that I could guess was that some dependencies (including graphql) were referenced*** in both, prod and dev, and were omitted when installed on a prod NODE_ENV, but googling that specific issue yielded no results, and I would have thought npm would be clever enough to see that and would always install those dependencies (spoiler: it didn't for me).
With reduced production capacity (having one server down) I decided to npm uninstall all dev dependencies anyway and see what happens. Aaaaand it works.....
So now I have a working production server, but broken local tests, and I'm not sure why npm is behaving like this...
* Yes I see the irony.
** No staging because $$$, also this is a personal project.
*** I am not directly referencing the same thing twice, it's probably a subdependency somewhere.2 -
I am forced to work with a client's notoriously slow SOAP api. Slow in this case is 1.5-2s per request.
The api is structured rather... creatively... at the same time. So we have to bombard it with thousands of requests to build our data base with historical SOAP data. Also the data sometimes is a couple of hours late, giving a flat line (all values at 0) until retroactively fixing the output for the same requests.
So to fill one dev data base with a year's worth of historical data (nice to have when testing a dashboard application) we hammer the api with ~20k requests (~1 million if we want to be thorough).
Best thing about that: There is no staging/test api and the prod api seems not to handle lots of requests at the same time very well...
Latest thought: Maybe we could put a varnish cache in front of the SOAP for testing. Better have wrong data, than nothing at all and we don't kill the prod clients every time we ramp up a new instance.
Also that would dramatically decrease the 4.2 hours of data pumping to about 7 minutes after the first run. -
Small chaotic startup that never grew up (15 years atm).
Hosts/maintains a number of apps/sites for various customers.
At some point, someone decides that a CMS would be usefull to maintain the content across all products. Forgoing all sense, reason and the very notion of "additional maintenance and dev" it is decided that one should be built in-house.
Fast forward a number of years.
Ops performs routine maintenance on prod-servers. A java-patch accidently knocks out one of the pillars a 3rd party lib the CMS uses for storing images. CMS basically burst in to flames causing a.... significant incident.
Enter yours truly to fix the mess.
Spend a few days replacing the affected 3rd party lib. Run tests on CMS in test and staging environments. Apply java-patch. All seems fine.
When speaking to frontenders and app-devs, a significant hurdle present itself:
All test/staging instances of all websites/apps/etc ALL USE PRODUCTION CMS. Hardcoded. No way around.
There is -no- way to properly test and verify the functionality of any changes made to the home-brewed CMS.
My patch did indeed work in the end.
But did the company learn anything? Did they listen to my reasoning, pleading or even anguished screams for sanity?
No.6 -
OMG you fucking dick! All I asked if you minded if I created a PR to push your branch to prod because test was down.
You didn’t have to bother the fucking software architects in a group chat with me saying I needed fucking help.
If you wrote the isset function correctly to begin with or even put the JSON file in the correct repo like every other file on the site this would not be a problem.
Fuck contractors! Fuck Assholes!2 -
Story time;
Major project, multi million budget, huge business and IT coordination, board level status updates, meeting started back in March 2018 for a Go Live of Aug 2019.
Based on draft requirements (and experience) I request the test environment be built for half of the work. Turns out that no one told Server Eng and they are out of space in both dev and prod until Q2 of 2019. We went from Green to Red because a Service Request.5 -
Yesterday whole 12 hours we were working on deployment about a feature X that has deadline yesterday itself.
Everything damn perfectly running on Test env but not on Prod.
We made Prod into Dev/Test/Fucking garabage env. Haha.
I was laughing to myself at same time crying hard in my deep heart.
Business guys chasing PM
PM chasing us
And from morning till night we were in same room. Had lunch, and dinner only went out for toilet and to refil water bottles.
And found that feature Y is not working at same time that is related to our feature X. Fucking we have been wasted hours on it.
One of my devs got so fucked up emotionally that he messed up the code (not his fault) he didnt had his lunch and dinner. Had to console him later that its not his fault. Poor guy not sure whether he slept or not; will find out in few hours.
Anyways reported a bug.
But that bug assigned to us for fixing.
Are you fucking kidding me.
Anyways no choice. Had to do it.
Hope today everything goes good or horribly bad. FYI no deployments on Friday damn we are in stalememt till Monday.
Fuck that bug
Or
May be fuck our stupiditiy while makiing mistakes.1 -
While addressing a Senior Dev's (SD) query from another team.
SD: why is this field mandatory? Can't it be just optional? Any other work around?
Me: Is your code changes already pushed in Devo? In that case, we provide a value which will work since you are not concerned about it.
SD: Yes. It's pushed till production. And, I want to test changes in Prod.
Me: (shared some codes) and explained that this feature for testing is only available in Devo.
SD: I know that. (Shared me a ticket) I want this field to be optional. That's it.
Me: (read the entire ticket. Didn't find anything related their) Told him, I will discuss with team. And meanwhile, for Devo, you can use this value.
Next morning, I accidentally came over some other ticket raised by him only which had the correct doubts regarding request to support this field in production
Now, I don't know why did he share a wrong ticket with me.
And, how will it even help him if that field was even optional.
THAT JUST WONT WORK IN PRODUCTION.
I will discuss with my team and see what can happen. -
Its time to remind ourselves the wonderful time of the year, where people look back and decide to be better and learn from your mistakes...
For example never test in prod and wake me up with notifications you dumbfucks... Someone has exams to do..
Seriously, I have seen a lot of push notifications lately, its some kind of Letstestinprod-uary?1 -
is it necessary to have cherry picking a part of git branching/release process?
we have 3 branches : develop, release and master.
currently every dev works on feature as follows : they make a branch out of develop, write code, raise pr against develop, get it reviewed and merge back to develop. later the release feature list is generated, and we cherry pick all the release related commits to release branch, and make a prod build out of release branch. finally, the code is moved to master and rags are generated accordingly.
so the major issue with this process is feature blocking. as of now, i have identified 4 scenarios where a feature should not be released :
1. parallel team blocker : say i created a feature x for android that is supposed to go in release 1.2.1 . i got it merged to develop and it will be cherry picked to release on relase day. but on release day it is observed that feature x was not completed by the ios dev and therefore we cannot ship it for android alone.
2. backend blocker : same as above scenario, but instead of ios, this time its the backend which hasn't beem created for the feature x
3. qa blocker : when we create a feature and merge it to develop, we keep on giving builds from develop branch adter every few days. however it could be possible that qa are not able to test it all and on release day, will declare thaf these features cannot be tested and should not be moved to release
4. pm blocker: basically a pm will add all the tickets for sprint in the jira board. but which tickets should be released are decided at the very late days of sprint. so a lot of tasks get merged to develop which are not supposed to go.
so there's the problem. cherry picking is being a major part of release process and i am not liking it. we do squash and merges, so cherry picking is relatively easy, but it still feels a lot riskier.
for 1 and 2 , we sometimes do mute releases : put code in release but comment out all the activation code blocks . but if something is not qa tested or rejected by pm, we can't do a mute release.
what do you folks suggest?9 -
Long post, TLDR: Given a large team building large enterprise apps with many parts (mini-projects/processes), how do you reduce the bus-factor and the # of Brent's (Phoenix Project)?
# The detailed version #
We have a lot of people making changes, building in new processes to support new flows or changes in the requirements and data.
But we also have to support these except when it gets into Production there is little information to quickly understand:
- how it works
- what it does/supposed to do
- what the inputs and dependencies are
So often times, if there's an issue, I have to reverse engineer whatever logic I can find out of a huge mess.
I guess the saying goes: the only people that know how it works is whoever wrote it and God.
I'm a senior dev but i spend a lot of time digging thru source code and PROD issues to figure out why ... is broken and how to maybe fix it.
I think in Agile there's supposed to be artifacts during development but never seen em.
Personally whenever i work on a new project, I write down notes and create design diagrams so i can confirm things and have easy to use references while working.
I don't think anyone else does that. And afterwards, I don't have anywhere to put it/share it. There is no central repo for this stuff other than our Wiki but for the most part, is like a dumping ground. You have to dig for information and hoping there's something useful.
And when people leave, information is lost forever and well... we hire a lot of monkeys... so again I feel a lot of times i m trying to recover information from a corrupted hard drive...
The only way real information is transferred is thru word of mouth, special knowledge transfer sessions.
Ideally I would like anything that goes into PROD to have design docs as well as usage instructions in order for anyone to be able to quickly pick it up as needed but I'm not sure if that's realistic.
Even unit tests don't seem to help much as they just test specific functions but don't give much detail about how a whole process is supposed to work.9 -
Dumb mistake from when I was still working:
My work laptop’s SSD went haywire, and I/O would spike every 10 minutes or so for ~50 ms. The hardware guy said he could replace the SSD right away, or I could endure it for a few weeks and get a new laptop instead. Obviously, I agreed to wait. The stutter noticeably affected screen rendering, but I didn’t notice any other issues. Little did I know that every time it happened, all input was ignored (as in: not queued). Normally it wouldn’t matter, because hitting a random ~50 ms window is hard. How-the-f×ck-ever…
A few days later — without getting into “why” — I was forced to apply a patch in production. So I opened an SSH session to prod in one terminal, spun up a dev environment in another, copied the database schema from prod to dev, and made sure to test everything. No issues, so I jumped to prod, applied the patch, restarted services, jumped back to dev, and cleaned up the now-unnecessary database. Only to discover that my “jumped back to dev” keystroke didn’t register.16 -
This place I’m consulting at just had a new Directory of Change, first policy she made for IT is mandatory 3 working days wait for any release in Test and 2 weeks for Prod. That includes application config value update in Test database. We think she might have misunderstood her job title Ms. Director of No Change..2
-
This is a continuation of my previous rant about admob being not very informative when it comes to invalid traffic and the resulting restriction in ad delivery.
I then wanted to use admob mediation to hang in facebook ads. My app is written with Xamarin.Forms.
So first I needed to make some facebook configuration - create an account, let my app review, create some ad placements and other shit. I came to the point where I had to put in a link to my privacy policy and the link could not be accepted due to some SSL fuckup -.-'
I then found out that there is an issue with my SSL Chain. With the help of whatsmychaincert.com I solved that issue. Little side note here: I have limited knowledge of that stuff and my cousin helped me set up my homepage so I had no idea what I was doing. Did a snapshot and luckily I did not needed that as everything worked :)
This took me around half an hour just so I can paste the fucking link to activate my app in facebook developer portal.
After that I made the whole mediation configuration shit - not an issue as google documented this quite well but it took some time.
Now comes the shitty part. To use admob mediation you need adapters to the other ad network. I found a nuget package with exactly what I needed just to find out that it is outdated. So I pulled the repo and saw that this thing is an aar binding library. Never did that stuff so I read some docs again. Updated the package and consumed it in my app.
The google docs then said "Use this mediation test shit to check if you did everything correct before going prod" - aar binding nr. 2 (but I am now familiar with that :P). This thing then told me that facebook ads could not be loaded because the SDK version is outdated -.-' SDK version comes from another nuget package which is referenced by the first aar thingie. I tracked that thing back to a repo where I found out that they are indeed totally behind. So I downloaded the aar, made a binding lib and bound that to my first aar binding lib as that depends on this.
Put that all back in my app - tested mediation and fucking finally after 6 hours everything comes together! all lights are green and things work.
Sorry if this is not quite a rant but it was quite a journey and I just had to share it. -
Should I be even be testing if no one cares.
I keep asking Devs regarding the functionalities of their system for testing and then, realize it's already on Prod even before I test in beta or enable to test in beta.
Do I exist!!!
Now, for the nth time, I have started to test when things are almost there or already there in Prod.
I should keep myself in adrenaline mode always on from now on.
Need to do something worthy.
The worst way to start a new job.2 -
Tfw you find a comment saying something needs to be tested when the system is in prod.
Narrator: they did in fact not test it when the system was in production -
This is an actual transcript...
Since it's way too long for the normal 5000 characters, hence splitting it up...
Infra Guy: mr Dev, could you please give some rational for update of jjb?
Dev: sparse checkout support is missing
Infra Guy: is this support mandatory to achive whatever you trying to do?
Dev: yes
Infra Guy: u trying to get set of specific folder for set of specific components?
Dev: yes
Infra Guy: bash script with cp or mv will not work for you?
Dev: no
Infra Guy: ?
Dev: when you have already present functionality why reinvent the wheel
Dev: jenkins has support for it
Dev: the jjb is the bottle neck
Infra Guy: getting this functionality onto our infra would have some implications
Dev: why should I write bash script if jenkins allows me to do that
Dev: what implications ??
Infra Guy: will you commit to solve all the issues caused by new jjb?
Dev: you show me the implications first
Infra Guy: like a year ago i have tried to get new jjb <commit_url>
Infra Guy: no, the implications is a grey area
Infra Guy: i cant show all of them and they may hit like in week or eve month
Dev: then why was it not tackled
Dev: and why was it kept like that
Infra Guy: few jobs got broken on something
Dev: it will crop up some time later
Dev: if jobs get broken because of syntax
Dev: then jobs can be fixed
Dev: is it not ???
Infra Guy: ofc
Infra Guy: its just a question who will fix them
Dev: follow the syntax and follow the guidelines
Dev: put up a test server and try and lets see
Dev: you have a dev server
Dev: why not try on that one and see what all jobs fails
Dev: and why they fail
Dev: rather than saying it will fail and who will fix
Dev: let them fail and then lets find why
Dev: I manually define a job
Dev: I get it done
Infra Guy: i dont think we have test server which have the same workload and same attention as our prod
Dev: unless you test how would you know ??
Dev: and just saying that it broke one with a version hence I wont do it
Infra Guy: and im not sure if thats fair for us to deal with implication of upgrading of the major components just cause bash script is not good enough for u
Dev: its pretty bad
Infra Guy: i do agree
Infra TL Guy: Dev, what Infra Guy is saying is that its not possible to upgrade without downtime
Infra Guy: no
Dev: how long a downtime are we looking at ??
Infra Guy: im saying that after this upgrade we will have deal with consequences for long time
Infra Guy-2: No this is not testing the upgrade is the huge effort as we dont have dev resources to handle each job to run
Dev: if your jjb compiles all the yaml without error
Dev: I am not sure what consequences are we talking of
Infra Guy: so you think there will be no consequences, right?
Dev: unless you take the plunge will you know ??
Dev: you have a dev server running at port 9000
Infra Guy: this servers runs nothing
Dev: that is good
Dev: there you can take the risk
Infra Guy: and the fack we have managed to put something onto api doesnt mean it works
Dev: what API ?
Infra Guy: jenkins api
Infra Guy: hmmm
Dev: what have you put on Jenkins API ??
Infra Guy: (
Dev: jjb is a CLI
Infra Guy: ((
Dev: is what I understand
Dev: not a Jenkins API
Infra Guy: (((
Dev: (((((
Infra Guy: jjb build xmls and push them onto api
Infra Guy: and its doent matter
Dev: so you mean to say upgrading a CLI is goig to upgrade your core jenkisn API
Dev: give me a break
Infra Guy: the matter is that even if have managed to build something and put it onto api
Infra Guy: doesnt mean it will work
Dev: the API consumes the xml file and creates a job
Infra Guy: right
Dev: if it confirms to the options which it understands
Dev: then everything will work
Dev: I am actually not getting your point Infra Guy
Infra Guy: i do agree mr Dev
Dev: we are beating around the bush
Infra Guy: just want to be sure that if this upgrade will break something
Infra Guy: we will have a person who will fix it
Dev: that is what CICD is supposed to let me know with valid reasons
Dev: why can't that upgrade be done
Infra Guy: it can be done
Infra Guy: i even have commit in place3 -
A while back I was looking for a new job and was given an interview by one company who shall remain nameless. Before the interview, they asked me look through their current site, nothing unusual there, so I started browsing. Then I received an email with all the details I needed to access their production server. Apparently they wanted me to look through the code, unusual but I did so.
First thing all the passwords, including those belonging to members of the public were stored in plain text and many were still the default passwords which were based on the Id so were sequential.
I highlighted these issues at the interview and they then asked me to do a test, not the usual test though, they asked me to add some charts to their prod site. Needless to say that didn’t happen and I got another job elsewhere.1 -
- Change this, change that too, oh and that too.
* Ok, will do ( however unlikely it's going to be finished correctly, as you didn't consult me before or listen to me about the impact this might have )
- ( just stfu and do it )
Sometime at or after important cutoff:
- Hey this doesn't show up, is that right
* Yeah, you wanted too many things changed at once + fix it and notice a stupid error like an && switched with a ||, or other models that don't know about change x yet
/repeat3 -
Need a workaround to start the application. Need to activate/deactivate a few workarounds at different points in the flow to test the code in development. Need a different workarounds to test the code in QA. Maybe another workaround for prod? Need a workaround to check in the code to CI/CD. Need a workaround to deploy the code.3
-
Why do some employers make such a distinction between learning the tools at university and learning the same tools at the workplace?
Are they backward or old? Don't they know modern, high-quality universities have modern environments that are in fact real life?
Environments with acc-test-prod-dev with gitlab, ci/cd in Scrum teams and the works? Heck, at my uni we even worked at real companies, did internships there for months!
Come on.. to me this 'the tools you learned in school isn't the same experience as real life experience'. Right, these guys must be on some conservative backward model because there is in fact no difference.
I have worked both during my uni internship at a real company (in teams too) as well as irl at real companies and there is no difference, it's the same thing.
I don't care if I've learned to experience git + ReactJS etc during an internship through uni or at a workplace. It's all bureaucracy.10 -
You know what grinds my gears?
When someone complains about an issue, then bypasses it and ruins your ability to test. Sure, you can recreate it to the best of your ability, but if the issue is very specific, you're screwed (browser, data in prod, etc). -
Whats the fucking purpose of our companys dev test and prod env. Dev always only has a single instance. Sometimes clustered services run as cluster on test. Producing headaches because the clustering behaviour couldnt be seen on a single instance and Prod lacks all the nice deployment tools off dev/test. Fuck thinking you could dev then test and prod without any major reconfiguration and headaches. And all because the Storage costs is RETARDEDLY expensive because the backup EVERYTHING with ridiculess overkill. That results in headaches when requesting new servers. Took an old Workstation from the shelves and made it my vm slave so at least i could reliably deploy to test.. Fuck this process
-
Context: I am leaving my company to work at a data science lab in another one.
My senior dev (with PO hat): we need to gather data from prod to check test coverage. You will like it as you will be data scientist hehehe (actually not funny). You will have to analyze the features, and find relations between them to be able to compare with the existing tests
Me: oh cool, we can use ML to do that!
Him: Nope, we need to di it in the next 3 weeks so we need to do it manually.
Me:... I have quit for something.... -
I have been working in the same PR for 2 months. Many metamorphosis of it i would say 😂 I even made it all the way to prod and had to be reverted coz it was causing straight chaos. I have worked on some pretty complex features in the past year, this the smallest yet most complicated ask i have ever received. Orleans really put my patience to test and here i am, right now completely unable to write any code. I am just here planning my vacation in 4 weeks time.1
-
One nightmarish project that was doomed from the beginning, had me as the sole developer. I could hardly sleep when we began testing on a separate test system, but with (nearly) all the config stored in shared memory and copied from the production system, I dreaded, half awake, that the production server data base connection was still configured in the test system and that it was shooting all it's test data repeatedly to prod.
Finally drove to company in middle of the night at 4 o'clock. Checked everything was OK, tried to sleep 3 hours before the start of the work day.
This system also had the most hideous memory corruption in some shared memory that was used across several processes and should have been thoroughly protected by a mutex, but somehow, sometimes this crucial map, that was used to speed up the access to all the customer data just contained garbage.
Still haunts me to that day. (Like xkcd's unresolved tension of a non-matching parenthesis - an unresolved bug. -
In my initial days as a web developer, i was assigned a task, to implement a cart share functionality in an e commerce company.
I made the functionality and tested on my system.
Result: working good.
Pushed it to beta testing environment.
Resilt: working good.
Pushed to pre production environment.
Result: working good.
Pushed to live site.
Result: 😀 Error in live site..
So a call comes to me from my team lead..
Asks what was the issue...
Me: i dont know either.
....
After 3-4 hrs:
I found the reason.
My system, beta test env, pre prod env are all having latest php version (5.6 i guess)
But the live server had old version of php.
Me: laughed like anything.
I didn't know that these things would matter in such a great level.
Moral of the story:
Be one with the force (server in this case)2 -
What are opinions out there on security theatre?
Should developers have access to aws secrets?
Should dev test and prod be on separate vpcs or all in one vpc.
I have worked at banks where this was strictly not allowed.
Can’t wait to hear responses on this one….11