Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "high availability"
-
Worst dev team failure I've experienced?
One of several.
Around 2012, a team of devs were tasked to convert a ASPX service to WCF that had one responsibility, returning product data (description, price, availability, etc...simple stuff)
No complex searching, just pass the ID, you get the response.
I was the original developer of the ASPX service, which API was an XML request and returned an XML response. The 'powers-that-be' decided anything XML was evil and had to be purged from the planet. If this thought bubble popped up over your head "Wait a sec...doesn't WCF transmit everything via SOAP, which is XML?", yes, but in their minds SOAP wasn't XML. That's not the worst WTF of this story.
The team, 3 developers, 2 DBAs, network administrators, several web developers, worked on the conversion for about 9 months using the Waterfall method (3~5 months was mostly in meetings and very basic prototyping) and using a test-first approach (their own flavor of TDD). The 'go live' day was to occur at 3:00AM and mandatory that nearly the entire department be on-sight (including the department VP) and available to help troubleshoot any system issues.
3:00AM - Teams start their deployments
3:05AM - Thousands and thousands of errors from all kinds of sources (web exceptions, database exceptions, server exceptions, etc), site goes down, teams roll everything back.
3:30AM - The primary developer remembered he made a last minute change to a stored procedure parameter that hadn't been pushed to production, which caused a side-affect across several layers of their stack.
4:00AM - The developer found his bug, but the manager decided it would be better if everyone went home and get a fresh look at the problem at 8:00AM (yes, he expected everyone to be back in the office at 8:00AM).
About a month later, the team scheduled another 3:00AM deployment (VP was present again), confident that introducing mocking into their testing pipeline would fix any database related errors.
3:00AM - Team starts their deployments.
3:30AM - No major errors, things seem to be going well. High fives, cheers..manager tells everyone to head home.
3:35AM - Site crashes, like white page, no response from the servers kind of crash. Resetting IIS on the servers works, but only for around 10 minutes or so.
4:00AM - Team rolls back, manager is clearly pissed at this point, "Nobody is going fucking home until we figure this out!!"
6:00AM - Diagnostics found the WCF client was causing the server to run out of resources, with a mix of clogging up server bandwidth, and a sprinkle of N+1 scaling problem. Manager lets everyone go home, but be back in the office at 8:00AM to develop a plan so this *never* happens again.
About 2 months later, a 'real' development+integration environment (previously, any+all integration tests were on the developer's machine) and the team scheduled a 6:00AM deployment, but at a much, much smaller scale with just the 3 development team members.
Why? Because the manager 'froze' changes to the ASPX service, the web team still needed various enhancements, so they bypassed the service (not using the ASPX service at all) and wrote their own SQL scripts that hit the database directly and utilized AppFabric/Velocity caching to allow the site to scale. There were only a couple client application using the ASPX service that needed to be converted, so deploying at 6:00AM gave everyone a couple of hours before users got into the office. Service deployed, worked like a champ.
A week later the VP schedules a celebration for the successful migration to WCF. Pizza, cake, the works. The 3 team members received awards (and a envelope, which probably equaled some $$$) and the entire team received a custom Benchmade pocket knife to remember this project's success. Myself and several others just stared at each other, not knowing what to say.
Later, my manager pulls several of us into a conference room
Me: "What the hell? This is one of the biggest failures I've been apart of. We got rewarded for thousands and thousands of dollars of wasted time."
<others expressed the same and expletive sediments>
Mgr: "I know..I know...but that's the story we have to stick with. If the company realizes what a fucking mess this is, we could all be fired."
Me: "What?!! All of us?!"
Mgr: "Well, shit rolls downhill. Dept-Mgr-John is ready to fire anyone he felt could make him look bad, which is why I pulled you guys in here. The other sheep out there will go along with anything he says and more than happy to throw you under the bus. Keep your head down until this blows over. Say nothing."11 -
--- GitHub 24-hour outage post mortem ---
As many of you will remember; Github fell over earlier this month and cracked its head on the counter top on the way down. For more or less a full 24 hours the repo-wrangling behemoth had inconsistent data being presented to users, slow response times and failing requests during common user actions such as reporting issues and questioning your career choice in code reviews.
It's been revealed in a post-mortem of the incident (link at the end of the article) that DB replication was the root cause of the chaos after a failing 100G network link was being replaced during routine maintenance. I don't pretend to be a rockstar-ninja-wizard DBA but after speaking with colleagues who went a shade whiter when the term "replication" was used - It's hard to predict where a design decision will bite back and leave you untanging the web of lies and misinformation reported by the databases for weeks if not months after everything's gone a tad sideways.
When the link was yanked out of the east coast DC undergoing maintenance - Github's "Orchestrator" software did exactly what it was meant to do; It hit the "ohshi" button and failed over to another DC that wasn't reporting any issues. The hitch in the master plan was that when connectivity came back up at the east coast DC, Orchestrator was unable to (un)fail-over back to the east coast DC due to each cluster containing data the other didn't have.
At this point it's reasonable to assume that pants were turning funny colours - Monitoring systems across the board started squealing, firing off messages to engineers demanding they rouse from the land of nod and snap back to reality, that was a bit more "on-fire" than usual. A quick call to Orchestrator's API returned a result set that only contained database servers from the west coast - none of the east coast servers had responded.
Come 11pm UTC (about 10 minutes after the initial pant re-colouring) engineers realised they were well and truly backed into a corner, the site was flipped into "Yellow" status and internal mechanisms for deployments were locked out. 5 minutes later an Incident Co-ordinator was dragged from their lair by the status change and almost immediately flipped the site into "Red" status, a move i can only hope was accompanied by all the lights going red and klaxons sounding.
Even more engineers were roused from their slumber to help with the recovery effort, By this point hair was turning grey in real time - The fail-over DB cluster had been processing user data for nearly 40 minutes, every second that passed made the inevitable untangling process exponentially more difficult. Not long after this Github made the call to pause webhooks and Github Pages builds in an attempt to prevent further data loss, causing disruption to those of us using Github as a way of kicking off our deployment processes (myself included, I had to SSH in and run a git pull myself like some kind of savage).
Glossing over several more "And then things were still broken" sections of the post mortem; Clever engineers with their heads screwed on the right way successfully executed what i can only imagine was a large, complex and risky plan to untangle the mess and restore functionality. Github was picked up off the kitchen floor and promptly placed in a comfy chair with a sweet tea to recover. The enormous backlog of webhooks and Pages builds was caught up with and everything was more or less back to normal.
It goes to show that even the best laid plan rarely survives first contact with the enemy, In this case a failing 100G network link somewhere inside an east coast data center.
Link to the post mortem: https://blog.github.com/2018-10-30-...6 -
My dream is to build a shopping cart for web stores that doesn't fucking suck.
Seriously Bigcommerce, Shopify, Magneto, etc. All of you can eat bag of dicks and burn in hell for ever.
I don't care what languages you fancy, all of their stacks are a pile of shit, monkey patched together with popsicle sticks and duct tape and it all falls apart with high concurrency.
All their greasy haired sales teams will throw all manners of horse shit at the poor bastards who are trying to run a business so they can pad their commission checks... "High availability", "scalable", "reliable", "Increased conversation rate"... Lying dick fucks, all of them! I am calling them the fuck out on that snake oil they're all peddling.
The only thing worse than their shit APIs is the shit documentation and the shit support that accompanies them.
Support of these platforms are pretty much all the same, sure mayhaps one has 24*7 phone support and another closes at 9 or some shit like that, either way the only people they put on the phone are monkeys that will freeze up and say "I'm not a developer so I can't help you"... Guess what, "Eric"! I didn't ask if you're a fucking dev! I'm calling because one of your devs fucked up and I need you to tell him to unfuck it so I can get the fuck on with my day!
Their app/plugin market places are shameful to say the least. The overall quality of software is somewhat dire and it's mostly dominated by oversees developers who speak English about as well as the language they're developing with (not very well usually).
I could go on until I hit the character limit but I'm gonna end it here by saying, all shopping carts suck and they should burn for eternity in the depths of hell so that a savior can free all developers from this agonizing torment.9 -
Most satisfying bug I've fixed?
Fixed a n+1 issue with a web service retrieving price information. I initially wrote the service, but it was taken over by a couple of 'world class' monday-morning-quarterbacks.
The "Worst code I've ever seen" ... "I can't believe this crap compiles" types that never met anyone else's code that was any good.
After a few months (yes months) and heavy refactoring, the service still returned price information for a product. Pass the service a list of product numbers, service returns the price, availability, etc, that was it.
After a very proud and boisterous deployment, over the next couple of days the service seemed to get slower and slower. DBAs started to complain that the service was causing unusually high wait times, locks, and CPU spikes causing problems for other applications. The usual finger pointing began which ended up with "If PaperTrail had written the service 'correctly' the first time, we wouldn't be in this mess."
Only mattered that I initially wrote the service and no one seemed to care about the two geniuses that took months changing the code.
The dev manager was able to justify a complete re-write of the service using 'proper development methodologies' including budgeting devs, DBAs, server resources, etc..etc. with a projected year+ completion date.
My 'BS Meter' goes off, so I open up the code, maybe 5 minutes...tada...found it. The corresponding stored procedure accepts a list of product numbers and a price type (1=Retail, 2=Dealer, and so on). If you pass 0, the stored procedure returns all the prices.
Code basically looked like this..
public List<Prices> GetPrices(List<Product> products, int priceTypeId)
{
foreach (var item in products)
{
List<int> productIdsParameter = new List<int>();
productIdsParameter.Add(item.ProductID);
List<Price> prices = dataProvider.GetPrices(productIdsParameter, 0);
foreach (var price in prices)
{
if (price.PriceTypeID == priceTypeId)
{
prices = dataProvider.GetPrices(productIdsParameter, price.PriceTypeID);
return prices;
}
* Omitting the other 'WTF?' code to handle the zero price type
}
}
}
I removed the double stored procedure call, updated the method signature to only accept the list of product numbers (which it was before the 'major refactor'), deployed the service to dev (the issue was reproducible in our dev environment) and had the DBA monitor.
The two devs and the manager are grumbling and mocking the changes (they never looked, they assumed I wrote some threading monstrosity) then the DBA walks up..
DBA: "We're good. You hit the database pretty hard and the CPU never moved. Execution plans, locks, all good to go."
<dba starts to walk away>
DevMgr: "No fucking way! Putting that code in a thread wouldn't have fix it"
Me: "Um, I didn't use threads"
Dev1: "You had to. There was no way you made that code run faster without threads"
Dev2: "It runs fine in dev, but there is no way that level of threading will work in production with thousands of requests. I've got unit tests that prove our design is perfect."
Me: "I looked at what the code was doing and removed what it shouldn't be doing. That's it."
DBA: "If the database is happy with the changes, I'm happy. Good job. Get that service deployed tomorrow and lets move on"
Me: "You'll remove the recommendation for a complete re-write of the service?"
DevMgr: "Hell no! The re-write moves forward. This, whatever you did, changes nothing."
DBA: "Hell yes it does!! I've got too much on my plate already to play babysitter with you assholes. I'm done and no one on my team will waste any more time on this. Am I clear?"
Seeing the dev manager face turn red and the other two devs look completely dumbfounded was the most satisfying bug I've fixed.5 -
Titled my presentation "High Availability Setup", after a moment of thought, I changed it to "High Availability Architecture".
There, I will sound a bit more intelligent when I read it out loud on Monday. 😎😂2 -
So, I am currently seeking opportunities at other companies. I randomly got a call on Thursday around 12pm from an unknown number and I was not able to take that call as I was in a meeting. Later on, after looking up that number on Truecaller, I found out that it’s a recruiter from a US-based firm that I had applied to earlier. I immediately tried to call that person again but she was not able to talk as she was in another meeting. I tried texting the recruiter asking for her availability but she didn’t respond. I called again and this time she got annoyed at me, saying that she will call me back if needed. Now, on the weekend I again tried to message her, asking when she is free for a conversation, she is acting high and mighty, saying that she will call me when we (the company) have interviews again (hinting that I have missed the opportunity and it’s my fault). Her passive-aggressive attitude seems to be coming because I didn’t take her earlier call— I did not deliberately avoid her call, I was in another meeting. I was not given any intimation that she is going to call me— let alone on a weekday at 12pm. My current company expects a high-level of professionalism and I intend to show the same level of professionalism in any future companies that I work with. This kind of dehumanization (mainly due to a power imbalance in top-down heirarchical structure) is why big companies have hard time retaining workers these days. And this company was not Google/Apple or anything remotely in the same league. So I seemed to have dodged a bullet there.4
-
Just now I was reading on https://pve.proxmox.com/wiki/... about high availability. Now my Proxmox VE is just a tower (which happens to have ECC memory) that's stored in my storage room (and which is mostly used for experimental and home server purposes). But my mail servers.. those have been made with high availability in mind. Most importantly, I've made their services entirely redundant (but within the same datacenter). And when they have updates, I apply updates to one, reboot, see if it didn't break something and then do the same to the other server after the first one came up again. So no downtime whatsoever.
If memory serves me right, I think that I've been able to maintain these servers for the last year without any downtime at all (I reboot them every month to apply new kernels but they haven't both been simultaneously down at any moment). Does that make them High Availability? My interventions regarding their availability have been rather trivial. Is it really that hard..?4 -
Man....I keep up with this strange love hate relationship I have with Python....
Last night it was python that literally wrote my homework: define all possible equivalent partition tables with cause and effect analysis and boundary value checks for a program. The whole thing wrote itself and all I had to do was verify the inputs. Something that I was able to do using jupyter with pandas and numpy. On one hand, I despise the lack of static typing and use of whitespace as a block delimiter. On the other I cannot but help feeling a high level of gratitude over the language and its high availability and ease of use for this.
Sure, I could have used other tools, but this language has dominated hardcore in this regard enough to the point of not considering it being a crime against humanity.3 -
I've just noticed something when reading the EU copyright reform. It actually all sounds pretty reasonable. Now, hear me out, I swear that this will make sense in the end.
Article 17p4 states the following:
If no authorisation [by rightholders] is granted, online content-sharing service providers shall be liable for unauthorised acts of communication to the public, including making available to the public, of copyright-protected works and other subject matter, unless the service providers demonstrate that they have:
(a) made best efforts to obtain an authorisation, and
(b) made, in accordance with high industry standards of professional diligence, best efforts to ensure the unavailability of specific works and other subject matter for which the rightholders have provided the service providers with the relevant and necessary information; and in any event
(c) acted expeditiously, upon receiving a sufficiently substantiated notice from the rightholders, to disable access to, or to remove from, their websites the
notified works or other subject matter, and made best efforts to prevent their future uploads in accordance with point (b).
Article 17p5 states the following:
In determining whether the service provider has complied with its obligations under paragraph 4, and in light of the principle of proportionality, the following elements, among others, shall be taken into account:
(a) the type, the audience and the size of the service and the type of works or other subject matter uploaded by the users of the service; and
(b) the availability of suitable and effective means and their cost for service providers.
That actually does leave a lot of room for interpretation, and not on the lawmakers' part.. rather, on the implementer's part. Say for example devRant, there's no way in hell that dfox and trogus are going to want to be tasked with upload filters. But they don't have to.
See, the law takes into account due diligence (i.e. they must give a damn), industry standards (so.. don't half-ass it), and cost considerations (so no need to spend a fortune on it). Additionally, asking for permission doesn't need to be much more than coming to an agreement with the rightsholder when they make a claim to their content. It's pretty common on YouTube mixes already, often in the description there's a disclaimer stating something like "I don't own this content. If you want part of it to be removed, get in touch at $email." Which actually seems to work really well.
So say for example, I've had this issue with someone here on devRant who copypasted a work of mine into the cancer pit called joke/meme. I mentioned it to dfox, didn't get removed. So what this law essentially states is that when I made a notice of "this here is my content, I'd like you to remove this", they're obligated to remove it. And due diligence to keep it unavailable.. maybe make a hash of it or whatever to compare against.
It also mentions that there needs to be a source to compare against, which invalidates e.g. GitHub's iBoot argument (there's no source to compare against!). If there's no source to compare against, there's no issue. That includes my work as freebooted by that devRant user. I can't prove my ownership due to me removing the original I posted on Facebook as part of a yearly cleanup.
But yeah.. content providers are responsible as they should be, it's been a huge issue on the likes of Facebook, and really needs to be fixed. Is this a doomsday scenario? After reading the law paper, honestly I don't think it is.
Have a read, I highly recommend it.
http://europarl.europa.eu/doceo/...13 -
During one of our visits at Konza City, Machakos county in Kenya, my team and I encountered a big problem accessing to viable water. Most times we enquired for water, we were handed a bottle of bought water. This for a day or few days would be affordable for some, but for a lifetime of a middle income person, it will be way too much expensive. Of ten people we encountered 8 complained of a proper mechanism to access to viable water. This to us was a very demanding problem, that needed to be sorted out immediately. Majority of the people were unable to conduct income generating activities such as farming because of the nature of the kind of water and its scarcity as well.
Such a scenario demands for an immediate way to solve this problem. Various ways have been put into practice to ensure sustainability of water conservation and management. However most of them have been futile on the aspect of sustainability. As part of our research we also considered to check out of the formal mechanisms put in place to ensure proper acquisition of water, and one of them we saw was tree planting, which was not sustainable at all, also some few piped water was being transported very long distances from the destinations, this however did not solve the immediate needs of the people.We found out that the area has a large body mass of salty water which was not viable for them to conduct any constructive activity. This was hint enough to help us find a way to curb this demanding challenge. Presence of salty water was the first step of our solution.
SOLUTION
We came up with an IOT based system to help curb this problem. Our system entails purification of the salty water through electrolysis, the device is places at an area where the body mass of water is located, it drills for a suitable depth and allow the salty water to flow into it. Various sets of tanks and valves are situated next to it, these tanks acts as to contain the salty water temporarily. A high power source is then connected to each tank, this enable the separation of Chlorine ions from Hydrogen Ions by electrolysis through electrolysis, salt is then separated and allowed to flow from the lower chamber of the tanks, allowing clean water to from to the preceding tanks, the preceding tanks contains various chemicals to remove any remaining impurities. The whole entire process is managed by the action of sensors. Water alkalinity, turbidity and ph are monitored and relayed onto a mobile phone, this then follows a predictive analysis of the data history stored then makes up a decision to increase flow of water in the valves or to decrease its flow. This being a hot prone area, we opted to maximize harnessing of power through solar power, this power availability is almost perfect to provide us with at least 440V constant supply to facilitate faster electrolysis of the salty water.
Being a drought prone area, it was key that the outlet water should be cold and comfortable for consumers to use, so we also coupled our output chamber with cooling tanks, these tanks are managed via our mobile application, the information relayed from it in terms of temperature and humidity are sent to it. This information is key in helping us produce water at optimum states, enabling us to fully manage supply and input of the water from the water bodies.
By the use of natural language processing, we are able to automatically control flow and feeing of the valves to and fro using Voice, one could say “The output water is too hot”, and the system would respond by increasing the speed of the fans and making the tanks provide very cold water. Additional to this system, we have prepared short video tutorials and documents enlighting people on how to conserve water and maintain the optimum state of the green economy.
IBM/OPEN SOURCE TECHNOLOGIES
For a start, we have implemented our project using esp8266 microcontrollers, sensors, transducers and low payload containers to demonstrate our project. Previously we have used Google’s firebase cloud platform to ensure realtimeness of data to-and-fro relay to the mobile. This has proven workable for most cases, whether on a small scale or large scale, however we meet challenges such as change in the fingerprint keys that renders our device not workable, we intend to overcome this problem by moving to IBM bluemix platform.
We use C++ Programming language for our microcontrollers and sensor communication, in some cases we use Python programming language to process neuro-networks for our microcontrollers.
Any feedback conserning this project please?8 -
Monday morning, we were told by our teacher that we had one week to create a clustered system with virtual machines , handled with 2 hypervisors, and the whole thing must come with high availability
These are the kind of stuff that make me doubt about becoming devops later, 3 days in and I'm only starting to get what we're doing, but I'm such a massive dead weight for the rest of my group 😵😵 -
This is a part rant-part question.
So a little backstory first:
I work in a small company (5 including me) which is mostly into consultation (we have many tech partners where we either resell their products or if there is a requirement from one of our clients, we get our partners to develop it for them and fulfill the client requirements) so as you can see there is a lot of external dependencies. I act as a one-hat-fits-all tech guy, handling the company websites, social media channels, technical documentation, tech support, quicks POCs (so anything to do with anything technical, I handle them). I am a bit fed up now, since the CEO expects me to do some absurd shit (and sometimes micro manages me, like WTF I am the only one who works there with 100% commitment) and expects me to deliver them by yesterday.
So anyway long story short, our CEO finally had the brains to understand that we should start having our own product (which i had been subtly suggesting him to do for a while now!).
Now he came up with a fairly workable concept that would have good market reach (i atleast give him credits for that) and he wanted me to suggest the best way to move forward (from a both business and technical point of view). The concept is to have an auction-based platform for users to buy everyday products.
I suggested we build a web app as opposed to a mobile one (which is obvious, since i didnt want to develop a seperate website and a mobile app, and anyway just because we can doesnt mean we have to make a mobile app for everything), and recommended the Node/react based JS tech stack to build it.
At first he wanted me to single handedly build the whole platform within a month, I almost flipped (but me being me) then somehow calmed down and finally was able to explain him how complicated it was to single-handedly build a platform of such complexity (especially given my limited experience; did I mention that this is my first job and I am still in college, yeah!!) and convinced him to get an experienced back-end dev and another dev to help me with it.
Now comes the problem, I was to prepare a scope document outlining all the business and technical requirements of the project along with a tentative cost, which was fairly straightforward. I am currently stuck at deciding the server requirements and the system architecture for the proposed solution (I am thinking of either going with AWS - which looks a bit complicated to setup - or go with either Digital Ocean or Heroku):
I have assumed that at peak times we would have around 500-1000 users concurrently
And a daily userbase of 1000 users (atleast for the first few months of the platform running)
What would be the best way forward guys?
I did some extensive (i mean i read through some medium blogs! and aws documentation) research and put together the following specs (if we are going through AWS):
One AWS t3.medium ec2 instance for the node server (two if we want High Availability by coupling with the AWS load balancer and Elastic Beanstalk)
The db.t3.small postgres database
The S3 Storage bucket (100gb) for the React Front end hosting
AWS SNS for email/sms OTP and notification
And AWS CloudMonitor for logging amd monitoring.
Am I speculating the requirements properly, where have I missed??
Can u guys suggest what is the best specification for such a requirement (how do you guys decide what plan to go with)?
Any suggestions, corrections, advices are welcome3 -
I had a discussion with my colleagues about my bachelor thesis.
Together we created within the last 18 month a REST-API where we use LDAP/LMDB as database (tree structured storage). Of course our data is relational and of course we have a high redundancy there. It's a 170 call API and I highly doubt that it's actually conforming REST.
Ensuring DB integrity is done in the backend and coding style there is "If we change it at one place, let's make sure to also change it everywhere else", so you get a good impression how much of spaghetti code we have there.
Now I proposed to code a solution in my bachelor thesis where we use a relational database (we even have an administrated Oracle DB with high availability) and have a write-only layer to also store the data in LDAP but my colleagues said that "it would add too much complexity to the system".
Instead I should write the relational layer myself and fetch the data somehow from the existing LDAP tree.
What the actual fuck, spaghetti code is what makes the system really unnecessarily complex so that no one will understand that code in 2 years.
Congratulations, you just created legacy code that went into production in 2018 while not accepting the opportunity to let that legacy code get eliminated.
Now good luck with running and maintaining that system and it's inconsistencies.1 -
{
-i won't follow logging practices
-i won't follow secure coding
-i won't leverage profiling n monitoring tools
-i won't reuse best practices
-i won't listen to thought leaders
-i will outsource writing UT
-i will outsource code quality checks
-i will outsource all testing
-i will ignore n overide CTO team
But I still want high stability, security n 4 9s availability. Just want it done. My team is best. Am a fast-track leadership program leader who never has or ever needs to cod. I just know ...
}
People I have to deal with every sprint. Site reliability is not easy ...
Teaching good code makes great products to morons, toughest ...
"Beginners mind needed"2 -
Got my ActiveMQ-Zookeeper Replicated LevelDB setup finished! All provisioned with Ansible. so happy :) needed to share. anyone else like setting up high availability stuff?9
-
I've been working for over a year now in this remote job as a sysadmin for a local client. I personally find this job quite intimidating at first with all of the infrastructure and all of its many microservices running in high availability set up. I enjoyed learning everything about them and why it's been set up this way, which gives me ideas if I were to build my own app (not competing with my current employer, of course).
But now I don't feel comfortable managing this beast in its many environments.
From time to time, I would hear from my old colleagues at my old sucky company for help in their work and that they know I'm an expert in. I help and it makes me feel good.
Now I'm at a career dilemma. I don't want to lose my current job because I feel "uncomfortable" with managing and administrating the tech holding the whole infrastructure. And I don't wanna go back to my old job with the sucky pay and the feel of being unchallenged. And if I try to find another job, I might be as lucky as I do now, especially good difficult it is for me to find a remote job to begin with.
Objectively, I just need to clear off my debts (at this rate, in 4 years), and have a side income to support my family. But I don't think I can follow through on that plan. Should I look for a new job or do better with the current job that I have now?3 -
Ok so.
You know you have to deal with annoying things when you take on a guard duty role and yes, we signed up for it because of the mullah.
However, you also want to do this with a reliable and robust monitoring and alerting systemthat you can depend on! And no i am not going to advertise a product for this... What i will tell you is which one to avoid.
Meet Quest "Foglight" ... It does EVERYTHING! It monitors, it alerts, it does trend watching it does fancy shmancy graphics, it does reporting, it is very extendable... WAUW, right! right?
Well, if you were stuck somewhere in 2005-2010 maybe... But this fucklight is cutting short on EVERYTHING
Today , i got called up at 3:30 in the morning (i am typing this after the incident) because this shit of a system has "HIgh Availability" by basically letting the FMS server suck each others jaggons and hope it somehow respons. This is a sort of keepalived thing, but on proprietary java tech..
Oh, yes, it's written on java and... yes.. Java 6
This means that, effectively we are running RHEL5 machines (yes, RHEL 5!!!) because something more modern in place? nope.
I have no idea anymore what i am ranting about, i'm tired, i'm tired of this shit, i'm tired of getting called up just because of some dude has been cussing up a sales representative, sucked each others jaggons and pushed the federal goverment with a shit solution for almost a decade now.
Fuck Foglight
Fuck Quest software, because did you really think you would get enterprise level support for an enterprise product which you payed enterprise euro's for it? You are so naive, how cute...
And consequently : Fuck Dell and Good job Dell.. For purchasing quest software, mess around with it, and then dump it back to the market... Srsly Dell , you were like me when i had this hot ass chick as a girlfriend but later seemed to be too crazy to justifiably tolerate compared to her hotness. Dump it like it's trump.
Oh, and, wauw! Foglight graced us with a successful startup process after .. what.. 6 times restarting? In 2 hours... With 12 CPU's and 128 GB ram and .... oh fuck this you don't deserve such resources.4 -
What is the best alternative to cronjobs, guaranteeing high availability and jobs not being duplicated?6
-
Today, I began learning about the wonders and horrors of HA in production environments.
My head feels like when I first joined the company as a total noob who never worked an IT job in his life. Soooooooo much new information and concepts and potential issues to learn to avoid.
But all super interesting!