Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "sev1"
-
Gather around folks, I'll paint you a nice picture based on a true story, back from my sysadmin days. Listen up.
It's about HP and their Solaris 5.4/6 support.
- Yet another Prod Solaris dinosaur crashed
- Connected to console, found a dead system disk; for some reason it was not booting on the remaining redundant disk...
- Logged an HP vendor case. Sev1. SLA for response is 30 minutes, SLA for a fix is <24 hours
- It took them 2 days to respond to our Prod server outage due to failed system disks (responses "we are looking into this" do not count)
- it took another day for them to find an engineer who could attend the server in the DC
- The field engineer came to the DC 4 hours before the agreed time, so he had to wait (DC was 4-5 hours of driving away from HP centre)
- Turns out, he came to the wrong datacentre and was not let in even when the time came
- We had to reschedule for two days later. Prod is still down
- The engg came to the DC on time. He confirmed he had the FRU on him. Looks promising
- He entered the Hall
- He replaced the disk on the Solaris server
- It was the wrong disk he replaced. So now the server is beyond rebuild. It has to be built anew... but only after he comes back and replaces the actually faulty disk.
- He replaced that disk on the wrong Solaris server2 -
My coleague's story
- before leaving after long day at the office final look at support cases (after official support hours)
- sev1 ticket logged an hour ago, noone called us (although should have; after support hours)
- angry manager calls and demands to get in touch with the client immediately (we're already after support hours, FTS should pick the case, not us)
- we reach out. Customer has business-impacting case
- after initial info gathering: some cert got expired, they got a new one and placed it in the app's directory. The app still does not work
- the first question we ask: "are you sure you have placed it in the right directory?"
- "yes, we are sure. No problems there" - answers a voice with indian accent
- noone finds the root cause for hours.
- It's already 1am
- someone from client's specialists comes up with an idea: "are we sure the cert is in the right place? Let's try to move it to the same directory the old one was in the first place"
- .................................................
- production is working again
- "Why didn't anyone from support suggest this?!?!"
- .................................................
- 2am. Case solved, manager is informed everything's allright now.
- In the morning we get yelled at by the manager bcz we supposedly missed a sev1 ticket and were incompetent during the conf. call
This reminds me why I stay away from support. And why I started hating people. And why I do not work with indians (our ways are too different for me to stay sane and not to kill anyone).3 -
IT again: Spoke too soon about a happy server farm after Christmas... Had a SEV1 complete outage for the whole morning. *facepalm*2
-
Severity 1 issue on the company today... No back ends for the full country... It was even exciting, don't aak why.