Root
4y

Finally finished the screwdriver follow-up ticket. I think.

I spent almost two full days (14 hours) on a seemingly simple bug on Friday, and then another four hours yesterday. Worse yet, I can’t test this locally because of how Apple notifications work, so I can only debug it on one particular server that lives outside our VPN, which is of course in high demand. And the servers are unreliable: they often have incorrect configuration, missing data, and random 504s, and SSH likes to disconnect, especially while running setup scripts. Hence the above. So it’s difficult to know whether things are failing because there’s a bug, because the server is just a piece of shit, or because it just doesn’t like you that day.

But the worst fucking part of all? The bug looked different on Monday than it did on Friday. Like, significantly different.

On Friday, a particular event killed notifications for all subsequent events, even unrelated ones, and nothing would make them work again. This had me diving through the bowels of several systems, scouring the application logs, replicating the issue across multiple devices, etc. I verified the exact same behavior several times over, and it made absolutely no sense. I wrote specs to verify that the screwdriver code worked as expected, and it always did. But an integration test that used the consumer-facing controller actions exhibited the behavior, so the problem wasn’t in my code.

On Monday, while someone else was watching, that same event killed notifications, but ONLY FOR RELATED EVENTS, AND THEY RESUMED AFTER ANOTHER EVENT. All other events and their notifications worked perfectly.

AKL;SJF;LSF

I think I fixed it (waiting on verification), and if it is indeed fixed, it was because two fucking push event records were treated as unique and were silently failing to save, run callbacks, etc.
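For anyone who hasn’t been bitten by this: the failure mode described above is a save that reports failure only through its return value, so nothing is stored, no after-save callbacks (like sending a notification) run, and nothing is raised or logged. Here is a minimal sketch in Python, assuming a uniqueness check on some key. `PushEventStore`, `unique_key`, `after_save`, `save_strict`, and `RecordNotSaved` are all hypothetical names for illustration, not the actual app’s code.

```python
from dataclasses import dataclass, field


class RecordNotSaved(Exception):
    """Raised by the strict save path when the uniqueness check rejects a record."""


@dataclass
class PushEventStore:
    """Hypothetical in-memory stand-in for a push event table.

    `unique_key` is an assumed field representing whatever columns a
    uniqueness validation would check on a push event record.
    """

    after_save: list = field(default_factory=list)  # callbacks, e.g. send a notification
    rows: list = field(default_factory=list)
    _seen: set = field(default_factory=set)

    def save(self, record: dict) -> bool:
        # Lenient save: a duplicate key only returns False. Nothing is stored,
        # no callbacks fire, and nothing is raised or logged.
        key = record["unique_key"]
        if key in self._seen:
            return False
        self._seen.add(key)
        self.rows.append(record)
        for callback in self.after_save:
            callback(record)
        return True

    def save_strict(self, record: dict) -> None:
        # Strict save: turn the silent failure into a loud one.
        if not self.save(record):
            raise RecordNotSaved(f"duplicate unique_key: {record['unique_key']!r}")


if __name__ == "__main__":
    sent = []
    store = PushEventStore(after_save=[lambda r: sent.append(r["id"])])
    store.save({"id": 1, "unique_key": "device-42:event-7"})
    store.save({"id": 2, "unique_key": "device-42:event-7"})  # returns False, quietly
    print(sent)  # [1]: the second event never notified anyone
```

ORMs like Rails’ ActiveRecord make the same split between `save` (returns `false` on a failed validation) and `save!` (raises). Using the raising variant, or at least logging whenever the lenient one returns false, makes this kind of bug show up as an error in the logs instead of as notifications that just never arrive.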

BUT THIS DOESN’T MATCH WHAT I VERIFIED MULTIPLE TIMES! ASDFJ;AKLSDF

I’m so fucking done with this bs.

Comments
  • Here, scream
  • Holy shitstorm.
    I use custom metrics for BS like this. For mobile apps, I twisted the fuck out of Google Analytics.
  • @ScriptCoded you do have a death wish, huh..?
  • What braindead retard set up that server when even SSH just disconnects like that?
  • @Condor SysOps 🤷🏻‍♀️

    Thankfully, AWS manages most aspects of the production servers, so they’re more reliable. At least as far as I know? I have zero visibility into prod.
  • Root, I doubt you’re finished with that screwdriver lol. The best way for you to end this is to take a real screwdriver and write racist slang all over your product owner’s Tesla that he bought with his million-dollar salary (which he earned from all his brilliant contributions, including... um... most meetings attended 😁). His car will go viral across all social media, and he’ll resort to hiding in the woods to avoid the mobs after him. He’ll end up eating squirrels to survive and wiping his ass with leaves after taking a nasty dump foul enough to scare a bear.
  • @Root in that case, I think you’d actually be a better sysadmin than them 😉
  • Sounds like two different bugs to me.