11
2erXre5
7y

Living on the edge!

One or two years ago I managed to deploy a DDL change directly on the production server. As I knew there was a backup job which will run every day at noon and at midnight. So I run my script some minutes after noon. So far so good. But somehow I tested it badly in my test environment and the UI of the application throws error after error now in production.

Well, just revert the db to the latest recovery point with the backup, I thought.

It became clear then after a couple of minutes of searching the backup folder for the db backup that there was no such file. The youngest backup file was 3 years old.

Now what happened: The backup script had a switch "simulate=true" and then simulated a successful backup on each run. Therefore the monitoring system got no alerts for not correctly executing those jobs correctly. Then the monitoring job which should do the backupfolder surveillance stuck with green, because there was a valid backup file inside. But it did not check for a specific creation date.

Now this database is the one we need for doing our daily business and is really crucial. Therefore It was easier to emergencyfix the application than doing a rollback of the db 🙄

Well, not really a data loss story, but close to one.

Comments
Add Comment