Auto-healing deployments are great until your deployment fails because it can't handle sending a large number of push notifications (triggered by a message on a pub/sub topic that it starts reading from on startup)...so it restarts, tries to send them again, fails, restarts, etc...and meanwhile a few tens of thousands of users are getting spammed by notifications 😱

  • 0
    Sounds like you need batching/partitioning one way or the other. Are you getting a single message with all the notifications in it?
  • 2
    @SortOfTested the pub/sub message is just a "tickle" that tells the backend to check for any pending notifications, and to send them if there are any
    The actual sending of the notifications went fine (that already gets batched into groups of 500 to comply with Firebase's requirements)
    Turns out the actual issue was with the code that was marking all of the notifications as "delivered"; a boolean was being flipped on each of the entities loaded from the DB, and then persisted back...except apparently the ORM layer uses regexes under the hood, and it seems like JavaScript regexes have a maximum length which was being exceeded, which caused Node to crash...and since the notifications were never marked as "delivered", when the next pub/sub message came along (once per minute), the same thing happened again
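    For illustration, a rough sketch of what chunking the "mark as delivered" step could look like, so that no single ORM call ever sees the whole result set. Everything here is hypothetical: `chunk`, `markDeliveredInBatches`, the `PendingNotification` shape and the `persistBatch` callback are made-up names standing in for whatever the ORM actually exposes, and the batch size of 500 is just borrowed from the Firebase limit mentioned above, not a known safe value for the ORM.

```typescript
// Hypothetical helper: split an array into consecutive groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Assumed entity shape; the real entities will have more fields.
interface PendingNotification {
  id: string;
  delivered: boolean;
}

// Assumed persistence callback standing in for the real ORM save call.
type PersistBatch = (batch: PendingNotification[]) => Promise<void>;

// Flip the flag and persist one batch at a time, so a crash partway through
// still leaves the earlier batches marked as delivered.
async function markDeliveredInBatches(
  notifications: PendingNotification[],
  persistBatch: PersistBatch,
  batchSize = 500, // assumption: borrowed from the Firebase batch limit
): Promise<number> {
  let persisted = 0;
  for (const batch of chunk(notifications, batchSize)) {
    for (const notification of batch) {
      notification.delivered = true;
    }
    await persistBatch(batch);
    persisted += batch.length;
  }
  return persisted;
}
```

    Persisting batch by batch also limits the blast radius of the restart loop: anything already saved as "delivered" would not be picked up again by the next pub/sub tickle.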
  • 1
    *sad trombone*
  • 0
    "Auto-healing deployment" sounds like you run your backend as a systemd-type service and enable it to run on boot after the network is up. Is that right? Or am I missing something? It sounds more complicated than just that.
  • 0
    @justamuslimguy deployments are handled by Kubernetes...not sure if that makes them more or less complicated than systemd wizardry haha