[hotfix] increase and move suspension in middle of shard restart

aguestuser requested to merge hf-adjust-restart-interval into main
  • in the restart job we suspend to make sure that signald containers are listening on their sockets before we try to send messages over them
  • we used to (1) tell signalboost to reconnect to the socket, (2) wait for 5 sec, (3) send a subscribe message
  • but this resulted in errors in which (1) or (3) tried to connect to the stale socket from before the restart
  • instead, we now (1) wait for 15 sec, (2) reconnect to the socket, (3) send subscribe messages
  • works on dev, let's see on prod!
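
The new ordering can be sketched roughly as below. This is a minimal illustration in Python, not the code in this merge request: `restart_shard`, `restart_containers`, `reconnect`, `subscribe`, and `channels` are hypothetical names, and the container restart step before the wait is an assumption. Only the 15 sec delay and the wait, reconnect, subscribe order come from the description above.

```python
import time
from typing import Callable, Iterable

# 15 sec comes from the description; the old flow waited 5 sec *after* reconnecting.
STARTUP_DELAY_SECONDS = 15


def restart_shard(
    restart_containers: Callable[[], None],  # hypothetical: restarts the signald containers
    reconnect: Callable[[], None],           # hypothetical: reconnects to the socket
    subscribe: Callable[[str], None],        # hypothetical: sends one subscribe message
    channels: Iterable[str],                 # hypothetical: what we subscribe to
    sleep: Callable[[float], None] = time.sleep,
    delay: float = STARTUP_DELAY_SECONDS,
) -> int:
    """Restart a shard, waiting before any socket use. Returns the subscribe count."""
    restart_containers()
    # Suspend first, so nothing below can reach the socket from before the restart.
    sleep(delay)
    reconnect()
    sent = 0
    for channel in channels:
        subscribe(channel)
        sent += 1
    return sent
```

Passing `sleep` as a parameter keeps the ordering easy to check without actually waiting 15 sec.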

side-effects:

  • improves logging and makes sure that a full manual restart always returns and notifies, even if some shards are empty
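
One way the empty-shard behavior might look, as a hedged sketch only: `restart_all_shards`, the `shards` mapping, `restart_shard`, and `notify` are hypothetical names, and skipping empty shards is an assumption about how "always returns and notifies" is achieved.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("restart")


def restart_all_shards(
    shards: Dict[str, List[str]],                     # hypothetical: shard id -> channels
    restart_shard: Callable[[str, List[str]], None],  # hypothetical per-shard restart
    notify: Callable[[str], None],                    # hypothetical notifier
) -> Dict[str, str]:
    """Restart every shard; empty shards are logged and skipped, never blocking."""
    results: Dict[str, str] = {}
    for shard_id, channels in shards.items():
        if not channels:
            # Assumption: an empty shard has nothing to subscribe, so we skip it.
            logger.info("shard %s is empty, skipping", shard_id)
            results[shard_id] = "skipped (empty)"
            continue
        logger.info("restarting shard %s (%d channels)", shard_id, len(channels))
        restart_shard(shard_id, channels)
        results[shard_id] = "restarted"
    # Notify once, whether or not any shard was skipped.
    notify(f"restart finished: {results}")
    return results
```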