[hotfix] remove restart job

aguestuser requested to merge hotfix-remove-restart-job into master

context:

  • now that we have healthchecks in place, it is safer to not auto-restart all the time, because we have an automated way of detecting outages and fixing them on a case-by-case basis
  • this creates an opportunity to measure how often outages actually happen without restarting (and whether restarting is actually a good fix when they do)
  • if restarts are not necessary, we would prefer to avoid them, since they sometimes result in messages being dropped
  • for now: we are going to disable restarts and monitor the impact on outages for several days.
  • if after 5 days we don't see a lot of outages, we will keep the restarts disabled. if we do see a spike in outages, we will restore the restart job, but perhaps at a less frequent rate of every 3 hours (that job is left commented out in the current crontab)
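
for reference, a rough sketch of what a commented-out 3-hourly entry could look like in a crontab. the script path is a hypothetical placeholder, not the actual command from this repo; only the schedule (every 3 hours) comes from the description above:

```
# restart job: disabled while we monitor outages.
# uncommenting the line below would run it at minute 0 of every 3rd hour.
# NOTE: /path/to/restart-script is a hypothetical placeholder.
# 0 */3 * * * /path/to/restart-script
```

the five fields are minute, hour, day of month, month, and day of week, so `0 */3 * * *` fires at 00:00, 03:00, 06:00, and so on.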
