
[#366] auto-restart signalboost after fatal healthcheck failures

Closes #366 (closed)

context:

  • under heavy load, signald can crash channels more often than we would like
  • we have good monitoring/alerting to tell us when this happens, but we are not always available to see the alerts and restart signalboost
  • this MR modifies the healthcheck code to automatically restart signalboost whenever it detects that a channel has failed 2 consecutive healthchecks (notifying maintainers that it has done so)

changes:

  • extract diagnostics.restart from execute module, call it from executeMaybeRestart if authentication succeeds
  • add logic to sendHealthchecks to call restart if any channel fails its healthcheck twice in a row (sending notifications to maintainers before and after the restart, and on error if the restart fails)
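
sketch of the restart decision:

The flow described above can be illustrated roughly as follows. This is a minimal sketch, not the code in this MR: `recordResult`, `channelsNeedingRestart`, `runRound`, and the injected `restart` / `notify` callbacks are hypothetical names, the real restart and notification calls are presumably asynchronous, and clearing the failure counts after a successful restart is an assumption.

```javascript
// Threshold from the description: restart after 2 consecutive failures.
const MAX_CONSECUTIVE_FAILURES = 2

// Returns an updated failure-count map without mutating the input.
// A passing healthcheck resets that channel's streak to 0.
const recordResult = (failureCounts, channel, passed) => ({
  ...failureCounts,
  [channel]: passed ? 0 : (failureCounts[channel] || 0) + 1,
})

// Lists the channels whose failure streak has reached the threshold.
const channelsNeedingRestart = failureCounts =>
  Object.keys(failureCounts).filter(
    channel => failureCounts[channel] >= MAX_CONSECUTIVE_FAILURES,
  )

// Processes one round of healthcheck results ([{ channel, passed }]).
// `restart` and `notify` are hypothetical stand-ins injected by the caller.
const runRound = (failureCounts, results, { restart, notify }) => {
  const updated = results.reduce(
    (acc, { channel, passed }) => recordResult(acc, channel, passed),
    failureCounts,
  )
  const failing = channelsNeedingRestart(updated)
  if (failing.length === 0) return updated

  // Notify before the restart, then after it (or on error if it fails).
  notify(`Restarting after failed healthchecks on: ${failing.join(', ')}`)
  try {
    restart()
    notify('Restart succeeded')
  } catch (e) {
    notify(`Restart failed: ${e.message}`)
    return updated
  }
  // Assumption: streaks are cleared once a restart succeeds.
  return {}
}
```

Injecting `restart` and `notify` keeps the counting logic testable without a running signald instance, which is the same concern the shortened development intervals address for manual QA.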

side-effects:

  • shorten healthcheck intervals and timeouts in the development env to make QA'ing failed-healthcheck behavior easier
Edited by aguestuser
