[#366] auto-restart signalboost after fatal healthcheck failures
Closes #366 (closed)
context:
- under heavy load signald can crash channels more than we would like
- we have good monitoring/alerting to tell us when this happens, but we are not always available to see the alerts and restart signalboost
- this MR modifies the healthcheck code to automatically restart signalboost any time it detects that a channel has failed 2 consecutive healthchecks (notifying maintainers that it has done so)
changes:
- extract `diagnostics.restart` from `execute` module, call it from `executeMaybeRestart` if authentication succeeds
- add logic to `sendHealthchecks` to call `restart` if any healthchecks fail twice in a row (and send notifications to maintainers before and after restart, and on error if restart fails)
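
For reviewers, here is a minimal sketch of the "two consecutive failures, then restart" behavior described above. It is not the code in this MR: `createFailureTracker`, `restartIfNeeded`, the `{ channelId, passed }` result shape, the return strings, and the injected `restart` / `notifyMaintainers` functions are all hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: none of these names are taken from the signalboost codebase.
const FAILURE_THRESHOLD = 2;

// Counts consecutive failed healthchecks per channel.
const createFailureTracker = (threshold = FAILURE_THRESHOLD) => {
  const consecutiveFailures = new Map();
  return {
    // Record one result; returns true once the channel has failed
    // `threshold` healthchecks in a row. A passing check clears the streak.
    record: (channelId, passed) => {
      if (passed) {
        consecutiveFailures.delete(channelId);
        return false;
      }
      const count = (consecutiveFailures.get(channelId) || 0) + 1;
      consecutiveFailures.set(channelId, count);
      return count >= threshold;
    },
    reset: () => consecutiveFailures.clear(),
  };
};

// Process one round of healthcheck results. Restarts at most once per round,
// notifying maintainers before the restart, after it, and if it throws.
const restartIfNeeded = async (tracker, results, { restart, notifyMaintainers }) => {
  const failing = [];
  for (const { channelId, passed } of results) {
    if (tracker.record(channelId, passed)) failing.push(channelId);
  }
  if (failing.length === 0) return 'NO_RESTART';

  await notifyMaintainers(`Restarting: repeated healthcheck failures on ${failing.join(', ')}`);
  try {
    await restart();
    tracker.reset(); // start counting from zero after a successful restart
    await notifyMaintainers('Restart succeeded');
    return 'RESTARTED';
  } catch (e) {
    await notifyMaintainers(`Restart failed: ${e.message}`);
    return 'RESTART_FAILED';
  }
};
```

Injecting `restart` and `notifyMaintainers` keeps the sketch testable without a running signald instance; the real implementation may wire these up differently.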
side-effects:
- shorten healthcheck intervals and timeouts in development env to make QA'ing failed healthcheck behavior easier
Edited by aguestuser