[376] compartmentalize auto-restarts by socket shard
context:
- we would like to be more resilient to restarts so that we can push concurrency more heavily under load (which often triggers concurrency errors that crash a channel and trigger an auto-restart)
- this MR introduces logic for auto-restarts that compartmentalizes each restart to the socket shard in which the channel failure occurred
- upshot: the system is more resilient to restarts b/c fewer channels overall are affected by any given restart
changes:
- if healthchecks fail, group fatal failures by socket id and pass them to the refactored `_restartAndNotify` (accepts socket id and failed channel number), which passes them to the refactored `restart`
- `restart` no longer shuts down the entire app: it merely unsubscribes from channels in the shard, aborts the correct signald instance, restarts the correct socket pool (using new convenience methods on `app`), then re-subscribes to channels. there is some fancy lodash footwork involved. perhaps we can simplify later! :)
- side-effect: make healthchecks happen less frequently in dev to get cleaner logs
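
a rough sketch of the flow described above, in plain JS rather than lodash. everything named here is hypothetical and only for illustration: the healthcheck result shape (`socketId`, `channelPhoneNumber`, `isFatal`), the `deps` object, and the helpers `unsubscribe`, `abortSignald`, `restartSocketPool`, and `subscribe` are assumptions, not the actual code in this MR.

```javascript
// Hypothetical result shape: { channelPhoneNumber, socketId, isFatal }.
// Groups the channel numbers of fatal failures under their socket id.
const groupFatalFailuresBySocket = results =>
  results
    .filter(r => r.isFatal)
    .reduce((acc, r) => {
      (acc[r.socketId] = acc[r.socketId] || []).push(r.channelPhoneNumber);
      return acc;
    }, {});

// Restarts a single shard. `deps` is an assumed bag of injected helpers,
// standing in for whatever the app actually exposes.
const restartShard = async (socketId, deps) => {
  const channels = deps.channelsBySocket[socketId] || [];
  // 1. unsubscribe only the channels that live in this shard
  await Promise.all(channels.map(ch => deps.unsubscribe(ch, socketId)));
  // 2. abort the signald instance backing this shard
  await deps.abortSignald(socketId);
  // 3. restart the socket pool for this shard
  await deps.restartSocketPool(socketId);
  // 4. re-subscribe the same channels
  await Promise.all(channels.map(ch => deps.subscribe(ch, socketId)));
  return channels;
};

// Restarts every shard that had at least one fatal failure, one at a time.
// Shards without fatal failures are never touched.
const restartFailedShards = async (results, deps) => {
  const failuresBySocket = groupFatalFailuresBySocket(results);
  const socketIds = Object.keys(failuresBySocket);
  for (const socketId of socketIds) {
    await restartShard(socketId, deps);
  }
  return socketIds;
};
```

the point of the sketch is that channels on healthy shards are never unsubscribed, which is where the resilience gain comes from.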
Closes #376