[#326] Resolve "shard channels across multiple signald instances"
Closes #326
context
- we have some really big channels now. because we have not yet solved message lag in the signald layer, that means that broadcast messages can take a very long time (minutes) to send on large channels
- currently, when really big channels send broadcast messages and incur lag, all channels (including very small ones) become "clogged" and experience the same lag as the large channel.
- in effect signalboost becomes degraded for all users any time a big channel is used. in other words: our message routing currently suffers from "failure of fairness"
- as a remedy, we wish to shard channels into their own signald instances so that large channels don't slow down smaller channels and medium channels don't slow down each other
- this MR provides an implementation of sharding that groups channels evenly into 10 buckets (by subscriber size) every time signalboost starts up, then assigns all channels in a given bucket to the same signald socket, thereby achieving sharding and (we hope!) recovery of fairness
changes:
- assign channels to socket pools in boot sequence
- use `socketId` in all calls to `signal.sendMessage`
- add convenience method for getting channel's socket pool id
- update docker configs to support multiple socket pools
- log to p8s gauges in `sharding.assignChannelsToSocketPools`
data model updates:
- add `socketId` field to channels, w/ a convenience setter func in repository layer
- add `channelRepository.getChanelsSortedBySize` to assist with the sharding algo; returns a list of (channelPhoneNumber, channelSize) tuples
sharding job:
- on startup, run a new job that assigns every channel a new `socketId` based on its size. these ids determine which socket signalboost uses to subscribe and send messages, and hence group channels into shards by `socketId`
- the job calls `socket.sharding#assignChannelsToSockets`, which:
  - queries channels sorted by size (desc)
  - distributes channels by member count as evenly as possible into a fixed set of `socketId` "buckets", using a min-heap to efficiently determine the "least full bucket" on each iteration
  - after grouping, assigns the same socket id to every channel in a bucket and emits some prometheus metrics about the composition of the shards (channels-per-socket, members-per-socket, max-sized-channel-on-socket)
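The bucket-assignment step can be sketched roughly as follows. This is a minimal, hypothetical version, not the actual `socket.sharding` code: the bucket shape, the heap helper, and `assignChannelsToBuckets` are all names invented for illustration. It assumes channels arrive sorted by size descending and greedily drops each one into the least-full bucket, tracked with a min-heap keyed on bucket total:

```javascript
// illustrative sketch of min-heap bucket assignment (not signalboost source)

// minimal binary min-heap over buckets, keyed on total member count
const makeHeap = () => {
  const items = []
  const swap = (i, j) => ([items[i], items[j]] = [items[j], items[i]])
  const push = bucket => {
    items.push(bucket)
    let i = items.length - 1
    while (i > 0) {
      const parent = (i - 1) >> 1
      if (items[parent].total <= items[i].total) break
      swap(i, parent)
      i = parent
    }
  }
  const pop = () => {
    const min = items[0]
    const last = items.pop()
    if (items.length > 0) {
      items[0] = last
      let i = 0
      for (;;) {
        const l = 2 * i + 1
        const r = 2 * i + 2
        let smallest = i
        if (l < items.length && items[l].total < items[smallest].total) smallest = l
        if (r < items.length && items[r].total < items[smallest].total) smallest = r
        if (smallest === i) break
        swap(i, smallest)
        i = smallest
      }
    }
    return min
  }
  return { push, pop }
}

// channels: [channelPhoneNumber, channelSize] tuples, sorted by size (desc)
const assignChannelsToBuckets = (channels, numBuckets = 10) => {
  const buckets = Array.from({ length: numBuckets }, (_, socketId) => ({
    socketId,
    total: 0,
    channelPhoneNumbers: [],
  }))
  const heap = makeHeap()
  buckets.forEach(b => heap.push(b))
  channels.forEach(([channelPhoneNumber, channelSize]) => {
    const bucket = heap.pop() // least-full bucket so far
    bucket.total += channelSize
    bucket.channelPhoneNumbers.push(channelPhoneNumber)
    heap.push(bucket)
  })
  return buckets
}
```

Placing the largest channels first is what keeps the totals balanced: it is the classic greedy longest-processing-time heuristic, and the heap makes each "least full bucket" lookup O(log n) instead of a linear scan.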
signal/socket layer:
- when `socket.run` is invoked in the boot sequence, it now connects to 10 different socket descriptors (coinciding with the 10 different `socketId` values stored on `channel` records)
- when `signal` functions call `socket.write`, they now pass the `socketId` associated with the channel sending a message (or 0 for system calls)
new channel creation:
- by default, new channels will be assigned socket id `0`, which should put them in the smallest shard
- if they grow really big (and therefore no longer belong in the smallest shard), they will be assigned to a different shard the next time signalboost restarts, which will happen within at most 24 hours
docker config changes:
- create 10 signald containers by specifying 10 services with names `signald_0` through `signald_9`
- map the 10 socket descriptors addressed by signalboost at `/var/run/signald-sockets/<N>` to named volumes `signald_sock_<N>`, then map those named volumes to `/var/run/signald` in the `signald_<N>` container
- extract shared docker configs into the `docker-compose.yml` base file and add a lot of new anchor tags for the 10 signald containers
- rely on `docker-compose`'s native merging logic to merge configs from the `docker-compose.yml` base layer into `docker-compose-dev.yml` and `docker-compose.overrides.yml` for dev and prod configs, respectively
- modify all of our dev makefiles to pass both the base file and the `-dev.yml` override to scripts that used to take just the `-dev.yml` file as an argument to `-f`
- lean on the fact that `docker-compose` implicitly loads `docker-compose.yml` then `docker-compose.overrides.yml` to get away with not passing `-f` arguments in prod (`docker-compose up` works just fine!)
- note that yaml anchors do not natively support merging lists, which is why we defer providing the `signald_sock_0` volume variants until the dev and prod compose files (if we tried to do it in the base file, the other volumes would be overridden by the socket volumes or vice versa: the two lists cannot be merged. however, `docker-compose` can merge lists, so we lean on it instead of native yml parsing logic. oof.)
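To illustrate the anchor-vs-merge point, here is a hedged sketch of how the base and override files might be laid out. Service and volume names follow this MR; the image name and other keys are placeholders, not the repo's actual compose files:

```yaml
# --- docker-compose.yml (base layer) ---
# shared config lives in a yaml anchor, but NO volume list here: an anchor
# that supplied volumes would be replaced wholesale by any override, since
# yaml anchors cannot merge lists
x-signald-base: &signald-base
  image: signald   # placeholder image name
  restart: always

services:
  signald_0:
    <<: *signald-base
  signald_1:
    <<: *signald-base
  # ... through signald_9

---
# --- docker-compose-dev.yml / docker-compose.overrides.yml ---
# docker-compose's cross-file merge logic CAN combine lists, so the socket
# volume mappings are deferred to the env-specific files
services:
  signald_0:
    volumes:
      - signald_sock_0:/var/run/signald
  # ... through signald_9

volumes:
  signald_sock_0:
  # ... through signald_sock_9
```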
modify restart logic to work with multiple signald instances:
- in `execute`:
  - stop all signald instances when shutting down
  - await `isAlive` responses from all signald instances when restarting
- in `socket`:
  - pass `socketPoolId` as arg to `dispatch`
    - broadly helpful b/c it seems useful to know what socket id an incoming message is coming from
    - specifically: this allows us to issue per-instance `isAlive` calls and handle their responses by identifying which socket they are sent on
- in `dispatcher.index`:
  - add `dispatcher.dispatcherOf(socketId)` to allow passing the socket id as a (curried) argument to all invocations of `dispatcher.dispatch`
  - pass `socketId` to `callbacks.handle` (enabling retrieval of the `isAlive` callback from the registry, since we can now identify it by the socket it arrived on)
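The curried-dispatcher idea can be sketched like this. Everything here is illustrative: the real `dispatch` and `callbacks` modules carry far more context, and the stub registry stands in for the actual callback registry:

```javascript
// illustrative sketch of dispatcherOf / per-socket callbacks (not signalboost source)

const registry = {} // stub: pending isAlive callbacks, keyed by socketId

const callbacks = {
  register: (socketId, callback) => (registry[socketId] = callback),
  // handle can find the right callback because it now knows which socket
  // the inbound message arrived on
  handle: (socketId, inboundMsg) =>
    registry[socketId] ? registry[socketId](inboundMsg) : null,
}

// dispatch receives the id of the socket the message came in on
const dispatch = (socketId, msg) => callbacks.handle(socketId, JSON.parse(msg))

// dispatcherOf fixes socketId once per socket pool, so each of the 10
// connections can be wired up with a plain one-argument listener at boot:
//   pools[n].sock.on('data', dispatcherOf(n))
const dispatcherOf = socketId => msg => dispatch(socketId, msg)
```

Currying keeps the socket identity out of the wire format: no message needs to carry its own socket id, because the listener that receives it already closes over the right one.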
minor things to make it all work:
- modify prometheus to scrape 10 signald containers at new subdomains
- update signald urls in .env and prometheus scrape job configs
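For flavor, a prometheus scrape job over the 10 containers might look roughly like this; the actual target addresses ("new subdomains") and metrics port in our config differ from these placeholders:

```yaml
# illustrative only: hostnames and port are placeholders
scrape_configs:
  - job_name: signald
    static_configs:
      - targets:
          - signald_0:1312
          - signald_1:1312
          # ... through signald_9
```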
side-effects:
- perform a cheaper query for channels in `signal.run`: don't query for nested associations of channels, just the channels!
- only run healthchecks every 5 min in dev (to make logs cleaner)
- set the socket pool count to 1 for now (fewer balls in the air as we measure how this all plays out!)
Edited by aguestuser