
[#326] Resolve "shard channels across multiple signald instances"

Closes #326

context:

  • we have some really big channels now. because we have not yet solved message lag in the signald layer, that means that broadcast messages can take a very long time (minutes) to send on large channels
  • currently, when really big channels send broadcast messages and incur lag, all channels (including very small ones) become "clogged" and experience the same lag as the large channel.
  • in effect signalboost becomes degraded for all users any time a big channel is used. in other words: our message routing currently suffers from "failure of fairness"
  • as a remedy, we wish to shard channels into their own signald instances so that large channels don't slow down smaller channels and medium-sized channels don't slow each other down
  • this MR provides an implementation of sharding that groups channels evenly into 10 buckets (by subscriber size) every time signalboost starts up, then assigns all channels in a given bucket to the same signald socket, thereby achieving sharding and (we hope!) recovery of fairness

changes:

  • assign channels to socket pools in boot sequence

  • use socketId in all calls to signal.sendMessage

  • add convenience method for getting channel's socket pool id

  • update docker configs to support multiple socket pools

  • log to p8s gauges in sharding.assignChannelsToSocketPools

data model updates:

  • add socketId field to channels, w/ a convenience setter func in repository layer
  • add channelRepository.getChannelsSortedBySize to assist with the sharding algo (sketched below)
    • returns a list of (channelPhoneNumber, channelSize) tuples
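for concreteness, here's a minimal sketch of what these additions might look like, assuming a Sequelize-backed repository (the memberships table name, column names, and query shape are our assumptions, not the exact implementation):

```js
const { Op, QueryTypes } = require('sequelize')

// in the channel model, the new column might look like:
// socketId: { type: Sequelize.INTEGER, allowNull: false, defaultValue: 0 },

// sketch: set the same socketId on a batch of channels (convenience setter)
const updateSocketIds = (db, channelPhoneNumbers, socketId) =>
  db.channel.update(
    { socketId },
    { where: { phoneNumber: { [Op.in]: channelPhoneNumbers } } },
  )

// sketch: returns [[channelPhoneNumber, channelSize], ...] sorted largest-first
const getChannelsSortedBySize = db =>
  db.sequelize
    .query(
      `select "channelPhoneNumber", count(*) as "channelSize"
         from memberships
        group by "channelPhoneNumber"
        order by "channelSize" desc;`,
      { type: QueryTypes.SELECT },
    )
    .then(rows => rows.map(r => [r.channelPhoneNumber, Number(r.channelSize)]))

module.exports = { updateSocketIds, getChannelsSortedBySize }
```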

sharding job:

  • on startup, run a new job that assigns all channels a new socketId based on their size
    • these ids determine which socket signalboost uses to subscribe to and send messages for a given channel, and hence group channels into shards by socketId
  • the job calls socket.sharding#assignChannelsToSockets, which:
    • queries channels sorted by size (desc)
    • tries to distribute channels by member count as evenly as possible into a fixed set of socketId "buckets", using a min-heap to efficiently find the "least full bucket" on each iteration (see the sketch after this list)
    • after grouping, it assigns same socket id to every channel in a bucket and emits some prometheus metrics about the composition of the shards (channels-per-socket, members-per-socket, max-sized-channel-on-socket)
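to illustrate the balancing step, here's a minimal sketch of the min-heap distribution, assuming input shaped like getChannelsSortedBySize above (NUM_SOCKETS and the npm `heap` package are illustrative choices, not necessarily what the MR uses):

```js
const Heap = require('heap') // any min-heap implementation would do

const NUM_SOCKETS = 10

// sketch: greedily place each channel (largest first) into the currently
// least-full bucket; the min-heap makes "least-full" an O(log n) lookup
// instead of a linear scan over all buckets
const groupEvenlyBySize = channelsWithSizes => {
  const buckets = new Heap((a, b) => a.totalSize - b.totalSize)
  for (let socketId = 0; socketId < NUM_SOCKETS; socketId++)
    buckets.push({ socketId, channelPhoneNumbers: [], totalSize: 0 })

  channelsWithSizes.forEach(([phoneNumber, size]) => {
    const bucket = buckets.pop() // least-full bucket
    bucket.channelPhoneNumbers.push(phoneNumber)
    bucket.totalSize += size
    buckets.push(bucket)
  })
  return buckets.toArray()
}

// after grouping, each bucket's socketId is written back to its channels and
// per-shard gauges (channels-per-socket, members-per-socket,
// max-sized-channel-on-socket) are emitted to prometheus
```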

signal/socket layer:

  • when socket.run is invoked in the boot sequence, it now connects to 10 different socket descriptors (coinciding with the 10 different socketId values stored on channel records)
  • when signal functions call socket.write, they now pass the socketId associated with the channel sending a message (or 0 for system calls); see the sketch below
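roughly, the pool wiring might look like this (the socket paths match the docker config described below; function shapes and names are our assumptions):

```js
const net = require('net')

const NUM_SOCKETS = 10
const socketDir = '/var/run/signald-sockets'

// sketch: connect one unix-socket client per socketId, indexed by id
const run = () =>
  Promise.all(
    [...Array(NUM_SOCKETS).keys()].map(
      socketId =>
        new Promise((resolve, reject) => {
          const sock = net.createConnection(`${socketDir}/${socketId}`)
          sock.on('connect', () => resolve([socketId, sock]))
          sock.on('error', reject)
        }),
    ),
  ).then(Object.fromEntries)

// sketch: write a signald message on the sending channel's socket
// (socketId 0 for system-level calls)
const write = (pool, data, socketId = 0) =>
  new Promise((resolve, reject) =>
    pool[socketId].write(JSON.stringify(data) + '\n', err =>
      err ? reject(err) : resolve(),
    ),
  )
```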

new channel creation:

  • by default, new channels are assigned socket id 0, which should put them in the smallest shard
  • if they grow really big (and therefore should no longer live in the smallest shard), they will be assigned to a different shard the next time signalboost restarts, which will happen within at most 24 hours

docker config changes:

  • create 10 signald containers by specifying 10 services with names signald_0 through signald_9
  • map the 10 socket descriptors addressed by signalboost at /var/run/signald-sockets/<N> to named volumes signald_sock_<N>, then map those named volumes to /var/run/signald in the signald_<N> container
  • extract shared docker configs into the docker-compose.yml base layer and add a lot of new yaml anchors for the 10 signald containers
  • rely on docker-compose's native merging logic to merge configs from the docker-compose.yml base layer into docker-compose-dev.yml and docker-compose.override.yml for dev and prod configs, respectively
    • modify all of our dev makefiles to pass both the base file and the -dev.yml override (as arguments to -f) to scripts that used to take only the -dev.yml file
    • lean on the fact that docker-compose implicitly loads docker-compose.yml and then docker-compose.override.yml, so we don't have to pass any -f arguments in prod (docker-compose up works just fine!)
    • note that yaml anchors do not natively support merging lists, which is why we defer declaring the signald_sock_<N> volume entries until the dev and prod compose files: if we declared them in the base file, one list would simply override the other, since anchor merges cannot combine two lists. docker-compose's own merge logic can combine lists, however, so we lean on it instead of yaml's native merging. oof. (a sketch of this layering follows below)
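a compressed sketch of the layering (image names and anchor names are illustrative; the real files define all 10 instances):

```yaml
# docker-compose.yml (base layer)
services:
  signald_0: &signald
    image: our-signald-image   # illustrative
    # no volumes here: yaml anchors can't merge lists, so the per-instance
    # socket volumes are declared in the dev/prod override files instead
  signald_1: *signald
  # ... signald_2 through signald_9 follow the same pattern

# docker-compose-dev.yml / docker-compose.override.yml (merged by docker-compose)
services:
  signald_0:
    volumes:
      - signald_sock_0:/var/run/signald
  # ... one stanza per instance
  signalboost:
    volumes:
      - signald_sock_0:/var/run/signald-sockets/0
      # ... through signald_sock_9:/var/run/signald-sockets/9

volumes:
  signald_sock_0:
  # ... through signald_sock_9
```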

modify restart logic to work with multiple signald instances:

  • in execute:
    • stop all signald instances when shutting down
    • await isAlive responses from all signald instances when restarting
  • in socket:
    • pass socketPoolId as arg to dispatch
    • broadly helpful because it seems useful to know which socket an incoming message is coming from
    • specifically: this allows us to issue per-instance isAlive calls and handle their responses by identifying which socket they are sent on
  • in dispatcher.index:
    • add dispatcher.dispatcherOf(socketId) to allow passing the socket id as a (curried) argument to all invocations of dispatcher.dispatch (see the sketch after this list)
    • pass socketId to callbacks.handle (enabling retrieval of isAlive callback from registry, since we can now identify it by the socket it arrived on)
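a minimal sketch of the curried wiring (the callback-registry shape and the isAlive predicate are our assumptions, not the actual implementation):

```js
const callbacks = require('./callbacks') // hypothetical registry keyed by socketId

// sketch: dispatch now knows which socket an inbound message arrived on
const dispatch = async (socketId, inboundMsg) => {
  const msg = JSON.parse(inboundMsg)
  if (isAliveResponse(msg)) {
    // socketId identifies which signald instance answered, so we can
    // resolve that instance's isAlive callback in the registry
    return callbacks.handle(socketId, msg)
  }
  // ...route broadcasts/commands as before...
}

// hypothetical predicate: the real check depends on signald's reply shape
const isAliveResponse = msg => Boolean(msg.id && msg.id.startsWith('is_alive'))

// curried form: each socket's listener carries its own id
const dispatcherOf = socketId => inboundMsg => dispatch(socketId, inboundMsg)

// wiring in the boot sequence (pool from socket.run above):
// Object.entries(pool).forEach(([id, sock]) =>
//   sock.on('data', dispatcherOf(Number(id))))
```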

minor things to make it all work:

  • modify prometheus to scrape 10 signald containers at new subdomains
  • update signald urls in .env and the prometheus scrape job configs (sketched below)
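roughly (hostnames and port are illustrative, not the actual scrape config):

```yaml
# prometheus.yml -- sketch of the widened scrape job
scrape_configs:
  - job_name: signald
    static_configs:
      - targets:
          - signald_0:5000   # illustrative host:port
          - signald_1:5000
          # ... through signald_9:5000
```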

side-effects:

  • perform a cheaper query for channels in signal.run: don't query for channels' nested associations, just the channels!
  • only run healthchecks every 5 min in dev (to make logs cleaner)
  • set the socket pool count to 1 for now (fewer balls in the air while we measure how this all plays out!)