
[#201] prove that signalc has at least 5x higher throughput than signald

aguestuser requested to merge 201-load-tests into main

closes #201 (closed)

(Note: subsequent optimizations, such as introducing a write-through redis cache on the critical path of the Signal protocol store and optimizing the thread pool used by the executor in libsignal code, regularly yielded throughput of 1000 messages in ~2sec, for an increase in speedup from 5x to 50x.)

context

  • in prod, signald often takes up to 30 minutes to deliver all messages to channels with several thousand subscribers
  • in this MR, we build a load testing framework to replicate the load of a channel with 1,000 subscribers and use it to demonstrate that signalc can transmit 1,000 messages in under a minute, thereby validating the decision to continue working toward shipping it

results

conditions

  • Tested on an X1 Carbon w/ 8 virtual cores (8th-gen Intel i7) and 16 GB RAM, on a middling internet connection (9.65 Mbps up / 98.19 Mbps down)
  • Ran 5 trials at each scale; observed throughput decrease from run to run and then stabilize. Kept the last-generated report for each trial series.
  • Note: used a socket pool of size 1 between signalboost and signald b/c signald does not send messages in parallel and b/c larger pools cause it to drop messages and break off socket connections due to concurrent modification errors

summary

signalc outperformed signald by a factor increasing from ~3x to ~5x at increasing scales:

  • for 10 recipients: .676 sec vs. 1.94 sec (2.9x)
  • for 100 recipients: 2.82 sec vs. 13.8 sec (4.9x)
  • for 1000 recipients: 28.7 sec vs. 138 sec (4.8x)
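
The ratios above follow directly from the totalElapsed values in the reports under details. A minimal sketch of the arithmetic (the dictionary layout and the `speedup` helper are illustrative and not part of the simulator):

```python
# totalElapsed (in seconds) per client, keyed by number of recipients,
# copied from the lag reports in the details section.
TOTAL_ELAPSED = {
    "SIGNALC": {10: 0.676, 100: 2.815, 1000: 28.668},
    "SIGNALD": {10: 1.938, 100: 13.896, 1000: 138.125},
}


def speedup(num_recipients: int) -> float:
    """How many times faster signalc finished than signald at this scale."""
    signald = TOTAL_ELAPSED["SIGNALD"][num_recipients]
    signalc = TOTAL_ELAPSED["SIGNALC"][num_recipients]
    return signald / signalc


for n in (10, 100, 1000):
    print(f"{n} recipients: {speedup(n):.1f}x")
```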

details

Signalc Message Lag (in sec) for 10, 100, 1000 recipients:

[{
  "client": "SIGNALC",
  "numRecipients": 10,
  "socketPoolSize": "16",
  "timestamp": "2021-03-30T19:36:02.655Z",
  "percentDelivered": 100,
  "minElapsed": 0.233,
  "maxElapsed": 0.672,
  "meanElapsed": 0.359,
  "variance": 0.439,
  "totalElapsed": 0.676
},
{
  "client": "SIGNALC",
  "numRecipients": 100,
  "socketPoolSize": "16",
  "timestamp": "2021-03-30T19:36:43.947Z",
  "percentDelivered": 100,
  "minElapsed": 0.329,
  "maxElapsed": 2.792,
  "meanElapsed": 1.337,
  "variance": 2.463,
  "totalElapsed": 2.815
},
{
  "client": "SIGNALC",
  "numRecipients": 1000,
  "socketPoolSize": "16",
  "timestamp": "2021-03-30T20:22:10.832Z",
  "percentDelivered": 100,
  "minElapsed": 0.924,
  "maxElapsed": 28.407,
  "meanElapsed": 13.461,
  "variance": 27.483,
  "totalElapsed": 28.668
}]

Signald Message Lag (in sec) for 10, 100, 1000 recipients:

[{
  "client": "SIGNALD",
  "numRecipients": 10,
  "socketPoolSize": "1",
  "timestamp": "2021-03-30T20:33:28.560Z",
  "percentDelivered": 100,
  "minElapsed": 0.128,
  "maxElapsed": 1.935,
  "meanElapsed": 1.033,
  "variance": 1.807,
  "totalElapsed": 1.938
},
{
  "client": "SIGNALD",
  "numRecipients": 100,
  "socketPoolSize": "1",
  "timestamp": "2021-03-30T20:32:29.054Z",
  "percentDelivered": 100,
  "minElapsed": 0.224,
  "maxElapsed": 13.872,
  "meanElapsed": 7.316,
  "variance": 13.648,
  "totalElapsed": 13.896
},
{
  "client": "SIGNALD",
  "numRecipients": 1000,
  "socketPoolSize": "1",
  "timestamp": "2021-03-30T20:49:23.024Z",
  "percentDelivered": 100,
  "minElapsed": 0.137,
  "maxElapsed": 44.842,
  "meanElapsed": 25.79,
  "variance": 44.705,
  "totalElapsed": 138.125
}]
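
In every report above, the variance field equals maxElapsed minus minElapsed (for example, 0.672 - 0.233 = 0.439 in the first signalc report), so it reads as the range of per-message lag rather than a statistical variance. The sketch below shows one way such a report could be derived from per-message lags. The `summarize` helper and its inputs are hypothetical; this is not the simulator's actual testLag code.

```python
from statistics import mean


def summarize(lags_sec: list[float], total_elapsed: float, num_sent: int) -> dict:
    """Build a lag report shaped like the ones above.

    lags_sec holds one entry per delivered message: seconds between
    sending the message and observing it on the receiving side.
    """
    if not lags_sec:
        raise ValueError("no messages were delivered")
    lo, hi = min(lags_sec), max(lags_sec)
    return {
        "percentDelivered": 100 * len(lags_sec) / num_sent,
        "minElapsed": round(lo, 3),
        "maxElapsed": round(hi, 3),
        "meanElapsed": round(mean(lags_sec), 3),
        # Matches the reports above: max minus min, i.e. the range.
        "variance": round(hi - lo, 3),
        "totalElapsed": round(total_elapsed, 3),
    }


report = summarize([0.233, 0.3, 0.672], total_elapsed=0.676, num_sent=3)
print(report)
```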

usage

  • provision a signal server by recreating the files in signal-server, then running docker-compose up -d from project root
  • reseed the test harness by running make load.reseed
  • spin down the harness with make load.down
  • run lag tests against signalc or signald with make test.lag.signalc or make test.lag.signald (respectively)
  • change the size of a lag test trial by adjusting the numBots variable in simulator/constants

code changes

infrastructure

  • add a fake signal server that:
    • runs at signalserver.signalboost.info
    • automatically approves all register/verify requests
    • does not enforce any rate limits
    • has its LE root cert injected into signalc & signald's trust store to enable them to send TLS traffic to it

test harness

  • create seed data (mostly sql dumps) that live in simulator/seed-data. can bootstrap:
    • a signalc instance w/ 10 "sender" (channel) numbers - for test subject
    • a signald instance w/ 10 "sender" (channel) numbers - for test subject
    • a signalc instance w/ 1000 "receiver" (subscriber) numbers - to simulate load for test subject
    • a signal server w/ 1020 accounts for all of the above
  • add a simulator node app that can drive signalc or signald over a unix socket and run tests against it. contains:
    • signal module w/ the minimal subset of signalboost's features needed to register numbers, send messages from them, and subscribe to messages sent to them
    • constants module w/ hooks for subscribing to a subset of seed data
    • testLag module to drive load tests for lag and print out a report
  • add a docker-compose file and several make scripts (notably make load.test.lag.signalc|d) that can be used to spin up a receiver_signalc instance (driven by a sender_driver instance of simulator) sending traffic to either a sender_signalc or sender_signald instance driven by a sender_driver instance of simulator

application

  • to optimize from 40sec to 20sec, we made these changes:
    • add a db connection pool (to avoid making postgres run out of connections under load)
    • prefer IO dispatchers throughout the stack
    • bump the maximum number of threads allowed in the IO dispatcher executor to num processors * 64
    • introduce backpressure in front of calls to SignalSender#send to relieve contention over threads while messages are in flight (do this via a set of queues that operate in parallel but each process only one message at a time, such that there is an upper bound on how many messages may be in flight at any given moment, roughly equal to the number of processors times 2 plus 1)
  • to gain visibility into libsignal logs, we added a LibSignalLogger implementation
  • to stay current with upstream (and seek potential perf gains) we bumped the version of libsignalservice-java on which we depend
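
The backpressure scheme described above (parallel queues that each process one message at a time) can be sketched as follows. This is an asyncio analogue written for illustration, not signalc's own code: `send_all`, the round-robin assignment of messages to queues, and the failure list are all assumptions. Only the in-flight bound of roughly num processors * 2 + 1 comes from the description above.

```python
import asyncio
import os


async def send_all(messages, send, num_queues=None):
    """Send every message via `send`, with at most `num_queues` in flight.

    Each queue has exactly one worker, and a worker awaits one send at a
    time, so the number of queues is the upper bound on concurrent sends.
    Returns a list of (message, exception) pairs for sends that failed.
    """
    if num_queues is None:
        num_queues = (os.cpu_count() or 1) * 2 + 1
    queues = [asyncio.Queue() for _ in range(num_queues)]
    failures = []

    async def worker(queue):
        while True:
            message = await queue.get()
            try:
                await send(message)
            except Exception as err:
                # Keep the worker alive so its queue still drains.
                failures.append((message, err))
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker(q)) for q in queues]
    # Assumption: messages are spread over the queues round-robin.
    for i, message in enumerate(messages):
        queues[i % num_queues].put_nowait(message)
    # Wait until every queue has been fully processed, then stop workers.
    await asyncio.gather(*(q.join() for q in queues))
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return failures
```

A caller would pass a coroutine function that performs the actual send, for example `asyncio.run(send_all(messages, send))`. Tuning `num_queues` trades peak throughput against contention over threads while messages are in flight.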
Edited by aguestuser
