[#201] prove that signalc has at least 5x higher throughput than signald
closes #201 (closed)
(Note: subsequent optimizations, such as introducing a write-through Redis cache on the critical path of the Signal protocol store and optimizing the thread pool used by the executor in libsignal code, regularly yielded throughput of 1000 messages in ~2 sec, raising the speedup over signald from 5x to 50x.)
context
- in prod, signald often takes up to 30 minutes to deliver all messages to channels with several thousand subscribers
- in this MR, we build a load testing framework to replicate the load of a channel with 1,000 subscribers and use it to demonstrate that signalc can transmit 1,000 messages in under a minute, thereby validating the decision to continue working toward shipping it
results
conditions
- Tested on an X1 Carbon w/ 8 virtual cores (Intel i7 gen 8), 16 GB RAM, and a middling internet connection (9.65 Mbps up / 98.19 Mbps down)
- Ran 5 trials at each scale; observed throughput decrease with each run and then stabilize. Kept the last-generated report for each trial series.
- Note: used a socket pool of size 1 between signalboost and signald b/c signald does not send messages in parallel and because larger pool sizes cause it to drop messages and break off socket connections due to concurrent modification errors
summary
signalc outperformed signald by a factor increasing from ~3x to ~5x at increasing scales:
- for 10 recipients: 0.676 sec vs. 1.94 sec (2.9x)
- for 100 recipients: 2.82 sec vs. 13.8 sec (4.9x)
- for 1000 recipients: 28.7 sec vs. 138 sec (4.8x)
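The speedup factors quoted above are the ratio of signald's total elapsed time to signalc's at each scale. A minimal sketch of that arithmetic, using only the totals listed in the summary (the `speedup` helper and `totals` table are illustrative names, not part of the test harness):

```python
# Total elapsed seconds per scale, copied from the summary above:
# recipients -> (signalc, signald)
totals = {
    10: (0.676, 1.94),
    100: (2.82, 13.8),
    1000: (28.7, 138.0),
}


def speedup(signalc_sec: float, signald_sec: float) -> float:
    """How many times faster signalc finished than signald."""
    return signald_sec / signalc_sec


# Round to one decimal place, matching the precision used in the summary.
speedups = {n: round(speedup(c, d), 1) for n, (c, d) in totals.items()}

for n, factor in speedups.items():
    print(f"{n} recipients: {factor}x")
```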
details
Signalc Message Lag (in sec) for 10, 100, 1000 recipients:
[{
"client": "SIGNALC",
"numRecipients": 10,
"socketPoolSize": "16",
"timestamp": "2021-03-30T19:36:02.655Z",
"percentDelivered": 100,
"minElapsed": 0.233,
"maxElapsed": 0.672,
"meanElapsed": 0.359,
"variance": 0.439,
"totalElapsed": 0.676
},
{
"client": "SIGNALC",
"numRecipients": 100,
"socketPoolSize": "16",
"timestamp": "2021-03-30T19:36:43.947Z",
"percentDelivered": 100,
"minElapsed": 0.329,
"maxElapsed": 2.792,
"meanElapsed": 1.337,
"variance": 2.463,
"totalElapsed": 2.815
},
{
"client": "SIGNALC",
"numRecipients": 1000,
"socketPoolSize": "16",
"timestamp": "2021-03-30T20:22:10.832Z",
"percentDelivered": 100,
"minElapsed": 0.924,
"maxElapsed": 28.407,
"meanElapsed": 13.461,
"variance": 27.483,
"totalElapsed": 28.668
}]
Signald Message Lag (in sec) for 10, 100, 1000 recipients:
[{
"client": "SIGNALD",
"numRecipients": 10,
"socketPoolSize": "1",
"timestamp": "2021-03-30T20:33:28.560Z",
"percentDelivered": 100,
"minElapsed": 0.128,
"maxElapsed": 1.935,
"meanElapsed": 1.033,
"variance": 1.807,
"totalElapsed": 1.938
},
{
"client": "SIGNALD",
"numRecipients": 100,
"socketPoolSize": "1",
"timestamp": "2021-03-30T20:32:29.054Z",
"percentDelivered": 100,
"minElapsed": 0.224,
"maxElapsed": 13.872,
"meanElapsed": 7.316,
"variance": 13.648,
"totalElapsed": 13.896
},
{
"client": "SIGNALD",
"numRecipients": 1000,
"socketPoolSize": "1",
"timestamp": "2021-03-30T20:49:23.024Z",
"percentDelivered": 100,
"minElapsed": 0.137,
"maxElapsed": 44.842,
"meanElapsed": 25.79,
"variance": 44.705,
"totalElapsed": 138.125
}]
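One detail worth knowing when reading these reports: in every row above, the `variance` value equals `maxElapsed - minElapsed`, so it is a range rather than a statistical variance. The sketch below shows how a report of this shape could be derived from per-message lags, and checks the range relationship against two of the reported rows. The `summarize` helper is hypothetical and is not the actual `testLag` code; in particular, computing `percentDelivered` from the count of observed lags is an assumption.

```python
from statistics import mean


def summarize(elapsed_secs: list[float], num_recipients: int, total_elapsed: float) -> dict:
    """Build a report shaped like the ones above from per-message lags (hypothetical helper)."""
    lo, hi = min(elapsed_secs), max(elapsed_secs)
    return {
        "numRecipients": num_recipients,
        # Assumption: one lag measurement per delivered message.
        "percentDelivered": 100 * len(elapsed_secs) / num_recipients,
        "minElapsed": round(lo, 3),
        "maxElapsed": round(hi, 3),
        "meanElapsed": round(mean(elapsed_secs), 3),
        # The reports above store max - min under the name "variance".
        "variance": round(hi - lo, 3),
        "totalElapsed": round(total_elapsed, 3),
    }


# Check the max - min relationship against two of the reported rows.
reported = [
    {"minElapsed": 0.233, "maxElapsed": 0.672, "variance": 0.439},    # signalc, 10 recipients
    {"minElapsed": 0.137, "maxElapsed": 44.842, "variance": 44.705},  # signald, 1000 recipients
]
consistent = all(
    round(r["maxElapsed"] - r["minElapsed"], 3) == r["variance"] for r in reported
)
print("variance == max - min:", consistent)
```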
usage
- provision a signal server by recreating the files in `signal-server`, running `docker-compose up -d` from project root
- reseed the test harness by running `make load.reseed`
- spin down the harness with `make load.down`
- run lag tests against signalc or signald with `make test.lag.signalc` or `make test.lag.signald` (respectively)
- change the size of a lag test trial by adjusting the `numBots` variable in `simulator/constants`
code changes
infrastructure
- add a fake signal server
  - runs at signalserver.signalboost.info
  - automatically approves all register/verify requests
  - does not enforce any rate limits
  - we inject its LE root cert into signalc & signald's trust store to enable them to send TLS traffic to it
test harness
- create seed data (mostly sql dumps) that live in `simulator/seed-data`. can bootstrap:
  - a signalc instance w/ 10 "sender" (channel) numbers - for test subject
  - a signald instance w/ 10 "sender" (channel) numbers - for test subject
  - a signalc instance w/ 1000 "receiver" (subscriber) numbers - to simulate load for test subject
  - a signal server w/ 1020 accounts for all of the above
- add a `simulator` node app that can drive signalc or signald over a unix socket and run tests against it. contains:
  - `signal` module w/ minimal subset of signalboost's features needed to register numbers, send messages from them, and subscribe to messages to them
  - `constants` module w/ hooks for subscribing to a subset of seed data
  - `testLag` module to drive load tests for lag and print out a report
- add a `docker-compose` file and several make scripts (notably `make load.test.lag.signalc|d`) that can be used to spin up a `receiver_signalc` instance (driven by a `sender_driver` instance of `simulator`) sending traffic to either a `sender_signalc` or `sender_signald` instance driven by `sender_driver` instance of `simulator`
application
- to optimize from 40 sec to 20 sec, we made these changes:
  - add a db connection pool (to avoid making postgres run out of connections under load)
  - prefer IO dispatchers throughout the stack
  - bump the maximum number of threads allowed in the IO dispatcher executor to num processors * 64
  - introduce backpressure in front of calls to SignalSender#send to relieve contention over threads while messages are in flight (do this via a set of queues that operate in parallel but each process only one message at a time, so that the number of messages in flight at any given moment has an upper bound roughly equal to the number of processors times 2 plus 1)
- to gain visibility into libsignal logs, we added a `LibSignalLogger` implementation
- to stay current with upstream (and seek potential perf gains) we bumped the version of libsignalservice-java on which we depend
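
The backpressure scheme described above (parallel queues, each handling one message at a time) can be sketched as follows. This is an illustration in Python's asyncio rather than the Kotlin coroutine code in signalc; `send` is a stand-in for `SignalSender#send`, and `dispatch_all` and the peak-tracking counters are hypothetical names added only to show that the number of in-flight sends never exceeds the number of queues.

```python
import asyncio
import os


async def send(message: str) -> None:
    """Stand-in for SignalSender#send (hypothetical; simulates network I/O)."""
    await asyncio.sleep(0.01)


async def dispatch_all(messages: list[str], num_queues: int) -> int:
    """Send messages via num_queues workers; return the peak number in flight."""
    queues = [asyncio.Queue() for _ in range(num_queues)]
    in_flight = 0
    peak = 0

    async def worker(queue: asyncio.Queue) -> None:
        nonlocal in_flight, peak
        while True:
            message = await queue.get()
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await send(message)  # one message at a time per queue
            finally:
                in_flight -= 1
                queue.task_done()

    workers = [asyncio.create_task(worker(q)) for q in queues]
    # Round-robin messages across the queues.
    for i, message in enumerate(messages):
        queues[i % num_queues].put_nowait(message)
    # Wait for every queue to drain, then stop the workers.
    await asyncio.gather(*(q.join() for q in queues))
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return peak


# Upper bound described above: roughly processors * 2 + 1.
num_queues = (os.cpu_count() or 1) * 2 + 1
peak_in_flight = asyncio.run(dispatch_all([f"msg-{i}" for i in range(200)], num_queues))
print(f"peak in flight: {peak_in_flight} (bound: {num_queues})")
```

Because each worker awaits its current send before pulling the next message, the queue count acts as a hard cap on concurrency, regardless of how many messages are enqueued at once.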
Edited by aguestuser