queue and resend rate-limited messages
Context
- On channels with large memberships, we are starting to observe somewhat frequent occurrences of rate-limiting by Signal.
- When messages are rate-limited, signald does not attempt to resend them; it just drops them and emits an error message.
- This means that ~5% of intended recipients of some messages don't get them! Obviously really bad!
Value
- As a signalboost admin
- I want to know that every message I send to my channel will reach every subscriber
- So that I don't have to worry about who knows what, or think twice before using signalboost to send important information
Behavior
TLDR
enqueue rate-limited messages for resending with exponential backoff to ensure messages get through.
resending
GIVEN a message that has failed to send due to rate limit error for the first time
- THEN signalboost will attempt to resend it in 2 seconds
exponential backoff
GIVEN a message that has been rate-limited for the Nth time
- THEN signalboost will attempt to resend it in 2^N seconds
resend canceling
GIVEN a message that has already been resent 8 times and is rate limited again [1]
- THEN signalboost will not attempt to resend the message anymore
[1] i.e., 8.5 minutes (2 + 4 + ... + 256 = 510 sec) have elapsed since the first send attempt, and the last resend delay was 256 sec (about 4.27 minutes) long
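As a sanity check on the numbers in [1], here is a minimal sketch, assuming a 2-second initial interval that doubles up to a 256-second cap (`backoffSchedule` is a hypothetical helper), that lists the resend delays and their total:

```javascript
// Mirrors the proposed resendInterval / maxResendInterval config vars (seconds).
const RESEND_INTERVAL_SEC = 2
const MAX_RESEND_INTERVAL_SEC = 256

// Delay (in seconds) before each resend attempt, in order: the interval
// doubles after every rate limit until it reaches the cap.
function backoffSchedule(initial = RESEND_INTERVAL_SEC, max = MAX_RESEND_INTERVAL_SEC) {
  const delays = []
  for (let delay = initial; delay <= max; delay *= 2) {
    delays.push(delay)
  }
  return delays
}

const delays = backoffSchedule()
const totalSec = delays.reduce((sum, d) => sum + d, 0)
console.log(delays.join(', ')) // 2, 4, 8, 16, 32, 64, 128, 256
console.log(totalSec / 60) // 8.5
```

Eight resends in total, whose delays sum to 510 seconds (8.5 minutes).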
Implementation Details
- create `resendInterval` and `maxResendInterval` config vars
  - set `resendInterval` to 2 sec and `maxResendInterval` to 256 sec (2^8 sec, about 4.27 min)
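A minimal sketch of the two config vars, assuming they are read from environment variables with the spec's values as defaults. `RESEND_INTERVAL_SEC`, `MAX_RESEND_INTERVAL_SEC`, and `readIntSeconds` are hypothetical names, not existing signalboost settings:

```javascript
// Hypothetical env var names; signalboost's real config layout may differ.
// Both intervals are in seconds, as in the spec.
function readIntSeconds(envValue, fallback) {
  const parsed = Number.parseInt(envValue, 10)
  // Fall back to the default when the var is unset or not a number.
  return Number.isNaN(parsed) ? fallback : parsed
}

const resendConfig = {
  resendInterval: readIntSeconds(process.env.RESEND_INTERVAL_SEC, 2),
  maxResendInterval: readIntSeconds(process.env.MAX_RESEND_INTERVAL_SEC, 256), // 2^8 sec
}
```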
- create a `resendQueue` variable that is local to the `dispatcher.run` module and closed over by the (already existing) `detectRateLimitedMessage` function
  - `resendQueue` is a hashmap with keys `hash(messageBody + username + recipientNumber)` and values `ResendableMessage`
  - `hash` is some appropriately fast hashing algorithm: possibly sha1, possibly something simpler
  - `ResendableMessage` is an object with fields `sdMessage: SdMessage` and `lastResendInterval: number`
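A minimal sketch of the key and value types, assuming Node's built-in `crypto` module with sha1 (one of the options named above); `resendKey` and `makeResendableMessage` are hypothetical names. One deliberate deviation from the literal `messageBody + username + recipientNumber`: the fields are serialized with `JSON.stringify`, because plain concatenation makes `'ab' + 'c'` and `'a' + 'bc'` produce the same hash input.

```javascript
const crypto = require('crypto')

/**
 * @typedef {Object} ResendableMessage
 * @property {Object} sdMessage          the SdMessage to resend (shape assumed)
 * @property {number} lastResendInterval last resend delay, in seconds
 */

// Key for resendQueue: sha1 over (messageBody, username, recipientNumber).
// JSON.stringify keeps field boundaries, so different splits of the same
// characters yield different hash inputs.
function resendKey(messageBody, username, recipientNumber) {
  return crypto
    .createHash('sha1')
    .update(JSON.stringify([messageBody, username, recipientNumber]))
    .digest('hex')
}

/** @returns {ResendableMessage} */
const makeResendableMessage = (sdMessage, lastResendInterval) => ({ sdMessage, lastResendInterval })

// Module-local store; in the real module it would be closed over by detectRateLimitedMessage.
const resendQueue = new Map()
```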
- every time that `detectRateLimitedMessage` detects a rate-limited message from signald, it:
  - takes the hash of the sender, recipient, and content of the `SdMessage` in the `request` field of the error message
  - tries to find an entry in the `resendQueue` matching that hash
  - IF NO ENTRY MATCHING HASH, it:
    - enqueues the message for resending in 2 seconds
    - creates a new entry in the `resendQueue`, using the (already taken) hash as the key and setting the `lastResendInterval` field of the `ResendableMessage` in the value to 2
  - IF ENTRY MATCHING HASH FOUND, it:
    - checks to see if the `lastResendInterval` is >= `maxResendInterval`
    - IF AT OR ABOVE MAX, it:
      - does not attempt to resend the message
      - deletes the corresponding entry from the `resendQueue`
    - IF BELOW MAX, it:
      - multiplies the `lastResendInterval` by 2, producing `newResendInterval`
      - queues the failed message for resending in `newResendInterval` seconds (using `wait(newResendInterval).then(...)`)
      - mutates the `lastResendInterval` value of the given message's entry in the `resendQueue` to the value of `newResendInterval`
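The steps above can be sketched in JavaScript as follows. This is a minimal sketch under several assumptions: the SdMessage shape (`username`, `recipientAddress.number`, `messageBody`) is assumed, `makeRateLimitHandler` and `scheduleResend` are hypothetical names, and the actual resend (the spec's `wait(...).then(...)`) is injected as `scheduleResend` so the bookkeeping can be exercised without real timers.

```javascript
const crypto = require('crypto')

// Values from the proposed config vars, in seconds.
const RESEND_INTERVAL_SEC = 2
const MAX_RESEND_INTERVAL_SEC = 256

// sha1 over sender, recipient, and body; the SdMessage field names are assumed.
const resendKey = ({ username, recipientAddress, messageBody }) =>
  crypto
    .createHash('sha1')
    .update(JSON.stringify([messageBody, username, recipientAddress.number]))
    .digest('hex')

// scheduleResend(sdMessage, delaySec) is injected (hypothetical). In the real module it
// would wrap the spec's wait(...).then(...) call, converting to ms if wait expects ms.
function makeRateLimitHandler(scheduleResend) {
  // Closed over by the handler, like the spec's module-local resendQueue.
  const resendQueue = new Map()

  // Takes the SdMessage from the `request` field of signald's error message.
  // Returns the delay (sec) it scheduled, or null if it gave up on the message.
  function handleRateLimitedMessage(sdMessage) {
    const key = resendKey(sdMessage)
    const entry = resendQueue.get(key)

    if (!entry) {
      // First rate limit: remember the message and resend after the initial interval.
      resendQueue.set(key, { sdMessage, lastResendInterval: RESEND_INTERVAL_SEC })
      scheduleResend(sdMessage, RESEND_INTERVAL_SEC)
      return RESEND_INTERVAL_SEC
    }

    if (entry.lastResendInterval >= MAX_RESEND_INTERVAL_SEC) {
      // Cap reached: stop resending and forget the message.
      resendQueue.delete(key)
      return null
    }

    // Below the cap: double the delay, record it on the existing entry, and resend.
    const newResendInterval = entry.lastResendInterval * 2
    entry.lastResendInterval = newResendInterval
    scheduleResend(sdMessage, newResendInterval)
    return newResendInterval
  }

  return { handleRateLimitedMessage, resendQueue }
}

// Usage: record scheduled delays instead of actually waiting.
const scheduled = []
const { handleRateLimitedMessage, resendQueue } = makeRateLimitHandler((_msg, sec) => scheduled.push(sec))
const msg = { username: '+15550000000', recipientAddress: { number: '+15551111111' }, messageBody: 'hello' }
for (let i = 0; i < 9; i++) handleRateLimitedMessage(msg)
console.log(scheduled.join(', ')) // 2, 4, 8, 16, 32, 64, 128, 256
console.log(resendQueue.size) // 0
```

Nine consecutive rate limits of the same message produce eight resends (2 through 256 seconds) and then a drop, which matches footnote [1]. Because the entry is deleted when the cap is hit, a later rate limit of the same message starts over at 2 seconds.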
Open questions:
- the `resendQueue` could maybe be a redis store (or some other k/v store) instead of just a hash map
  - reasons I could see for doing this would include performance and avoiding race conditions on writes to any given k/v pair
  - since we don't have evidence yet that this store will be taxed performance-wise (and a hash map is lighter/faster to ship), I don't think perf is persuasive
  - since there is no reason to expect contention/race conditions over individual k/v pairs (hashes are unique per channel/subscriber/message combination and are only ever read from/written to by one function at a time, at widely spaced intervals), I don't see an argument for needing something more concurrency-friendly
  - hence, let's try a hash map for now and optimize later if it proves necessary
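If the hash map later needs to become redis, one way to keep that change local is to put the store behind a small async get/set/delete interface from the start. A minimal sketch, assuming that interface shape; `createMapResendStore` is a hypothetical name, not an existing signalboost or redis API:

```javascript
// Map-backed resend store. Methods return promises so that a redis-backed
// store exposing the same get/set/delete shape could be dropped in later
// without changing detectRateLimitedMessage.
function createMapResendStore() {
  const entries = new Map()
  return {
    get: async key => entries.get(key),
    set: async (key, value) => {
      entries.set(key, value)
    },
    delete: async key => {
      entries.delete(key)
    },
  }
}
```

The trade-off: with an async interface, callers `await` between reading and writing an entry, which is safe only under the assumption above that each key is touched by one function at a time.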
Edited by aguestuser