Skip to content

[477] fix prekey bundle bug

Closes #477 (closed)

context

symptom:

  • in load test env, we observe that messages to signal users who have never sent a message back to a channel are always interpreted as PREKEY_BUNDLE envelopes rather than CIPHERTEXT envelopes. (in dev env we have observed that messages from such users are also always interpreted as PREKEY_BUNDLEs
  • why we care: our fix for refreshing prekeys in #404 (closed) rests on being able to confidently infer that receipt of a PREKEY_BUNDLE envelopes indicates a completely new session with a user with whom an account has never exchanged messages. (Which would mean it has depleted our store of prekeys by asking for one to start a new session with us, making this a good hook to go check whether we should refresh our prekeys!)

likely cause

  • in libsignal, SessionCipher#encrypt's call to SessionBuilder.process looks (in native code) for session_state.unacknowledged_pre_key_message_items() when determining whether to encode the data it is about to encrypt as a CiphertextMessage or a PreKeyMesssage
  • this collection is only ever set to empty by #decrypt (which would have received an acknowledgement of a prekey bundle from an interlocutor)
  • however, we are failing to set these collections to empty in most cases, because we don't decrypt envelopes of types other than CIPHERTEXT. notably: we do not decrypt messages of type RECEIPT that are the most likely ways that most interlocutors would signal acknowledgement of a prekey bundle message
    • as such, we never realize that our prekey bundle message has been acknowledged and we never clear the flag

fix

  • this MR begins by re-establishing the 2 enum variants for PREKEY_BUNDLE and CIPHERTEXT (that i had previously collapsed because i did not properly understand the failure behavior in which i only ever saw PREKEY_BUNDLE messages in the load tests and assumed that since they were handled like CIPHERTEXT messages, they must also be CIPHERTEXT messages
  • it then proceeds to reimplement SignalSender#drop to always decrypt all messages so that we will have a chance to observe and react to acknowledgment of prekey bundles from interlocutors at the earliest possible moment (and not waiting for them to send us a message -- which they might never do! -- to do so.)
  • in turn it establishes confidence that we are always correctly sending/receiving PREKEY_BUNDLE envelopes when we should

side-effects

  • fix up our docker setup as follows:
    • decouple the build and push steps for signalc docker build
    • introduce a dev tag and tag all "pinned" image builds with it
    • use this dev tag in loadtest and signalc dev docker-compose stacks (so we don't have to manually update the pinned commit every time we make a change)
    • provisionally eliminate any usage of the old signalc_dev image (now all docker-compose files simply use signalc which is the image former specified as signalc_pinned)
      • if we ever want to use signalc_dev again for faster iterations we can simply revert 5e8ce6b4
  • modify the max prekey id to follow what signal android uses (an int smaller than Int.MAX_VALUE)

further work

in the course of QA-ing this work, i realized a few things:

  • in the absence of this bugfix, all new messages sent to a new channel would produce an InvalidMessageException upon receipt:
openwhispersystems.libsignal.InvalidMessageException: Message from +<REDACTED>:1 failed to decrypt; sender ratchet public key 0910fa5a7b3e3f6c1ae450c7fab44975a038fbfa662935bcb606155e3d1cda25 message counter 1
signalc_pinned    | Candidate session 0 failed with 'invalid ciphertext message', had 0 receiv
  • this was surprising to me, since i am not sure why decrypting messages (that were never sent) would cause the message to go away. however, the only way i can get that bug to disappear is by introducing the call to decrypt on every message
  • notably, even in the presence of this bugfix, the (unanticipated) symptom disappears, but the original context (in the loadtest receiver) has a new bug. namely, it never proceeds past receiving PREKEY_BUNDLEs because it always throws the same InvalidMessageException as above
  • why... we might ask does receiving (or sending -- also tested!) a first message to an actual phone work okay, but sending one to a bot running singalc fails? where did this sneak bug sneak in during the last month of work? all great questions, which are deferred until #478 to solve :)
Edited by aguestuser

Merge request reports