The id of documents may leak information
Soledad is a document store whose documents have 3 fields: id, revision, and content. The id and revision fields are used by the Soledad sync protocol (on both the client and the server side) to compare documents and decide whether one version supersedes another and whether there are conflicts.
The content field is encrypted on the client before being sent to the server. Currently, the id and the revision are sent unmodified.
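To see why the server needs the id and revision in a usable form, here is a minimal sketch of revision comparison during sync. It assumes a u1db-style vector clock revision string such as "replicaA:3|replicaB:1" (the exact format Soledad uses is an assumption here). The id pairs a local document with its remote counterpart, and the revision decides which version wins.

```python
# Sketch only: assumes u1db-style vector clock revisions of the form
# "replicaA:3|replicaB:1" (replica id -> edit counter).

def parse_rev(rev):
    """Parse a revision string into a {replica_id: counter} dict."""
    if not rev:
        return {}
    clock = {}
    for part in rev.split("|"):
        replica, counter = part.rsplit(":", 1)
        clock[replica] = int(counter)
    return clock


def compare_revs(local, remote):
    """Return 'equal', 'local_newer', 'remote_newer' or 'conflict'."""
    a, b = parse_rev(local), parse_rev(remote)
    replicas = set(a) | set(b)
    # A version supersedes another if its counter is >= for every replica.
    a_ge = all(a.get(r, 0) >= b.get(r, 0) for r in replicas)
    b_ge = all(b.get(r, 0) >= a.get(r, 0) for r in replicas)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "local_newer"
    if b_ge:
        return "remote_newer"
    # Neither clock dominates: concurrent edits, i.e. a conflict.
    return "conflict"
```

For example, `compare_revs("A:2|B:1", "A:1|B:1")` returns `"local_newer"`, while `compare_revs("A:2", "B:1")` returns `"conflict"`. Any scheme that hides the id or revision has to preserve the ability to make these comparisons.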
Problem: if users of the Soledad API set the id to arbitrary values, they may leak information about the content of the document to the server. One current example is Bitmask's mail implementation: it sets the id of a document based on the type of its content (whether it represents a header, message flags, content, etc.). With knowledge of the document ids, the server can learn, for example, whether an attachment is a known MIME part, since the id is derived from the hash of that part.
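A minimal sketch of the leak described above. The id scheme (a "C-" type prefix followed by the SHA-256 hex digest of the plaintext payload) is hypothetical and only illustrates the general pattern of deriving an id from content; it is not taken from the Bitmask code.

```python
import hashlib


def content_doc_id(payload):
    # Hypothetical id scheme: type prefix + hash of the plaintext payload.
    # The prefix alone already tells the server what kind of document it is.
    return "C-" + hashlib.sha256(payload).hexdigest()


def server_recognizes(doc_id, known_payload):
    # The server never sees the plaintext, but it can hash a payload it
    # already knows (e.g. a widely circulated attachment) and compare the
    # result against the ids it stores.
    return doc_id == content_doc_id(known_payload)
```

Usage: the client stores `doc_id = content_doc_id(attachment_bytes)`; a server holding the same bytes gets `server_recognizes(doc_id, attachment_bytes) == True` without decrypting anything.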
There are 2 possible approaches to this:
1. Leave the implementation as is, and make it clear to users of the Soledad API that it is their responsibility to avoid leaking information through the id. This would mean changing the Bitmask mail implementation, if we really want to avoid such a leak there.
2. Modify Soledad to hide the original id and revision in some way. Currently, there are 2 proposals for this:
   a. Replace the id with a hash, and store the original id somewhere inside the encrypted document content. The downside is that "tombstone" (deleted) documents, whose content is usually set to @None@, would then have to store content.
   b. Replace the id with an encrypted version, something along the lines of "iv:encrypted_id". The downside is that we would need a way to derive an encryption key for each document without knowing the original document id, so that a fresh install can decrypt all documents successfully.
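A minimal sketch of the hash-based proposal, under stated assumptions. It swaps the plain hash for a keyed hash (HMAC-SHA256) so the server cannot brute-force guessable ids, and it assumes a hypothetical `id_key` derived from the user's secret and never sent to the server. The envelope shown is plaintext; it would then go through Soledad's existing client-side content encryption, which is not reproduced here.

```python
import hashlib
import hmac
import json


def server_side_id(original_id, id_key):
    # Deterministic: every device holding the same id_key maps the same
    # original id to the same server-visible id, which sync requires.
    # id_key is a hypothetical key derived from the user's secret.
    return hmac.new(id_key, original_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def wrap_content(original_id, content):
    # Envelope that would be encrypted before upload. For a deleted
    # document, content is None, but the envelope itself is not: this is
    # the "tombstone must store content" downside of this proposal.
    return json.dumps({"id": original_id, "content": content})


def unwrap_content(envelope):
    # After decryption, recover the original id and the content.
    data = json.loads(envelope)
    return data["id"], data["content"]
```

Usage: a tombstone for a document with original id "M-inbox-42" becomes `wrap_content("M-inbox-42", None)`, i.e. `'{"id": "M-inbox-42", "content": null}'`, stored under `server_side_id("M-inbox-42", id_key)`. Note that this sketch hides only the id; the revision would still need separate treatment.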
We should first decide whether we care about this, and if so, choose a solution.
(from redmine: created on 2016-12-08)