We need to model how much state we want to maintain on the client side for blob uploads.

- Has a given blob been uploaded yet? Yes/no? If not, why? Network errors? Retries needed?
- On the download side, what do we do if we try to retrieve metadata for a document that the other replica hasn't uploaded yet? Can we use the related documents at all?

I think we need to settle on very simplistic assumptions at first, just in order to deploy an MVP. But we need to be aware of the problems that are going to come later on in a multi-device scenario, and have at least some simplistic failure mode available.
Assumptions for now:

- blobs are immutable (there's no replace or delete for now).
- one soledad doc points to at most one blob (this can easily be changed).
- blob metadata is stored inside a soledad doc's content.
- there is a separate component called "blobs client" (or blobs manager) that queries soledad and handles the actual download/upload queues and local storage/retrieval of blob data (no metadata handling here); see the sketch after this list.
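To make that split concrete, here is a minimal sketch of what such a blobs client could look like, using Twisted deferreds since Soledad is Twisted-based. All names here (`BlobManager`, `put`, `get`, the local/remote helpers) are made up for illustration and are not the actual API:

```python
# Hypothetical sketch of the "blobs client" component: it only moves blob
# bytes around and keeps queues; metadata stays in soledad documents.
from twisted.internet import defer


class BlobManager(object):
    """Handles upload/download queues and local storage of blob data."""

    def __init__(self, local_store, remote_client):
        self.local = local_store     # e.g. blob files on disk, keyed by blob_id
        self.remote = remote_client  # HTTP client talking to the blobs server

    def put(self, blob_id, fd):
        """Store the blob locally and queue it for upload."""
        self.local.write(blob_id, fd.read())
        return self._enqueue_upload(blob_id)

    def get(self, blob_id):
        """Return a deferred firing with a readable fd for the blob,
        downloading it first if it is not available locally."""
        if self.local.exists(blob_id):
            return defer.succeed(self.local.open(blob_id))
        d = self.remote.download(blob_id)
        d.addCallback(lambda data: self.local.write(blob_id, data))
        d.addCallback(lambda _: self.local.open(blob_id))
        return d

    def _enqueue_upload(self, blob_id):
        # retries/backoff and the persistent status table discussed below
        # would live here; this just fires the upload directly
        return self.remote.upload(blob_id, self.local.open(blob_id))
```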
> blob metadata is stored inside a soledad doc's content
I'm not sure how much I like this. It means that for each new blob we save the metadata doc, try to upload the blob, and then modify the document and queue it for upload again (or push it, in an alternative future less tied to the u1db REST API).
The advantage is that... a remote replica can know for sure that an upload was successful (instead of having to guess based on dangling refs). Something else?
Probably it's just a matter of quantifying the overhead.
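A rough sketch of that flow, just to make the extra round-trip visible; it assumes a deferred-returning `soledad` handle with u1db-style `create_doc`/`put_doc` and a blobs client as sketched above:

```python
# Sketch of the "metadata inside the soledad doc" flow discussed above.
# `soledad` and `blobs` are assumed handles; the three steps are the point.
from twisted.internet import defer


@defer.inlineCallbacks
def add_blob(soledad, blobs, blob_id, fd):
    # 1. save the metadata doc, initially marked as not uploaded
    doc = yield soledad.create_doc({'blob_id': blob_id, 'uploaded': False})

    # 2. try to upload the blob itself
    yield blobs.put(blob_id, fd)

    # 3. modify the doc to record the upload and queue it for sync again --
    #    this second document round-trip is the overhead being questioned
    doc.content['uploaded'] = True
    yield soledad.put_doc(doc)
```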
- encrypt outgoing, decrypt incoming (...)
- queues have file descriptors to read/write from/to.
- this is a good candidate to be represented as a Tube (see the sketch after this list).
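A conceptual sketch of the "encrypt outgoing" stage (the decrypt-incoming one would be symmetric), deliberately not using the actual tubes API; `encrypt_chunk` stands in for whatever crypto layer ends up being used:

```python
# Conceptual sketch of an "encrypt outgoing" stage sitting between a local
# file descriptor and the upload queue. `encrypt_chunk` is a placeholder.

CHUNK_SIZE = 4096


def encrypted_chunks(fd, encrypt_chunk):
    """Read a blob from an open file and yield encrypted chunks, so the
    uploader can stream them without holding the whole blob in memory."""
    while True:
        chunk = fd.read(CHUNK_SIZE)
        if not chunk:
            break
        yield encrypt_chunk(chunk)
```

The uploader would then iterate over these chunks and write them into the outgoing request body; a Tube-based version would express the same transformation as a stage in the pipeline.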
> The advantage is that... a remote replica can know for sure that an upload was successful (instead of having to guess based on dangling refs). Something else?
That is the advantage I see, but I also don't fully understand what the alternative would be. Do you mean not having the information about uploaded state in the metadata? Suppose the client crashes: how would it gather a list of blobs to upload, if the uploaded-state information is not stored in the metadata doc?
From the perspective of replica-1 (the sender), the metadata might just be put in some temporary tables of the BlobManager (sketched below):
- `blob_id` queued for upload
- `blob_id` failed to upload
- `blob_id` retried to upload n times
- `blob_id` successfully uploaded
Once the blob is successfully uploaded, we can erase that status data.
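A sketch of what such a temporary status table could look like, using a local SQLite database; table, column and file names are made up here:

```python
# Sketch of a local status table for pending uploads, using sqlite3.
# All names (file, table, columns) are hypothetical.
import sqlite3

conn = sqlite3.connect('blobs-status.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS upload_status (
        blob_id TEXT PRIMARY KEY,
        state   TEXT NOT NULL,              -- 'queued' or 'failed'
        retries INTEGER NOT NULL DEFAULT 0  -- number of failed attempts
    )
""")


def mark_queued(blob_id):
    """Called when a blob is first handed to the upload queue."""
    conn.execute(
        "INSERT OR IGNORE INTO upload_status (blob_id, state) "
        "VALUES (?, 'queued')", (blob_id,))
    conn.commit()


def mark_failed(blob_id):
    """Called when an upload attempt fails; counts the retry."""
    conn.execute(
        "UPDATE upload_status SET state = 'failed', retries = retries + 1 "
        "WHERE blob_id = ?", (blob_id,))
    conn.commit()


def mark_uploaded(blob_id):
    # once the blob is successfully uploaded, erase its status row
    conn.execute("DELETE FROM upload_status WHERE blob_id = ?", (blob_id,))
    conn.commit()


def pending_uploads():
    # after a crash, the client rebuilds its upload queue from this table
    return [row[0] for row in
            conn.execute("SELECT blob_id FROM upload_status")]
```

The `pending_uploads()` query is what answers the crash-recovery question above: the client can rebuild its upload queue from this table without touching the metadata docs.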
With this alternative in mind, I'm not sure it makes sense to pollute the doc JSON with the status info (it has the overhead of re-uploading the doc after the upload has been accomplished).
From the perspective of replica-2 (the downloader), stuff gets more complicated. Let's say this one has just finished a metadata sync, and the application code (mail) is trying to fetch a mail. We have several options on the server:
- Refuse to serve (return 404, or something more appropriate, 409?). This means that the client must retry and wait until replica-1 finishes its upload.
  - We could probably accomplish this by copying the incoming stream over to some staging area first (see the sketch after this list).
- Dispatch the GET and register this request as a consumer for the ongoing producer (the one that is reading from the socket that's doing the PUT from replica-1). I think this is totally possible and used by some of the file-sharing services out there. However, if the upload from replica-1 is corrupted/stalled, the download will also be.
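A minimal sketch of the first option on the server side, assuming incoming PUTs are streamed to a staging directory and only moved into the final blobs directory once complete; the paths, the choice of 409, and the resource layout are all assumptions:

```python
# Sketch of option 1: refuse to serve blobs whose upload has not finished.
# Assumes incoming PUTs are written under STAGING_DIR and renamed into
# BLOBS_DIR only when the upload completes successfully.
import os
from twisted.web import resource

BLOBS_DIR = '/srv/blobs'      # hypothetical layout
STAGING_DIR = '/srv/staging'


class BlobResource(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        blob_id = request.postpath[-1].decode('utf-8')
        final = os.path.join(BLOBS_DIR, blob_id)
        if os.path.exists(final):
            with open(final, 'rb') as f:
                return f.read()   # a real server would stream this
        if os.path.exists(os.path.join(STAGING_DIR, blob_id)):
            # upload still in progress on replica-1: ask the client to retry
            request.setResponseCode(409)
            return b'upload in progress'
        request.setResponseCode(404)
        return b'not found'
```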
An update: !88 (closed) adds the ability to get attachment state. That is currently naively calculated by listing local and remote attachments and looking up the name of the attachment in those lists. That can be improved by, e.g., adding a field in the local database.
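For illustration, the naive calculation could look something like the following (names are illustrative, not the actual code from !88):

```python
# Sketch of the naive attachment-state check described above: the state is
# derived by listing local and remote blobs on every lookup.
LOCAL, REMOTE, BOTH, MISSING = 'local', 'remote', 'both', 'missing'


def attachment_state(blob_id, local_ids, remote_ids):
    local = blob_id in local_ids
    remote = blob_id in remote_ids
    if local and remote:
        return BOTH
    if local:
        return LOCAL    # stored locally, not uploaded yet
    if remote:
        return REMOTE   # uploaded by another replica, not downloaded yet
    return MISSING

# Recording the state in a local database field, as suggested, would avoid
# listing the remote blobs on every lookup.
```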
It would be nice for the board view if "blocked" were reserved for work happening now. If it's an issue for the future, then during scoping of cycles we can decide if it's blocked at that point. Does that work? Can I remove the blocked label?
@kali @drebs I think this was implemented in crash recovery (retry strategy and refactor).
I'm not sure I follow. The definition of done for this issue was a spec for transfer committed to the docs folder; has that been done?

You're free to reject the proposed ticket, but the use of "implemented" on a documentation issue confuses me.