Skip to content
Snippets Groups Projects

#9007 - improve and document IBlobsBackend

Merged drebs requested to merge drebs/soledad:9007 into master

This MR includes a refactor of the blobs filesystem backend in a way that:

  • all interactions with twisted.web.server.Requests are made in the blobs resource itself, and are not passed to the backend.
  • all methods now return deferreds.
  • documentation is added on how to implement a new backend.

Other info:

Edited by drebs

Merge request reports

Pipeline #7269 failed

Pipeline failed for 383a19aa on drebs:9007

Approval is optional

Merged by leap-code-o-maticleap-code-o-matic 7 years ago (Dec 18, 2017 2:04pm UTC)

Merge details

Pipeline #7419 passed

Pipeline passed for 383a19aa on master

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
73 93 def __touch(self, path):
74 94 open(path, 'a')
75 95
76 def read_blob(self, user, blob_id, request, namespace=''):
96 def read_blob(self, user, blob_id, namespace=''):
77 97 logger.info('reading blob: %s - %s@%s' % (user, blob_id, namespace))
78 path = self._get_path(user, blob_id, namespace)
98 try:
99 path = self._get_path(user, blob_id, namespace)
100 except Exception as e:
101 return defer.fail(e)
79 102 logger.debug('blob path: %s' % path)
80 _file = static.File(path, defaultType='application/octet-stream')
81 return _file.render_GET(request)
103 fd = open(path)
  • Contributor

    Did you check if it will close later? The old one received a path and handled everything. Shouldn't we at least add a comment to remember that it's caller responsibility to close?

  • Contributor

    Also, from Python docs: If mode is omitted, it defaults to 'r'. Since it was written with wb we may need to use rb here?

  • Contributor

    btw, I liked the definition of read as something that returns a file descriptor, just would like to make sure it will be closed and that we warn properly.

  • Author Contributor

    In our use case it is being closed because the producer inside the custom BlobFile resource closes the file when stopProducing is called. But I agree with you that we should add that recommendation to the docstring.

  • changed this line in version 11 of the diff

  • Please register or sign in to reply
  • 251 292 pass
    252 293
    253 294
    295 class BlobFile(resource.Resource):
    296
    297 def __init__(self, fd):
    298 self.fd = fd
    299 self.fd.seek(0, 2)
    300 self.size = self.fd.tell()
    301 self.fd.seek(0)
    302
    303 def render_GET(self, request):
    304 request.setHeader(b'content-length', intToBytes(self.size))
    305 request.setHeader(b'content-type', 'application/octet-stream')
    306 request.setResponseCode(http.OK)
    307 producer = static.NoRangeStaticProducer(request, self.fd)
    • Contributor

      Why NoRangeStaticProducer instead of File? It doesn't support range requests. The File does. This will be necessary for resuming downloads, see #8989 (closed)

    • Author Contributor

      So File is a resource that uses NoRangeStaticProducer, SingleRangeStaticProducer, or MultipleRangeStaticProducer, which are producers, depending on the content of the Range` header.

      I just stuck that one there because I didn't want to copy the whole File.makeProducer method, but we can/must do it to properly serve file ranges.

      Do you have a suggestion on how to better handle this?

    • Contributor

      If I understood it correctly, we can't use static.File instead of BlobFile because read_blob returns an open fd right? If we extract the path from the returned fd and use the static.File this won't be applicable to other backends, as I see it, may be wrong.

      Thinking on it for a while, if all backends are supposed to be Twisted based, then we may be better off with a method that given a consumer (or a method to be called with partial data), connects a producer to it. I imagined this because if we go to CouchDB, then it could be a treq.collect to a CouchDB request calling the given method. In the case of a file, it could be FileBodyProducer doing the same. Anyway, I am struggling to see a way to support Content-Range on both kinds of backends simultaneously through the same interface.

      Since we are kind of mixing 3 issues here (#9008 #8989 (closed) #9007 (closed)), I would go for the easiest that fit most and solve this later when we get a spec for the new backend (if we decide to add it). The easiest I see for covering #8989 (closed) and #9007 (closed) is to use static.File as before in the place of BlobFile, then adapting it later for the new backend. This could either be a direct usage of static.File or subclassing it on BlobFile to get File.makeProducer method. What do you think?

    • Contributor

      3rd option, since there are more two MRs depending on this: merge as it is and solve the static.File dilemma as part of #8989 (closed)

    • Author Contributor

      Victor @shyba commented about 4 hours ago:

      If I understood it correctly, we can't use static.File instead of BlobFile because read_blob returns an open fd right?

      There are some issues here:

      • read_blob was receiving a request, and doing it's thing to serve the blob over that request. For that, it was using twisted.web.static.File, because in the fs backend case, the blob was a file.
      • it seemed reasonable to me to separate the request handling from the backend data handling, and that means having some way to pass data back and forth without using the request.

      If we extract the path from the returned fd and use the static.File this won't be applicable to other backends, as I see it, may be wrong.

      Not all file descriptors have paths associated with them, only the ones that actually represent a file in the filesystem.

      Thinking on it for a while, if all backends are supposed to be Twisted based, then we may be better off with a method that given a consumer (or a method to be called with partial data), connects a producer to it. I imagined this because if we go to CouchDB, then it could be a treq.collect to a CouchDB request calling the given method. In the case of a file, it could be FileBodyProducer doing the same.

      I think the producer/consumer approach is more abstract and more twisted-like than passing file descriptors around. I will come up with something for that and ping you.

      Anyway, I am struggling to see a way to support Content-Range on both kinds of backends simultaneously through the same interface.

      What do you mean by "both kinds of backends? Couch and Filesystem? Where is the difficulty?

      Since we are kind of mixing 3 issues here (#9008 #8989 (closed) #9007 (closed)), I would go for the easiest that fit most and solve this later when we get a spec for the new backend (if we decide to add it). The easiest I see for covering #8989 (closed) and #9007 (closed) is to use static.File as before in the place of BlobFile, then adapting it later for the new backend. This could either be a direct usage of static.File or subclassing it on BlobFile to get File.makeProducer method. What do you think?

      I still don't see a way of using static.File directly. Is your idea to use fd.name on the Resource level? If we do that we would be assuming that the fd does represent a file in the filesystem, which would break the generality of the solution, wouldn't it?

      3rd option, since there are more two MRs depending on this: merge as it is and solve the static.File dilemma as part of #8989 (closed)

      I can rebase the other MRs over master, that is not a problem.

      One question for you: will you be touching the IBlobsBackend for implementing content-ranges? If yes, maybe it'd be better for you to pick up this branch from the current state and then adapt the interface to what makes most sense for that solution. Does that make sense?

    • changed this line in version 7 of the diff

    • Please register or sign in to reply
  • Victor
  • drebs mentioned in merge request !170 (merged)

    mentioned in merge request !170 (merged)

  • drebs changed the description

    changed the description

  • drebs marked as a Work In Progress

    marked as a Work In Progress

  • drebs added 27 commits

    added 27 commits

    • 33addaba...1c533260 - 2 commits from branch leap:master
    • 271c06c6 - [doc] add api doc for blobs backend
    • 4ba18283 - [refactor] remove unneeded class
    • 7c4f755e - [feature] add an exception for blobs not found
    • c6d8f9d8 - [refactor] make blobs backend get_tag() agnostic of twisted.web requests
    • c6ab16f3 - [refactor] make blobs backend count() agnostic of twisted.web requests
    • 64435d08 - [refactor] make blobs backend get_flags() agnostic of twisted.web requests
    • b02bc4bc - [refactor] make blobs backend list_blobs() agnostic of twisted.web requests
    • 202630bd - [refactor] make blobs backend read_blob() agnostic of twisted.web
    • b8922389 - [refactor] make blobs backend set_flags() agnostic of twisted.web requests
    • 6383189b - [refactor] make blobs backend write_blob() agnostic of twisted.web requests
    • 03879f10 - [refactor] make blobs backend delete_blob() agnostic of twisted.web requests
    • 8cd55761 - [refactor] make delete_blob() return a deferred
    • 1e4cbe25 - [refactor] make get_blob_size() return a deferred
    • 308ed481 - [refactor] make count() return a deferred
    • d8e8c3c5 - [refactor] make list_blobs() return a deferred
    • 690b888e - [refactor] make get_total_storage() return a deferred
    • 4a0b2bb9 - [refactor] make get_tag() return a deferred
    • 2c42af6d - [refactor] make get_flags() return a deferred
    • 2f1f127b - [refactor] make set_flags() return a deferred
    • 1820b8ad - [refactor] make read_blob() return a deferred
    • 615514fd - [doc] improve documentation on IBlobsBackend
    • ed05d944 - [bug] fix exception catching with new blobs backend interface
    • b3535c72 - [bug] handle path exceptions using twisted failures
    • 750398d8 - [test] fix incoming test with the new async blobs backend
    • cfc7e476 - [refactor] use producer/consumer on write/read_blob respectivelly

    Compare with previous version

  • drebs added 1 commit

    added 1 commit

    • e7d780bd - [REMOVE-ME] test benchmarks

    Compare with previous version

  • drebs added 1 commit

    added 1 commit

    • 67f514e8 - [refactor] use producer/consumer on write/read_blob respectivelly

    Compare with previous version

  • drebs added 1 commit

    added 1 commit

    • aa9bd136 - [refactor] use producer/consumer on write/read_blob respectivelly

    Compare with previous version

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading