These are some notes we made to standardize the names we use for the data transfer features.
Chunking:
- Suppose a 100 MB file and a configured maximum chunk size of 1 MB.
chunks = lambda content: [content[i:i + MAX_CHUNK_SIZE] for i in range(0, len(content), MAX_CHUNK_SIZE)]
- The 100 MB file is represented as 100 chunks of 1 MB each.
- Chunking is implemented in the application layer and is independent of the network layer.
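As an illustration of the chunking definition above, here is a minimal sketch. The function name split_in_chunks and the constant MAX_CHUNK_SIZE follow the pseudocode in these notes; their exact signatures are assumptions, and the example is scaled down from 100 MB / 1 MB to 100 KB / 1 KB to keep it cheap to run.

```python
MAX_CHUNK_SIZE = 1024  # assumed value: 1 KB here, 1 MB in the notes


def split_in_chunks(content: bytes, max_chunk_size: int = MAX_CHUNK_SIZE) -> list[bytes]:
    """Split content into consecutive pieces of at most max_chunk_size bytes."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    return [
        content[start:start + max_chunk_size]
        for start in range(0, len(content), max_chunk_size)
    ]


# Scaled-down stand-in for the 100 MB file: 100 KB of zero bytes.
content = bytes(100 * MAX_CHUNK_SIZE)
chunks = split_in_chunks(content)
print(len(chunks))  # 100
```

Only the last chunk can be shorter than the maximum, which happens when the content size is not a multiple of the chunk size.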
Batching:
- There are N documents to send. N can be configured, for example by capping the total size of the batch at X:
sum(size(b) for b in blobs) < X
- Sending in a batch means that all N documents are sent in a single (HTTP) request.
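One possible way to form batches under the size rule above is a greedy grouping, sketched below. The name make_batches and the parameter max_batch_bytes (the X of the notes) are hypothetical, and the actual HTTP request is left out on purpose: each yielded list is what would go into one request.

```python
from typing import Iterable, Iterator


def make_batches(blobs: Iterable[bytes], max_batch_bytes: int) -> Iterator[list[bytes]]:
    """Group blobs so that the total size of each batch stays below max_batch_bytes."""
    batch: list[bytes] = []
    batch_size = 0
    for blob in blobs:
        if len(blob) >= max_batch_bytes:
            raise ValueError("a single blob does not fit under the batch limit")
        # Close the current batch if adding this blob would reach the limit.
        if batch and batch_size + len(blob) >= max_batch_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(blob)
        batch_size += len(blob)
    if batch:
        yield batch


blobs = [b"aaaa", b"bbbb", b"cccc", b"dd"]
batches = list(make_batches(blobs, max_batch_bytes=10))
# Each inner list would be sent in one request.
print([len(batch) for batch in batches])  # [2, 2]
```

The greedy approach keeps documents in their original order. It does not try to pack batches optimally.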
Streaming:
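- Choose a maximum chunk size of Y KB.
- Suppose we need to transfer a total of X MB (much more than Y KB) from multiple blobs, or even from all of them.
- The data is sent as one continuous stream, at most Y KB at a time, so neither client nor server needs to hold more than Y KB of it in memory while the X MB are sent.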
- The other end parses back the single stream of data into the original multiple blobs that were used as input.
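The streaming behaviour described above could be sketched as follows. The notes do not say how blob boundaries are marked inside the stream, so the 8-byte length prefix used here is an assumption, as are the names stream_blobs, receive_blobs and open_sink. The in-memory BytesIO objects only stand in for real files and for the network connection.

```python
import io
import struct
from typing import BinaryIO, Callable, Iterable, Iterator

HEADER = struct.Struct(">Q")  # assumed framing: 8-byte big-endian blob length


def stream_blobs(blobs: Iterable[tuple[int, BinaryIO]], max_chunk: int) -> Iterator[bytes]:
    """Yield one continuous stream in pieces of at most max_chunk bytes."""
    if max_chunk < HEADER.size:
        raise ValueError("max_chunk must be able to hold a length header")
    for size, source in blobs:
        yield HEADER.pack(size)
        remaining = size
        while remaining:
            piece = source.read(min(max_chunk, remaining))
            if not piece:
                raise EOFError("blob is shorter than its declared size")
            remaining -= len(piece)
            yield piece


def receive_blobs(stream: BinaryIO, open_sink: Callable[[int], BinaryIO], max_chunk: int) -> int:
    """Split the stream back into blobs, writing each to its own sink."""
    count = 0
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return count  # clean end of stream
        if len(header) < HEADER.size:
            raise EOFError("stream ended inside a length header")
        (remaining,) = HEADER.unpack(header)
        sink = open_sink(count)
        while remaining:
            piece = stream.read(min(max_chunk, remaining))
            if not piece:
                raise EOFError("stream ended inside a blob")
            sink.write(piece)
            remaining -= len(piece)
        count += 1


blobs = [b"hello", b"", b"streaming!"]
pieces = list(stream_blobs([(len(b), io.BytesIO(b)) for b in blobs], max_chunk=8))

wire = io.BytesIO(b"".join(pieces))  # stands in for the network connection
sinks: dict[int, io.BytesIO] = {}


def open_sink(index: int) -> BinaryIO:
    sinks[index] = io.BytesIO()
    return sinks[index]


receive_blobs(wire, open_sink, max_chunk=8)
received = [sinks[i].getvalue() for i in range(len(sinks))]
print(received == blobs)  # True
```

Both sides handle at most max_chunk bytes at a time, so the memory bound holds as long as the sources and sinks are real files or sockets and not in-memory buffers as in this demo.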