Praise for the blobstore 📯
I’m writing to highlight what appears to be a fundamental role played by the blobstore pattern. This may be aspirational, but I’m curious to explore.
Caveat: my experience is limited to using blobstores and the related infrastructure I mention below (e.g. CDNs, source control, document stores, CI), not maintaining it.
I’ve bolded references to other key pieces of infrastructure.
Storing and retrieving files is one of the more primitive, intuitive applications. Push a local file to the store and then retrieve it:
$ curl -T my_file https://blobs.example.com/my_file
$ curl https://blobs.example.com/my_file
Enable custom domains and we have general hosting:
$ curl -H "x-custom-domain: mydomain.com" https://blobs.example.com/my_file
$ curl https://mydomain.com/my_file
Put a cache in front of the store and we have a CDN. Use SSDs for hosting and we may not need a cache. Or use an external CDN and the blobstore as an “origin server”.
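To make the "cache in front of the store" idea concrete, here is a minimal sketch of a read-through cache. The `BlobStore` and `ReadThroughCache` classes are hypothetical in-memory stand-ins, not any real CDN's API, and the sketch ignores expiry and invalidation.

```python
class BlobStore:
    """Hypothetical in-memory stand-in for a blobstore (the origin)."""

    def __init__(self):
        self._blobs = {}
        self.reads = 0  # counts origin reads, to show the cache's effect

    def put(self, key: str, value: bytes) -> None:
        self._blobs[key] = value

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self._blobs[key]


class ReadThroughCache:
    """Serves from memory when possible, falls back to the origin on a miss."""

    def __init__(self, origin: BlobStore):
        self._origin = origin
        self._cache = {}

    def get(self, key: str) -> bytes:
        if key not in self._cache:
            self._cache[key] = self._origin.get(key)
        return self._cache[key]


origin = BlobStore()
origin.put("my_file", b"hello")
cdn = ReadThroughCache(origin)
cdn.get("my_file")  # miss: read from the origin
cdn.get("my_file")  # hit: served from the cache
```

Only the first request reaches the origin. A real deployment would also need a policy for evicting or refreshing cached entries.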
Key-value stores scale well because they require relatively little coordination: a key maps simply to a host and doesn’t need to be read before being written.
Building hosting on a key-value pattern, for example, lets it inherit that scalability.
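As a rough illustration of "a key maps simply to a host", the sketch below picks a host from a stable hash of the key alone. The host names and the `host_for` function are assumptions made for the example.

```python
import hashlib

HOSTS = ["blob-host-0", "blob-host-1", "blob-host-2"]  # hypothetical host names


def host_for(key: str, hosts=HOSTS) -> str:
    """Map a key to a host using a stable hash of the key alone."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]
```

Any client can compute the host without consulting the others, which is where the low coordination cost comes from. Plain modulo placement reshuffles most keys when the host list changes, so consistent hashing is a common alternative.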
For better or worse, a blobstore is only aware of simple keys, so we can’t browse it like a file structure.
However, if we split each key’s path on its separators and index the resulting fragments, we have a document store. We can keep the blobstore pure by defining the document store as standalone infrastructure that depends on an event bus and an execution service:
- Document store subscribes to blobstore push events via the event bus, associating an executable with the subscription
- Client pushes a value to the blobstore
- Blobstore writes the value and fires an event, e.g. “push”
- Document store processes the path as described above
Now we can look up values by path fragments:
$ curl -T my_file https://blobs.example.com/my_bucket/my/file
$ curl https://documents.example.com/my > file
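The subscribe, push, fire, and index flow can be sketched in a few lines. Everything here (`EventBus`, `BlobStore`, `DocumentStore`, the `"push"` topic name) is a hypothetical in-process stand-in. A real system would run the handler in a separate execution service.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process bus: topic -> list of handler callables."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._handlers[topic]:
            handler(payload)


class BlobStore:
    def __init__(self, bus):
        self._bus = bus
        self._blobs = {}

    def push(self, key, value):
        self._blobs[key] = value
        self._bus.publish("push", key)  # the store knows nothing about documents


class DocumentStore:
    def __init__(self, bus):
        self._index = defaultdict(set)  # path fragment -> keys containing it
        bus.subscribe("push", self.on_push)

    def on_push(self, key):
        for fragment in key.split("/"):
            if fragment:
                self._index[fragment].add(key)

    def lookup(self, fragment):
        return sorted(self._index.get(fragment, ()))


bus = EventBus()
documents = DocumentStore(bus)
blobs = BlobStore(bus)
blobs.push("my_bucket/my/file", b"contents")
```

After the push, `documents.lookup("my")` returns the key that contains that fragment, while the blobstore itself still only deals in flat keys.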
Using an approach similar to building a document store, we can process the values stored in the blobstore to populate a search index.
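A minimal sketch of that idea, assuming the values are UTF-8 text: build an inverted index from words to the keys whose values contain them. The `build_search_index` function and the sample keys are made up for illustration. In practice the same logic could run per push event rather than over a whole dictionary.

```python
import re
from collections import defaultdict


def build_search_index(blobs: dict) -> dict:
    """Build an inverted index: lowercase word -> set of keys whose value contains it."""
    index = defaultdict(set)
    for key, value in blobs.items():
        text = value.decode("utf-8", errors="ignore").lower()
        for word in re.findall(r"\w+", text):
            index[word].add(key)
    return index


blobs = {
    "notes/a.txt": b"Blobstores scale well",
    "notes/b.txt": b"Caches sit in front of blobstores",
}
index = build_search_index(blobs)
```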
Put a review process and commit log in front of writes to the blobstore and you have scalable source control. Use a blobstore for Git’s object store, and you get scalable replication. Use a document-aware source control mechanism, like Google’s, and you get a scalable mono-repo.
Blobstore data is often immutable and versioned, so commits could simply be references to file versions.
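Here is one way "commits as references to file versions" might look. The `VersionedBlobStore` class, the content-derived version ids, and the `Commit` shape are all assumptions for the sketch, not a description of Git or any particular system.

```python
import hashlib
from dataclasses import dataclass


class VersionedBlobStore:
    """Hypothetical store where each write creates an immutable version."""

    def __init__(self):
        self._versions = {}  # (key, version id) -> bytes

    def put(self, key: str, value: bytes) -> str:
        version = hashlib.sha256(value).hexdigest()[:12]  # content-derived version id
        self._versions[(key, version)] = value
        return version

    def get(self, key: str, version: str) -> bytes:
        return self._versions[(key, version)]


@dataclass(frozen=True)
class Commit:
    message: str
    files: tuple  # ((key, version), ...): references only, no file contents


store = VersionedBlobStore()
v1 = store.put("README", b"first draft")
c1 = Commit("initial", (("README", v1),))
v2 = store.put("README", b"second draft")
c2 = Commit("revise", (("README", v2),))


def checkout(commit: Commit) -> dict:
    """Resolve a commit's references back into file contents."""
    return {key: store.get(key, version) for key, version in commit.files}
```

Because old versions are never overwritten, checking out `c1` still yields the first draft after `c2` has been written.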
Subscribe to push events from source control, using an event bus and execution service, and we have CI. We can listen for success events from our subscriber to get pre-submit quality control. We can write artifacts back to the blobstore as part of a continuous deploy pipeline.
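The CI loop described here can be sketched with the same event pattern. The topic names (`"commit"`, `"ci.success"`, `"ci.failure"`), the toy check, and the `artifacts` dictionary standing in for the blobstore are all hypothetical.

```python
from collections import defaultdict

handlers = defaultdict(list)  # hypothetical in-process event bus: topic -> handlers
artifacts = {}                # stands in for the blobstore


def subscribe(topic, handler):
    handlers[topic].append(handler)


def publish(topic, payload):
    for handler in handlers[topic]:
        handler(payload)


def run_checks(commit):
    """CI subscriber: run a toy check and report the result as a new event."""
    ok = "TODO" not in commit["source"]
    publish("ci.success" if ok else "ci.failure", commit)


def deploy(commit):
    """On success, write a build artifact back to the store."""
    artifacts[f"builds/{commit['id']}"] = commit["source"].upper()


subscribe("commit", run_checks)
subscribe("ci.success", deploy)

publish("commit", {"id": "c1", "source": "print('ok')"})
publish("commit", {"id": "c2", "source": "TODO: finish"})
```

Only the commit that passes the check produces an artifact. A pre-submit gate would listen on the same success and failure topics before accepting the write.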
We’ve already described a few cases of stream processing via an event bus, but we can also periodically iterate over values and stream them through an execution service to rebuild indices, produce artifacts, normalize data, etc.
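As a sketch of the periodic, batch-style pass, the code below walks every key in fixed-size batches and recomputes a path-fragment index from scratch. The `iterate` and `rebuild_fragment_index` names and the batch size are illustrative assumptions. A real job would page through the store's listing API instead of a dictionary.

```python
def iterate(blobs: dict, batch_size: int = 2):
    """Yield (key, value) pairs in fixed-size batches, in key order."""
    keys = sorted(blobs)
    for start in range(0, len(keys), batch_size):
        yield [(k, blobs[k]) for k in keys[start:start + batch_size]]


def rebuild_fragment_index(blobs: dict) -> dict:
    """Recompute the path-fragment index from scratch."""
    index = {}
    for batch in iterate(blobs):
        for key, _value in batch:
            for fragment in filter(None, key.split("/")):
                index.setdefault(fragment, set()).add(key)
    return index


blobs = {"a/x": b"1", "a/y": b"2", "b/x": b"3"}
index = rebuild_fragment_index(blobs)
```

Rebuilding this way is useful when an event was missed or the indexing logic itself has changed.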