Once you've gzipped to achieve that 3MB storage, binary deltas are useless. The data could be (and almost certainly is) transferred gzipped, then expanded to the full 33MB so that binary diffs could be applied to it later, but setting up a system to do binary diffs is a lot of incidental complexity: xdelta is a surprisingly complex format, and bsdiff is really tuned for executables, not arbitrary content (and is pretty complex too).
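To illustrate why deltas over compressed data don't work, here's a small Python sketch (scaled way down from 33MB, with made-up package entries): a one-entry change in the uncompressed data scrambles essentially everything after that point in the gzip stream, so a binary delta between the two compressed files saves far less than a diff of the plaintext would.

```python
import gzip

# Scaled-down stand-ins for two index snapshots that differ in one entry.
a = b"\n".join(b'{"name":"pkg%d","vers":"1.0.0"}' % i for i in range(10000))
b = a.replace(b'"pkg5000","vers":"1.0.0"', b'"pkg5000","vers":"1.0.1"')

ca = gzip.compress(a, mtime=0)
cb = gzip.compress(b, mtime=0)

# Length of the shared prefix of the two compressed streams: everything
# after the DEFLATE block containing the change diverges, because LZ77
# back-references and Huffman codes no longer line up.
common = next((i for i, (x, y) in enumerate(zip(ca, cb)) if x != y),
              min(len(ca), len(cb)))
print(f"{common} of {len(ca)} compressed bytes shared")
```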

It sounds like the biggest win would be for cargo to keep using git, but clone the crates.io index as a bare repository rather than checking out the plaintext content. Then it would only take 47MB by your count, which is pretty close to 33MB, and you could still get out the plain content with `git cat-file` and friends.
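A sketch of what that looks like, using a throwaway local repo as a stand-in for the crates.io index (paths and the `serde` entry here are invented):

```python
import os
import subprocess
import tempfile

def run(*args, cwd=None):
    return subprocess.run(args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway stand-in for the index: one JSON-lines file per package.
    src = os.path.join(tmp, "index")
    run("git", "init", "-q", src)
    with open(os.path.join(src, "serde"), "w") as f:
        f.write('{"name":"serde","vers":"1.0.0"}\n')
    run("git", "-C", src, "add", ".")
    run("git", "-C", src, "-c", "user.email=you@example.com",
        "-c", "user.name=you", "commit", "-qm", "init")

    # A bare clone keeps only the (packed) object database -- no working tree.
    bare = os.path.join(tmp, "index.git")
    run("git", "clone", "-q", "--bare", src, bare)

    # The plaintext is still reachable without a checkout:
    out = run("git", "-C", bare, "cat-file", "-p", "HEAD:serde")
    print(out, end="")
```

The objects in the bare clone are the same delta-compressed packfiles git transfers over the wire, which is where the space saving comes from.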



Technically, Cargo /already/ bundles a full copy of libxdelta as part of libgit2 (in addition to the separate Git binary delta algorithm); I just checked using nm that it's actually included in the binary. It could probably be removed, but, well, it probably adds a lot less than 44MB to the binary size :)

Alternately, since JSON is text, I suppose you could just ensure that whatever emits this hypothetical merged JSON file puts newlines between different packages' entries, and then use a regular text diff (on the uncompressed version, of course). But reading 44MB of JSON isn't instant; it would probably be better to switch to either a binary format, or even something silly like a sorted list of JSON strings separated by newlines.
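For the newline-separated variant, an ordinary text diff already does the right thing. A rough Python sketch with made-up entries, using difflib in place of whatever diff tool the server would actually run:

```python
import difflib

# Hypothetical index snapshots: one JSON object per line, sorted by name,
# so a change to one package touches exactly one line.
old = [
    '{"name":"rand","vers":"0.8.5"}',
    '{"name":"serde","vers":"1.0.195"}',
]
new = [
    '{"name":"rand","vers":"0.8.5"}',
    '{"name":"serde","vers":"1.0.196"}',
]

# The diff is proportional to what changed, not to the 44MB total.
diff = list(difflib.unified_diff(old, new, lineterm=""))
print("\n".join(diff))
```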

There would be some incidental complexity around generating and applying the diffs. You'd probably want to precalculate them on the server side, but it could be rather expensive to calculate, on every change, a diff between the current version and every previous version. Instead, you could have daily checkpoints: each day the server would make a checkpoint and calculate a diff from each of the last N checkpoints; on every update it would recalculate only the diff between the latest checkpoint and HEAD. The client would store both HEAD and a reverse diff to the latest checkpoint (or just store the checkpoint separately and waste a few MB). When it updates, it would revert to that checkpoint, request the diff from there to the new latest checkpoint, and then request the diff from that checkpoint to the new HEAD. If its checkpoint is too old, it would just redownload from scratch.
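A toy model of that scheme, with all names invented and real binary/text deltas replaced by trivial (old, new) snapshot pairs:

```python
# Toy model of the daily-checkpoint scheme; "diffs" here are just
# (old, new) snapshot pairs standing in for real deltas.

def apply_diff(state, diff):
    old, new = diff
    assert state == old, "diff does not apply to this state"
    return new

class Server:
    def __init__(self, head, keep=3):
        self.head = head
        self.keep = keep       # N: how many past checkpoints to diff against
        self.checkpoints = []  # list of (id, snapshot)
        self.diffs = {}        # (from_id, to_id) -> diff

    def make_checkpoint(self):
        cid = len(self.checkpoints)
        # Diff from each of the last N checkpoints to the new one.
        for old_id, old_snap in self.checkpoints[-self.keep:]:
            self.diffs[(old_id, cid)] = (old_snap, self.head)
        self.checkpoints.append((cid, self.head))
        self.diffs[(cid, "HEAD")] = (self.head, self.head)

    def update(self, new_head):
        self.head = new_head
        # On every change, recalculate only the latest-checkpoint -> HEAD diff.
        cid, snap = self.checkpoints[-1]
        self.diffs[(cid, "HEAD")] = (snap, new_head)

class Client:
    def __init__(self, server):
        self.checkpoint_id, self.checkpoint = server.checkpoints[-1]
        self.head = server.head

    def update(self, server):
        latest_id, _ = server.checkpoints[-1]
        if latest_id != self.checkpoint_id:
            key = (self.checkpoint_id, latest_id)
            if key not in server.diffs:
                raise RuntimeError("checkpoint too old: redownload from scratch")
            # Revert to our checkpoint, then jump to the latest checkpoint.
            self.checkpoint = apply_diff(self.checkpoint, server.diffs[key])
            self.checkpoint_id = latest_id
        # Finally apply the latest-checkpoint -> HEAD diff.
        self.head = apply_diff(self.checkpoint, server.diffs[(latest_id, "HEAD")])

s = Server("v1")
s.make_checkpoint()      # checkpoint 0 = "v1"
c = Client(s)
s.update("v2")
s.make_checkpoint()      # checkpoint 1 = "v2"
s.update("v3")
c.update(s)              # v1 -> checkpoint 1 ("v2") -> HEAD ("v3")
print(c.head)
```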

Overall, not a trivial change, but probably not too hard either.

apt-get does something vaguely similar with its pdiff files.



