My team is currently using the hosted Citus Cloud, which uses PgBouncer. They note that the reason for sharding is high write volume. We've actually seen just as big a benefit for read performance from moving over to a sharded setup via Citus.
By sharding by customer we're able to more effectively leverage Postgres caching and elastically scale. Since switching over, our database has performed and scaled much better than when we were on a single node.
There is some up-front migration work, but it's limited to some minor patching of the ActiveRecord ORM and changes to migration scripts. We considered Cassandra, Elasticsearch, and DynamoDB, but the migration changes needed in the application logic would have taken an unacceptable amount of time.
Craig from Citus here. Thanks for the kind words, Justin. As you mention, there is some upfront work, but once it's in place it becomes fairly manageable.
One of the things that I'm now excited about for Citus is multi-node I/O capacity. It should improve write and read latencies in the situation where a single node's disk is getting saturated.
* Edit. I AM excited for this. Typo'd now = not previously
Let's say that you're using RDS and have a single box capable of ~25k IOPS. If you have 16 boxes capable of some number of IOPS (let's say 15k), then your total system capacity is significantly higher than 25k. Read workloads where pages need to be pulled from disk (high read IOPS) should see improvement in general.
Secondaries cover this case for GitLab, it seems, but that comes with a set of caveats as well (namely async availability of data).
> Secondaries cover this case for GitLab, it seems, but that comes with a set of caveats as well (namely async availability of data).
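To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes load spreads evenly across the boxes, which real workloads rarely do perfectly, and the function name `aggregate_iops` is just made up for illustration.

```python
def aggregate_iops(nodes: int, iops_per_node: int) -> int:
    """Total IOPS if every node can be driven to its limit (even-load assumption)."""
    return nodes * iops_per_node


single_box = 25_000                    # the ~25k IOPS single box from above
sharded = aggregate_iops(16, 15_000)   # 16 boxes at 15k IOPS each

print(f"single box: {single_box} IOPS")
print(f"16 boxes:   {sharded} IOPS ({sharded / single_box:.1f}x)")
```

So even with slower individual boxes, the cluster has 240k IOPS to work with, a bit under 10x the single box, as long as the hot data isn't all sitting on one shard.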
Another caveat is that all secondaries will end up having more or less the same stuff in their cache. As your data set grows bigger that becomes unsustainable, because you're going to read from disk more and more.
When you shard across N nodes you can keep N times as much data in the cache. Combined with N times higher I/O and compute capacity, that can actually give you way more than N times higher read throughput than a single node (for data sets that don't fit in memory), and you can get much higher write throughput as well.
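A toy model of why this can come out better than N times, sketched in Python. The assumptions are mine and deliberately crude: reads hit pages uniformly at random, every cache miss costs one disk read, disk IOPS is the only bottleneck, and the numbers (64 GB RAM per node, 1 TB data set, 15k IOPS per node) are made up.

```python
def read_throughput(nodes, ram_gb, data_gb, iops_per_node, sharded):
    """Reads/sec the cluster can sustain when only cache misses touch disk."""
    # Replicas all cache the same pages; shards each cache a different slice.
    cached_gb = ram_gb * nodes if sharded else ram_gb
    hit_rate = min(1.0, cached_gb / data_gb)
    miss_rate = 1.0 - hit_rate
    if miss_rate == 0.0:
        return float("inf")  # everything fits in memory, disk is not the limit
    # Every node contributes IOPS, but only the misses consume them.
    return nodes * iops_per_node / miss_rate


single = read_throughput(1, 64, 1000, 15_000, sharded=False)
replicas = read_throughput(8, 64, 1000, 15_000, sharded=False)
shards = read_throughput(8, 64, 1000, 15_000, sharded=True)

print(f"8 replicas vs single node: {replicas / single:.1f}x")
print(f"8 shards vs single node:   {shards / single:.1f}x")
```

With these numbers the replicas scale reads by 8x, since they only add disks, while the shards come out at roughly 15x because the combined cache also cuts the miss rate. Real access patterns are skewed rather than uniform, so treat the exact ratio as illustrative only.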