
etcd has succeeded as a piece of distributed systems infrastructure beyond our wildest expectations. When Alex Polvi, Xiang Li, and I started the project as a README in the summer of 2013, we identified that there was still no consensus database that was developer friendly, easily secured, production ready, and based on a well understood consensus algorithm. And largely we got lucky with good market timing, the invention of the Raft algorithm, and the explosion of good tooling around the Go language. This led to early success as etcd was used in locksmith, skydns, and the vulcan load balancer.

As the years went on we got lucky again when the Kubernetes project chose etcd as its primary key-value database. This helped to establish the project as a must-use piece of infrastructure software, which went on to influence the technology selection of storage, database, networking, and many other projects. Just check out all of the stickers of projects relying on etcd that I could find at KubeCon here in Seattle: https://twitter.com/BrandonPhilips/status/107370136987218739...

For a sense of all of the projects that use etcd, check out this list we maintain in the project: https://github.com/etcd-io/etcd/blob/master/Documentation/in...

Some notable projects include: Kubernetes, Rook, CoreDNS, Uber M3, Trillian, Vitess, TiDB, and many, many others.

Moving into the CNCF will help to bring a few things to the project:

- Funding and resources to complete regular third-party security audits and correctness audits

- On-call rotation and team for the discovery.etcd.io system

- Assistance in maintaining a documentation website

- Resources to fund face-to-face meetup groups and maintainer meetings

As a closing remark I want to thank the over 450 contributors and the entire maintainer team for bringing the project to this point. We are solving an important distributed systems problem with a focused piece of technology.

In fact, in Seattle this week we all got together as a maintainer team for the first time ever: https://twitter.com/sp_zala/status/1073239003330015233

If you want to learn more about the history of the project, check out this blog post: https://coreos.com/blog/history-etcd



It was a lot of fun watching the project and community evolve. I think you and the team did an excellent job. I remember a huge spike in users around when discovery.etcd.io launched. It was really a game changer for us building large-scale, multi-data-center telecom systems. I still remember bootstrapping the first cluster in a 24-data-center test and having things blow up, particularly in higher-latency (cross-DC) environments.
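
For anyone who never used it: the bootstrap flow was tiny. You asked discovery.etcd.io to mint a token URL sized to the cluster and handed the same URL to every member. A minimal sketch of the idea in Go (the /new?size=N endpoint and the --discovery flag are the documented interface; the rest is illustrative):

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // Ask the public discovery service to mint a token URL for a
        // three-member cluster.
        resp, err := http.Get("https://discovery.etcd.io/new?size=3")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        token, err := io.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }

        // Every member starts with the same token URL and the service
        // introduces them to each other:
        fmt.Printf("etcd --discovery %s\n", token)
    }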

Fast-forward 4 months: the project had grown and scaled to support the influx of new curious devs and use cases that stretched the bounds of what was possible at the time. At the end of those 4 months, we had a 128-node cluster that stayed up for years and still powers all of the emergency notifications in a few US states!


Woah! I would love to get this testimonial in our production users doc!

https://github.com/etcd-io/etcd/blob/master/Documentation/pr...


Docker Swarm Mode also embeds etcd.

(The embedding mechanism is copy-paste, which I find both ingenious and a bit distasteful. Maybe I’m just sore I didn’t think of it first)


I wrote the initial implementation of the raft subsystem and it was definitely not a copy/paste. We started from scratch (using etcd's core raft) with the transport layer being gRPC. My initial experiment can be found in this repository [1]. I then took the code from my initial experiment and included it in Swarmkit [2]. From there we went through many iterations on the initial code base and improved the UI with Docker swarm `init`/`join`/`leave` to make the experience of managing the cluster "friendly".

We spent quite some time evaluating different raft and paxos implementations (mostly the Consul and etcd raft libraries), and found etcd to be the most stable and flexible for our use case. It was very easy, for example, to swap the transport layer to use gRPC. The fact that the etcd implementation is represented as a simple state machine also makes it much easier to reason about under complex scenarios for debugging purposes, instead of digging into multiple layers of abstractions.
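
To make the "simple state machine" point concrete, here is a rough sketch of the library's canonical consumption loop, close to what the raft package docs describe (import paths and Config details vary across etcd releases; send and apply here are placeholders for the application's transport and state machine, not swarmkit's actual code):

    package main

    import (
        "time"

        "go.etcd.io/etcd/raft" // import path varies across etcd releases
        "go.etcd.io/etcd/raft/raftpb"
    )

    // send and apply stand in for the application's transport (gRPC in
    // swarmkit's case) and state-machine logic.
    func send(msgs []raftpb.Message) {}
    func apply(entry raftpb.Entry)   {}

    func main() {
        storage := raft.NewMemoryStorage()
        c := &raft.Config{
            ID:              0x01,
            ElectionTick:    10,
            HeartbeatTick:   1,
            Storage:         storage,
            MaxSizePerMsg:   4096,
            MaxInflightMsgs: 256,
        }
        // Single-member cluster for illustration; real code lists all peers.
        n := raft.StartNode(c, []raft.Peer{{ID: 0x01}})

        ticker := time.NewTicker(100 * time.Millisecond)
        defer ticker.Stop()

        for {
            select {
            case <-ticker.C:
                n.Tick() // drives elections and heartbeats
            case rd := <-n.Ready():
                // Persist entries before sending messages; a real
                // implementation also persists rd.HardState and rd.Snapshot.
                storage.Append(rd.Entries)
                send(rd.Messages)
                for _, entry := range rd.CommittedEntries {
                    apply(entry) // hand committed entries to the app
                }
                n.Advance()
            }
        }
    }

Because the library never touches the network or the disk itself, everything it wants done arrives through that one Ready channel, which is what made swapping in a gRPC transport so painless.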

In retrospect, this came with quite a learning curve. We had to deal with issues caused by our own misunderstandings of how to use the library properly. At the same time, the fact that the developers favored stability over user friendliness was exactly what we found attractive about etcd's raft. Additionally, the CoreOS developers were super friendly and helped us fix these issues. We reported and fixed some bugs as well. Kudos to them for all the help they provided at the time.

[1] https://github.com/abronan/proton [2] https://github.com/docker/swarmkit/commit/89de50f2092dfd2170...


I apologise for my misunderstanding.

What I remember is, during DockerCon in June 2016, I went into the code to see how it worked, and I found a top-level file setting up data structures and handlers that seemed to be 90% the same as the equivalent file in etcd. And the underlying implementation is reused via vendoring.

Maybe this rings a bell with you and you can tell me what I saw, because I can't find it now.

Maybe I dreamed the whole thing.

I did, and still do, think integrating etcd into Swarm Mode was a masterstroke; we had spent the previous two years working to avoid "first you must install etcd" in a different way that nobody got. Afterwards we created kubeadm to ape the 'init' and 'join' functionality.


Are you sure? I’ve spent quite some time playing with the internals of Docker Swarm / swarmkit last year and I’m quite confident it wasn’t true then. As far as I know they call go-raft directly because they only need a fraction of the features offered by etcd.


It has used etcd/raft from the beginning.


It is indeed work that you and your team should be proud of.

Any thoughts on rkt?


rkt was needed to push a number of ideas forward in the ecosystem at the time (4 years ago, 2014) and part of its legacy is the creation of technologies that provided plugin interfaces for the container ecosystem.

The Container Networking Interface (CNI) was created directly from the work on rkt and continues on today inside of Kubernetes and the CNCF. This work made it possible for an ecosystem of networking solutions to exist that could take advantage of everything Linux has to offer.

The creation of the Kubernetes Container Runtime Interface (CRI) was also spawned, in part, by the existence of rkt and the need to consider container runtimes for use with Kubernetes. It was a long, hard engineering effort, but I think the separation that CRI forced the kubelet to go through, and the competition among various runtimes, are good for the ecosystem and the resilience of the Kubernetes project.

It is very unlikely that rkt will be part of the Kubernetes ecosystem at this point, with the existence of containerd and CRI-O as Kube CRI solutions on Linux. And there were missed opportunities on a variety of fronts along the way. But rkt continues to be used by many organizations for other niche use cases of containers. And the shifts that rkt caused above were positive improvements for the Kubernetes ecosystem.


Thanks for the thoughtful reply.



