Digging Into etcd


December 1st, 2019

⚠ Warning
You've discovered a draft post! 🚧
This entry is still under construction and shouldn't be listed anywhere ... 🤔

What Is etcd?

etcd, /ˈɛtsiːdiː/, per the official site is:

A distributed, reliable key-value store for the most critical data of a distributed system

Per the FAQ, etcd’s name means “distributed etc directory”: etc refers to /etc, the Unix directory for system-wide configuration, and d stands for “distributed” ¹. The d is perhaps also a pun on the long tradition of naming daemons with a d suffix (see: httpd, ntpd, systemd, containerd, …), though I’ve not yet found proof of this.

Kubernetes uses etcd as the backing store for cluster data, which drove my own interest in learning more about it. Clearly a lot of clusters out there are using etcd for critical data storage, but how does it work? Why etcd?

History

For a history of etcd see: coreos.com/blog/history-etcd (Archived)

For bonus points: www.wired.com/2013/08/coreos-the-new-linux/ (Archived)

Roughly: etcd was created out of a desire for a distributed data store addressing the following issues:

Google’s Chubby Paper was public, but Chubby itself was / is not.
ZooKeeper was expensive to run, didn’t scale down, and couldn’t be interacted with via common simple tools like curl.

Initially etcd was developed by CoreOS for their fleet container orchestration system, but it was quickly adopted for other uses and later donated to the CNCF.

Architecture

Overview

etcd is a Go binary with a separate CLI (etcdctl).
etcd exposes a gRPC service along with an HTTP JSON API.
Data is persisted in multiversion key-value format, stored on disk.
Typically one, three, or five replicas are used.
Each replica stores the full dataset, following the leader.
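
A quick aside on why those replica counts are odd: Raft (covered below) only makes progress when a majority of members agree, so what matters is how many members can fail while a majority survives. Here's a minimal sketch of that arithmetic; the function names are my own, not anything from etcd.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting replicas."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many replicas can be lost while a majority is still reachable."""
    return members - quorum(members)


# Compare cluster sizes side by side.
for n in (1, 2, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerates={failure_tolerance(n)}")
```

Three and four members both tolerate a single failure, so an even-sized cluster adds a replica without adding any fault tolerance.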

Data Model

etcd’s upstream data model documentation is instructive here; I highly recommend reading it.
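
To make “multiversion” a bit more concrete, here's a toy in-memory sketch of the idea: every write bumps a store-wide revision, old values are kept around, and reads can ask for the state as of an earlier revision. The class and method names are hypothetical and this is nothing like etcd's real storage code; it's only meant to illustrate the shape of the model.

```python
class ToyMVCCStore:
    """Toy multiversion key-value store (illustrative only, not etcd's code)."""

    def __init__(self):
        self.revision = 0   # store-wide counter, bumped on every write
        self._history = {}  # key -> list of (revision, value-or-None)

    def put(self, key, value):
        self.revision += 1
        self._history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def delete(self, key):
        # A delete is recorded as a tombstone (None) rather than erasing history.
        self.revision += 1
        self._history.setdefault(key, []).append((self.revision, None))
        return self.revision

    def get(self, key, revision=None):
        """Value of `key` as of `revision` (defaults to the latest revision)."""
        at = self.revision if revision is None else revision
        value = None
        for rev, val in self._history.get(key, []):
            if rev > at:
                break
            value = val
        return value


store = ToyMVCCStore()
r1 = store.put("color", "red")
r2 = store.put("color", "blue")
r3 = store.delete("color")
print(store.get("color", revision=r1))  # value as of the first write
print(store.get("color", revision=r2))  # value as of the second write
print(store.get("color"))               # latest state: deleted
```

Keeping old revisions around is what makes it possible to read a consistent historical snapshot or to replay every change since a given revision.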

Storage

Data is stored on disk in a memory-mapped B+ tree using bbolt, a fork of bolt, which was itself inspired by LMDB.

Consensus

Leader election is used to maintain a single leader replica. All requests are routed to the leader internally and committed only after achieving consensus on the request.

Raft is the consensus algorithm used both for requests and for leader elections. The official Raft site is a good reference for understanding how this works. Another great resource linked from the official site is thesecretlivesofdata.com/raft/.
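
As a rough illustration of what committing by consensus means, here's a sketch of how a Raft leader can work out which log entries are safe to commit: it tracks the highest log index it knows each member has stored, and the commit point is the highest index present on a majority. The function name and the example numbers are mine, and real Raft additionally requires that the entry be from the leader's current term, which I've left out for brevity.

```python
from typing import Sequence


def majority_commit_index(match_index: Sequence[int]) -> int:
    """Highest log index stored on a majority of members.

    `match_index` holds, per member (leader included), the highest log
    index known to be stored on that member.
    """
    if not match_index:
        raise ValueError("cluster must have at least one member")
    quorum = len(match_index) // 2 + 1
    # Sorted descending, position quorum-1 is the largest index that at
    # least `quorum` members have reached.
    ranked = sorted(match_index, reverse=True)
    return ranked[quorum - 1]


# Three members: the leader and one follower have stored entry 7.
print(majority_commit_index([7, 7, 4]))
# Five members: only the leader has entry 7, so it is not committed yet.
print(majority_commit_index([7, 4, 4, 2, 1]))
```

In the second case the leader has to wait for two more followers to store entry 7 before it can be considered committed.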

etcd’s raft implementation is widely used and contains some useful documentation.

TODO

elaborate on Kubernetes’s usage
talk more about multiversion and revisions
talk more about data model
talk more about supported operations

Additional Resources

The Carnegie Mellon Database Group “Database of Databases” site has a great page on etcd at dbdb.io/db/etcd

¹ As more clearly evidenced in an older version of the etcd docs:

The name “etcd” originated from two ideas, the unix “/etc” folder and “d”istributed systems. The “/etc” folder is a place to store configuration data for a single system whereas etcd stores configuration information for large scale distributed systems. Hence, a “d”istributed “/etc” is “etcd”.

This “Why etcd” page doesn’t exist in currently supported versions.