A while ago, these very nice G+ posts were written:
- German – Konsenssysteme – Zookeeper, etcd, consul – was soll das sein?
- English – Distributed Discovery Systems: Zookeeper, etcd, Consul
The English text is about a year older, but Google Translate handles the German text pretty well.
For me, the most important points in them were these:
- Consensus systems are distributed systems, so they have to pick at least the P (partition tolerance) from the CAP theorem.
- In addition, Consensus systems also choose the C (consistency) from the CAP theorem.
- Since in CAP you can only pick 2 out of 3, the A (availability) isn't guaranteed in Consensus systems.
- Only three systems get this right: Zookeeper, etcd, Consul. All others shred data eventually.
- Consensus and leader election algorithms: Paxos and Raft.
- A Cluster (a.k.a. Ensemble) provides a consistent view of the data, no matter which member of the Cluster/Ensemble you talk to.
- The (set of) connection(s) from a client to the Cluster/Ensemble is called a session.
- Cluster/Ensemble operations work on a tree of nodes, and each node supports atomic operations.
- Nodes can be persistent or ephemeral (temporary: they live only as long as the session that created them).
- All nodes can hold data (keep it small: ~4 kilobytes max).
- Directories in the tree are usually persistent; leaf nodes often ephemeral
- Useful applications: load balancing, queueing, data availability.
- There are transactions, so you can combine several operations into one larger atomic operation. Don't make them too long.
- Consistency takes time; expect at most on the order of thousands of write operations per second.
- Not being available is a feature: it means the system is still P and C, just not reachable right now.
- Clients must cope with the Cluster/Ensemble temporarily being read-only or unavailable.
- Applications should always be prepared to re-create any persistent nodes they created, just in case a persistent node is gone after a period of non-availability, when the Cluster/Ensemble moves from one consistent phase to the next.
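To make the tree, session and transaction points above a bit more concrete, here is a minimal, hypothetical sketch of the data model in a single Python process. All names (`NodeTree`, `create`, `multi`, `MAX_DATA`) are made up for illustration; this is not the API of Zookeeper, etcd or Consul, and there is no replication or consensus at all. The "ephemeral nodes cannot have children" rule mirrors Zookeeper; the 4096-byte limit is only the ~4 kilobyte guideline from the list, not a real server limit.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

MAX_DATA = 4096  # assumption: the "~4 kilobyte max" guideline, not a real server limit


@dataclass
class Node:
    data: bytes
    owner: Optional[int]  # owning session id if ephemeral, None if persistent


class NodeTree:
    """Hypothetical toy model of one member's node tree (no replication)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {"/": Node(b"", None)}
        self._next_session = 1

    def open_session(self) -> int:
        sid = self._next_session
        self._next_session += 1
        return sid

    def close_session(self, sid: int) -> None:
        # Ephemeral nodes disappear together with the session that created them.
        for path in [p for p, n in self.nodes.items() if n.owner == sid]:
            del self.nodes[path]

    @staticmethod
    def parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, data: bytes = b"", session: Optional[int] = None) -> None:
        """Create a node; passing a session id makes it ephemeral."""
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        parent = self.nodes.get(self.parent(path))
        if parent is None:
            raise KeyError(f"parent of {path} does not exist")
        if parent.owner is not None:
            raise ValueError("ephemeral nodes cannot have children (as in Zookeeper)")
        if len(data) > MAX_DATA:
            raise ValueError("node data too large; keep it small")
        self.nodes[path] = Node(data, session)

    def multi(self, ops: List[Callable[["NodeTree"], None]]) -> None:
        """Transaction: apply every operation, or none of them if one fails."""
        snapshot = copy.deepcopy(self.nodes)
        try:
            for op in ops:
                op(self)
        except Exception:
            self.nodes = snapshot  # roll back to the state before the transaction
            raise
```

A typical service discovery pattern with this model: a persistent directory such as `/services` holds one ephemeral node per running instance, so an instance that loses its session drops out of the directory automatically.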
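The last two points, coping with temporary unavailability and re-creating persistent nodes, translate into a common client-side pattern: retry while the cluster is unavailable, and treat "node already exists" as success. The sketch below assumes a hypothetical `FlakyStore` standing in for a real client library (it simply fails a fixed number of calls); `Unavailable`, `NodeExists`, `with_retry` and `ensure_persistent` are likewise made-up names, not part of any real API.

```python
import functools
import time


class Unavailable(Exception):
    """Hypothetical: the cluster cannot serve requests right now (still C and P)."""


class NodeExists(Exception):
    """Hypothetical: the node is already there."""


class FlakyStore:
    """Hypothetical stand-in for a client library; fails the first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.nodes = set()
        self.failures = failures

    def create(self, path: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise Unavailable(path)
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes.add(path)


def with_retry(call, attempts: int = 5, delay: float = 0.01):
    """Retry a call while the cluster is unavailable, with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Unavailable:
            if attempt == attempts - 1:
                raise  # give up; the caller must cope with the outage
            time.sleep(delay * 2 ** attempt)


def ensure_persistent(store: FlakyStore, path: str) -> None:
    """(Re-)create a persistent node and all its parents; existing nodes are fine."""
    parts = path.strip("/").split("/")
    for i in range(1, len(parts) + 1):
        prefix = "/" + "/".join(parts[:i])
        try:
            with_retry(functools.partial(store.create, prefix))
        except NodeExists:
            pass  # already there, possibly created by an earlier, seemingly failed call
```

Treating `NodeExists` as success makes the re-creation idempotent, so it is safe to call on every (re)connect, and it also covers the case where a create succeeded on the cluster but the reply never reached the client.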
Some more keywords and links from the article:
- Cloud, Overlay, dependencies, instances, service discovery
- Source of truth, and consensus on what this truth is and who the source is
- Jepsen tests how distributed systems hold up under network partitions and other faults: https://aphyr.com/tags/jepsen
- Open source systems: https://www.habitat.sh and https://www.joyent.com/containerpilot
- Representation is not limited to a directory tree (etcd v3 uses a flat key space): https://coreos.com/etcd/docs/latest/api_v3.html#logical-view
- Ceph monitors and Paxos, a chat with Joao: http://ceph.com/community/monitors-and-paxos-a-chat-with-joao/
- Ensemble: https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_systemReq
- Tech Note 4 · Netflix/curator Wiki: ZooKeeper makes a very bad Queue source.
–jeroen