Lesson 1. Consistency models: strong, causal, eventual, read-your-writes, monotonic reads
Dives into distributed consistency models like strong, causal, and eventual consistency, plus read-your-writes and monotonic reads, explaining the guarantees, possible anomalies, and how applications pick models that align with user expectations.
Strong consistency guarantees
Eventual consistency and convergence
Causal consistency and ordering
Read-your-writes and monotonic reads
Choosing models for applications

Lesson 2. Distributed consensus algorithms: Paxos, Raft, and practical implementations (etcd, Consul)
Introduces Paxos and Raft consensus algorithms and their roles in leader election, log replication, and configuration changes, linking theory to practice via systems like etcd and Consul for metadata, locks, and coordination.
Consensus problem and safety goals
Paxos algorithm core ideas
Raft algorithm and log replication
Cluster membership and reconfiguration
Using etcd and Consul in practice

Lesson 3. Sharding and partitioning strategies: range, hash, and directory-based
Covers sharding and partitioning strategies including range, hash, and directory-based approaches, focusing on data distribution, avoiding hotspots, rebalancing, and routing, plus how to select and evolve strategies as workloads and data scale up.
Range-based partitioning design
Hash-based sharding and hashing
Directory and lookup-based routing
Rebalancing and resharding methods
Avoiding hotspots and skewed keys

Lesson 4. Replication models: leader-follower, multi-leader, and leaderless patterns
Explores leader-follower, multi-leader, and leaderless replication, detailing write and read paths, failure handling, lag, and conflict resolution, and their impact on latency, throughput, durability, and ops complexity in global setups.
Leader-follower replication flows
Multi-leader replication and conflicts
Leaderless quorum-based replication
Replication lag and read consistency
Operational trade-offs of each model

Lesson 5. CAP theorem and trade-offs between consistency, availability, and partition tolerance
Looks at the CAP theorem and its implications for distributed databases, clarifying interactions between consistency, availability, and partition tolerance, and how real systems handle trade-offs with practical patterns and service goals.
Formal statement of the CAP theorem
Consistency vs availability in practice
Partition tolerance in real networks
Designing around CAP with SLAs

Lesson 6. Network partitions, latency, and failure modes across WAN links
Analyses network partitions, latency, and failures over WAN links, covering timeouts, partial failures, split-brain scenarios, and designing detection, retries, and degradation strategies to keep systems stable under pressure.
Characteristics of WAN links
Detecting partitions and timeouts
Handling partial and asymmetric failures
Split-brain risks and mitigation
Graceful degradation strategies

Lesson 7. Idempotency, retries, and at-least-once vs exactly-once semantics
Explains idempotency for safe retries, differentiating at-least-once, at-most-once, and exactly-once semantics, with patterns for deduplication, request tracking, and message processing in unreliable distributed environments.
Defining idempotent operations
Designing safe retry mechanisms
At-least-once vs at-most-once
Exactly-once semantics limitations
Deduplication and request tracking

Lesson 8. Concurrency control: optimistic vs pessimistic, MVCC, conflict resolution techniques
Examines concurrency control in distributed databases, comparing optimistic and pessimistic methods and explaining how MVCC works, along with the conflict detection and resolution techniques that maintain correctness under high concurrency.
Pessimistic locking in distributed systems
Optimistic control and validation
MVCC snapshots and version chains
Conflict detection and resolution
Deadlocks, timeouts, and retries

Lesson 9. Physical topology patterns: single region, active-passive, active-active, and hybrid
Describes deployment topologies for distributed databases like single region, active-passive, active-active, and hybrid, analysing effects on latency, failover, data consistency, and operational complexity.
Single-region deployment trade-offs
Active-passive failover patterns
Active-active multi-region setups
Hybrid and tiered topology designs
Latency, RPO, and RTO considerations