Module 7: Replication, Consensus & CAP Theorem

🚀 Problem Statement

An Enterprise CMS requires that once a project lead approves a data record, the approval is never lost (durability) and is immediately visible to all users (consistency). The system must also stay available 24/7 for end users on potentially flaky mobile networks.

🧠 The Engineering Story

The Villain: "The Split Brain." A network partition isolates a primary database from its replica. If both sides accept writes, healing the partition can reveal two conflicting versions of the same Core Engine approval.

The Hero: "The Consensus Protocol." Using a protocol such as Raft or Paxos, the system elects a single leader, and a write succeeds only once a majority of replicas acknowledge it. Two sides of a partition cannot both hold a majority, so committed histories cannot diverge.
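The majority rule can be illustrated with a minimal sketch. The function names `majority` and `can_commit` are hypothetical and chosen only for this example. It models the vote count, not a full Raft or Paxos implementation.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def can_commit(reachable_nodes: int, cluster_size: int) -> bool:
    """A write commits only if the leader (counted in reachable_nodes)
    can collect acknowledgements from a majority of the full cluster."""
    return reachable_nodes >= majority(cluster_size)


# A 5-node cluster partitioned into a group of 3 and a group of 2.
print(can_commit(3, 5))  # True: the larger side keeps accepting writes
print(can_commit(2, 5))  # False: the smaller side must reject writes
```

Because the threshold is computed against the full cluster size rather than the nodes currently reachable, at most one side of any partition can commit, which is what rules out split brain.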

The Plot:

Deep dive into the CAP theorem: treat it as a spectrum, not a binary choice
Study leader-based replication (Postgres streaming replication) vs leaderless replication (Dynamo-style stores such as Cassandra)
Understand Raft consensus: leader election, log replication, safety
Map consistency requirements to each data type
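To make the leader election and safety items more concrete, here is a simplified sketch of how a Raft node might decide whether to grant its vote. The `Follower` class and `handle_vote_request` function are hypothetical names for illustration. The sketch omits timers, persistence, and networking, so treat it as an outline of the rule rather than a working consensus node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Follower:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0   # term of the last entry in this node's log
    last_log_index: int = 0  # index of the last entry in this node's log


def handle_vote_request(node: Follower, candidate_id: str, term: int,
                        cand_last_log_term: int, cand_last_log_index: int) -> bool:
    # Reject candidates from an older term.
    if term < node.current_term:
        return False
    # A newer term resets this node's vote.
    if term > node.current_term:
        node.current_term = term
        node.voted_for = None
    # Safety check: the candidate's log must be at least as up to date as ours
    # (compare last term first, then last index).
    log_ok = (cand_last_log_term, cand_last_log_index) >= (
        node.last_log_term, node.last_log_index)
    # At most one vote per term; repeating a vote for the same candidate is fine.
    if log_ok and node.voted_for in (None, candidate_id):
        node.voted_for = candidate_id
        return True
    return False
```

The one-vote-per-term rule means two candidates cannot both win a majority in the same term, and the log comparison keeps a node with missing committed entries from becoming leader.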

The Twist (Failure): Latent Stale Reads. Asynchronous replication keeps writes fast, but a read replica may then serve an old version of an enterprise document for several seconds after an update. An end user might download an outdated standard procedure in that window.
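One common mitigation is to route a read to a replica only if that replica has caught up to the position the reader needs, and fall back to the primary otherwise. The sketch below assumes each write returns a monotonically increasing log position (in Postgres terms, something like an LSN). `Replica`, `replayed_position`, and `choose_read_target` are hypothetical names, not part of any database driver.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    replayed_position: int  # how far this replica has applied the primary's log


def choose_read_target(min_position: int, replicas: List[Replica]) -> str:
    """Return the first replica that has applied at least min_position,
    or "primary" if every replica is still behind."""
    for replica in replicas:
        if replica.replayed_position >= min_position:
            return replica.name
    return "primary"


replicas = [Replica("replica-1", 90), Replica("replica-2", 120)]
print(choose_read_target(100, replicas))  # replica-2
print(choose_read_target(150, replicas))  # primary
```

Here `min_position` would be the position of the document's latest approved update. The trade-off is extra load on the primary whenever replicas lag.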

Interview Signal: Can map business requirements to consistency levels (strong, eventual, causal).

🧠 Consistency Spectrum

Level	Guarantee	Cost	Use Case
Strong (Linearizable)	Reads always see the latest write	High latency	System approvals, sign-offs
Causal	Respects cause-effect ordering	Medium latency	Comment threads on documents
Eventual	Replicas converge, no timing guarantee	Low latency	Analytics dashboards, usage stats
Read-Own-Writes	User sees own writes immediately	Medium latency	Document edits by the same user
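The table can be turned into a small lookup that application code consults before choosing how to read or write. This is a sketch under assumptions: the data type keys (`approval`, `comment_thread`, and so on) and the `required_consistency` helper are made up for illustration, and defaulting unknown types to the strongest level is one possible design choice.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    CAUSAL = "causal"
    READ_OWN_WRITES = "read-own-writes"
    EVENTUAL = "eventual"


# Hypothetical data type names, mapped per the table's use cases.
REQUIREMENTS = {
    "approval": Consistency.STRONG,
    "comment_thread": Consistency.CAUSAL,
    "document_edit": Consistency.READ_OWN_WRITES,
    "usage_stats": Consistency.EVENTUAL,
}


def required_consistency(data_type: str) -> Consistency:
    # Unknown types fall back to the safest (and slowest) level.
    return REQUIREMENTS.get(data_type, Consistency.STRONG)


print(required_consistency("usage_stats").value)  # eventual
print(required_consistency("unknown").value)      # strong
```

Keeping the mapping in one place makes the consistency decision explicit per data type instead of leaving it implicit in whichever connection a query happens to use.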