Distributed Systems Theorems & Patterns
Before designing large-scale systems, architects must understand the foundational theorems and mathematical constraints that govern distributed computing. These principles dictate the unavoidable trade-offs between latency, consistency, and availability.
1. The CAP Theorem
In a distributed system, network partitions are inevitable (e.g., hardware failures, packet loss). The CAP theorem states that a distributed data store can only provide two of the following three guarantees simultaneously:
- Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.
- Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.
Because networks aren't reliable, distributed systems must support partition tolerance. You will need to make a software tradeoff between consistency and availability.
| System Type | Description | Examples |
|---|---|---|
| CP | Chooses consistency over availability. Waits for a response from the partitioned node, which might result in a timeout error. | MongoDB, HBase, Redis, Zookeeper |
| AP | Chooses availability over consistency. Responses return the most readily available version of the data available on any node, which might not be the latest. | Cassandra, DynamoDB, CouchDB |
| CA | Technically impossible in WAN distributed systems. Exists only in local, single-node contexts. | RDBMS (MySQL, Postgres) |
2. Consistency Patterns
With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data:
- Weak Consistency: After a write, reads may or may not see it. A best effort approach is taken. This approach is seen in systems such as memcached and works well in real time use cases such as VoIP, video chat, and realtime multiplayer games.
- Eventual Consistency: After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously. This approach is seen in systems such as DNS and email, and works well in highly available systems.
- Strong Consistency: After a write, reads will see it. Data is replicated synchronously. This approach is seen in file systems and RDBMSes, and works well in systems that need transactions.
3. Availability Patterns
There are two complementary patterns to support high availability: fail-over and replication.
Fail-over Architectures
- Active-Passive: With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.
- Active-Active: In active-active, both servers are managing traffic, spreading the load between them. If the servers are public-facing, the DNS would need to know about the public IPs of both servers.
Availability in Numbers
Availability is often quantified by uptime (or downtime) as a percentage of time the service is available.
| Metric | 99.9% Availability ("Three 9s") | 99.99% Availability ("Four 9s") |
|---|---|---|
| Downtime per year | 8h 45min 57s | 52min 35.7s |
| Downtime per month | 43m 49.7s | 4m 23s |
| Downtime per week | 10m 4.8s | 1m 5s |
| Downtime per day | 1m 26.4s | 8.6s |
Note: The above downtime figures are standard industry metrics.
Calculating Overall Availability:
* In Sequence: Overall availability decreases when two components with availability < 100% are in sequence. The formula is: Availability (Total) = Availability (Foo) * Availability (Bar).
* In Parallel: Overall availability increases when two components with availability < 100% are in parallel. The formula is: Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar)).
4. The PACELC Theorem
The PACELC theorem extends CAP by addressing system behavior during normal operations (when no partition exists).
Definition:
If there is a Partition (P), a distributed system must trade off between Availability (A) and Consistency (C);
Else (E), when the system is running normally, it must trade off between Latency (L) and Consistency (C).
Trade-off Matrix
| Model | Explanation | Examples |
|---|---|---|
| PA/EL | In a partition choose Availability, otherwise choose Latency. Replication is usually asynchronous. | Dynamo, Cassandra |
| PC/EC | In a partition choose Consistency, otherwise still prefer Consistency. Replication is synchronous. | BigTable, HBase |
| PA/EC | In a partition choose Availability, otherwise prefer Consistency. | MongoDB |
5. Capacity Estimation & Back-of-the-Envelope Math
Designing scalable systems requires estimating resource requirements using rough calculations.
Core Constants & Time Estimates
- Seconds in a Day: 86,400
- Seconds in a Year: ~31.5 Million
- Seconds in 50 Years: ~1.6 Billion (important for generating unique Epoch-based IDs)
- 80/20 Rule: 20% of data (hot data) typically accounts for 80% of the traffic
Standard Storage Metrics
- 1 Char = 1 Byte (ASCII) / 2 Bytes (Unicode)
- 1 Integer = 4 Bytes
- 1 UNIX Timestamp = 4 Bytes (or 8 Bytes for 64-bit)
- 1 UUID = 16 Bytes
Jeff Dean’s Latency Numbers
Understanding the orders of magnitude difference between operations is critical for identifying bottlenecks.
| Operation | Latency | Note |
|---|---|---|
| L1 Cache Reference | 0.5 ns | Fastest |
| Mutex Lock/Unlock | 100 ns | |
| Main Memory Reference | 100 ns | |
| Read 1 MB sequentially from memory | 250 µs | |
| Round trip within same datacenter | 500 µs | |
| Disk seek | 10 ms | Slow |
| Read 1 MB sequentially from disk | 30 ms | |
| Send packet CA → Netherlands → CA | 150 ms | Speed of light limit |