🏗️ HLD Concept: Real-time Updates
📝 Definition
The Real-time Updates pattern addresses the architectural challenge of pushing data from a server to a client immediately as events happen, rather than waiting for the client to explicitly request the information. This pattern encompasses both the client-server protocol used to maintain a persistent connection and the server-side infrastructure (like Pub/Sub systems or consistent hashing) required to route messages to the specific server holding the user's active connection.
🚀 Why it matters
For standard synchronous APIs, communication is straightforward: a client sends a request, and the server returns a response once the task is completed. However, modern applications like chat platforms (WhatsApp), live leaderboards, collaborative editors, and notifications require low-latency, real-time data delivery. Implementing this pattern efficiently ensures that millions of concurrent users can receive instantaneous updates without overwhelming the backend infrastructure with constant, resource-intensive requests.
⚖️ Trade-offs & Decisions
| Server-Sent Events (SSE) | WebSockets | When to use what? |
|---|---|---|
| Unidirectional push from server to client, streamed as multiple message "chunks" within a single long-lived HTTP response. | Bi-directional, persistent TCP connection established via an HTTP Upgrade handshake. Either side can push text or binary data at any time. | Use SSE for simple push notifications or live feeds where data flows only one way. Use WebSockets for high-frequency, persistent, bi-directional communication, such as chat applications or multiplayer games. Default to simple HTTP polling if a slight delay (e.g., 5 seconds) is acceptable, since it avoids complex connection management. |
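To make the SSE column concrete, here is a minimal sketch of the SSE wire format: each message is a block of `field: value` lines ending in a blank line, parsed by the browser's `EventSource` API from one long-lived HTTP response. The helper name `format_sse` is hypothetical, not part of any framework.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events message chunk.

    A chunk is a block of `field: value` lines terminated by a
    blank line; the server streams many of these over a single
    open HTTP response.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become multiple `data:` lines.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"
```

A server would simply write `format_sse("score updated", event="leaderboard")` to the open response stream each time an event fires.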
🛠️ Implementation Strategies
- Strategy 1: Decoupling via Pub/Sub: When managing millions of connections, users will be distributed across hundreds of different servers. To route a message to the correct recipient, you must decouple the publisher from the subscriber using a Pub/Sub service like Redis or Kafka. When a message is sent, the server writes it to the database and publishes it to a Pub/Sub topic (partitioned by user or chat ID). The specific server holding the recipient's active WebSocket connection subscribes to this topic and pushes the payload directly to the client.
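The routing flow in Strategy 1 can be sketched with an in-memory stand-in for the Pub/Sub layer (a real system would use Redis or Kafka; the class names here are illustrative, not from any library):

```python
from collections import defaultdict
from typing import Callable


class InMemoryPubSub:
    """Stand-in for Redis/Kafka: topics map to subscriber callbacks."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


class ChatServer:
    """One of many servers, each holding a subset of live connections."""
    def __init__(self, name: str, bus: InMemoryPubSub):
        self.name = name
        self.bus = bus
        self.delivered = []  # stands in for pushes down a WebSocket

    def accept_connection(self, user_id: str) -> None:
        # Subscribe to the topic partitioned by the connecting user's ID.
        self.bus.subscribe(f"user:{user_id}", self._push)

    def _push(self, message: dict) -> None:
        self.delivered.append(message)


# The sender's server persists the message (omitted), then publishes;
# only the server holding alice's connection receives and pushes it.
bus = InMemoryPubSub()
server_a = ChatServer("a", bus)
server_b = ChatServer("b", bus)
server_b.accept_connection("alice")  # alice's socket lives on server B
bus.publish("user:alice", {"from": "bob", "text": "hi"})
```

The key property is that the publishing server never needs to know which of the hundreds of servers holds the recipient's socket; the topic subscription handles the routing.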
- Strategy 2: Ensuring Reliability and Fallbacks: Persistent connections frequently drop under poor network conditions, and TCP keepalives can take minutes to detect a severed connection. Furthermore, Pub/Sub systems like Redis offer "at most once" delivery, meaning messages can be lost during transient failures. To guarantee delivery, always persist the message to a database or "Inbox" table before publishing it to the real-time channel, and require clients to acknowledge receipt. Combine application-level heartbeats (to detect dead sockets) with periodic background polling as a final backstop to fetch any missed messages.
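The persist-then-publish flow in Strategy 2 can be sketched as follows. This is a simplified in-memory model (the class and its fields are hypothetical): the inbox is the durable source of truth, the real-time push is best-effort, and polling recovers anything that was never acknowledged.

```python
import itertools


class ReliableDelivery:
    """Persist-then-publish: the inbox is the source of truth;
    the real-time channel is best-effort ('at most once')."""

    _ids = itertools.count(1)

    def __init__(self):
        self.inbox = {}             # user_id -> {msg_id: payload}
        self.push_failures = set()  # simulates users whose push drops

    def send(self, user_id: str, payload: str) -> int:
        msg_id = next(self._ids)
        # 1. Persist first, so the message survives a lost push.
        self.inbox.setdefault(user_id, {})[msg_id] = payload
        # 2. Best-effort real-time push (may silently fail).
        if user_id not in self.push_failures:
            pass  # would publish to the Pub/Sub topic here
        return msg_id

    def ack(self, user_id: str, msg_id: int) -> None:
        # Client confirmed receipt; remove the message from the inbox.
        self.inbox.get(user_id, {}).pop(msg_id, None)

    def poll_missed(self, user_id: str) -> list:
        # Periodic backstop: anything still unacked was missed.
        return list(self.inbox.get(user_id, {}).values())
```

Note the ordering: writing to the inbox before publishing means a crashed push costs only latency (the next poll finds the message), never data loss.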
- Strategy 3: Stateful Servers & Consistent Hashing: For applications requiring heavy real-time processing (like collaborative text editing), you may route clients to stateful servers using a consistent hash ring. This ensures all users collaborating on the same entity (e.g., a specific document) maintain connections to the exact same server, keeping synchronization logic localized and efficient.
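A minimal consistent hash ring for Strategy 3 might look like the sketch below (server names and parameters are illustrative). Virtual nodes spread load evenly, and because placement depends only on the key's hash, every client editing the same document deterministically lands on the same server.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps an entity key (e.g. a document ID) to a server.

    Virtual nodes smooth out load, and only ~1/N of keys move
    when a server joins or leaves the ring.
    """

    def __init__(self, servers, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the
        # key's hash, wrapping around the ring if necessary.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]
```

With `ring = ConsistentHashRing(["ws-1", "ws-2", "ws-3"])`, every call to `ring.server_for("doc-42")` returns the same server, so all collaborators on that document share one connection point and the synchronization logic stays local.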
🧠 Interview Talk-Track
- Key Insight: Start simple. Determine if true real-time delivery is actually a strict requirement. For example, a live competition leaderboard can often be satisfied with a client polling the server every 5 seconds. This avoids the massive infrastructure overhead of maintaining hundreds of thousands of stateful WebSocket connections.
- Common Pitfall: Proposing WebSockets prematurely without justifying the need for high-frequency, bi-directional communication. WebSockets require specialized infrastructure (like Layer 4 load balancers) and introduce significant complexity around connection state management, firewalls, and proxy compatibility.
🎯 Core Takeaway
The Real-time Updates pattern is essential for low-latency push communication but introduces significant state management complexity. Always evaluate if simple polling suffices; if not, choose between SSE for unidirectional pushes and WebSockets for bi-directional flows, scaling the backend using Pub/Sub mechanisms to route messages to the correct active connections.