Module 8: Message Queues & Event Streaming
🚀 Problem Statement
When a document is uploaded, the system must:
- Extract text
- Chunk the document
- Generate embeddings
- Store in vector DB
- Notify reviewers
- Update the search index
Performing all six steps synchronously can take 45 seconds, during which the user stares at a loading spinner.
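The synchronous version might look like the sketch below. Every function is a hypothetical stand-in for a real pipeline step; the point is the shape of the handler, where the request cannot return until all six steps have finished.

```python
# Hypothetical stand-ins for the real pipeline steps (illustration only).
def extract_text(doc):          return f"text({doc})"
def chunk(text):                return [text[i:i + 16] for i in range(0, len(text), 16)]
def embed(chunks):              return [hash(c) for c in chunks]
def store_vectors(vectors):     return len(vectors)
def notify_reviewers(doc):      return f"notified:{doc}"
def update_search_index(doc):   return f"indexed:{doc}"

def handle_upload(doc):
    # The user waits on this entire chain: a slow embed() stalls the
    # request, and a down notification service fails the whole upload.
    text = extract_text(doc)
    chunks = chunk(text)
    vectors = embed(chunks)
    store_vectors(vectors)
    notify_reviewers(doc)
    update_search_index(doc)
    return "ok"
```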
🧠 The Engineering Story
The Villain: "The Synchronous Chain." Each step waits for the previous one. If embedding generation is slow, the user waits. If the notification service is down, the entire upload fails.
The Hero: "The Event Pipeline." Upload returns immediately, emitting a DocumentUploaded event. Independent consumers handle each step at their own pace with their own retry logic.
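The event-pipeline version of the handler can be sketched as follows. An in-memory `queue.Queue` stands in for the broker (in production this would be a Kafka producer or a RabbitMQ channel; that substitution is an assumption for illustration):

```python
import json
import queue
import uuid

# In-memory stand-in for the message broker (assumption for illustration).
broker = queue.Queue()

def handle_upload(doc_name):
    # Emit a DocumentUploaded event and return immediately; independent
    # consumers (extraction, chunking, embedding, ...) do the work later.
    event = {
        "type": "DocumentUploaded",
        "event_id": str(uuid.uuid4()),
        "document": doc_name,
    }
    broker.put(json.dumps(event))
    return {"status": "accepted", "event_id": event["event_id"]}

resp = handle_upload("report.pdf")   # returns before any processing happens
evt = json.loads(broker.get_nowait())  # a consumer picks the event up later
```

The handler's only job is to durably record that the upload happened; everything downstream becomes a consumer's problem, with its own pacing and retry logic.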
The Plot:
- Understand the difference: Message Queue (RabbitMQ, point-to-point delivery) vs Event Stream (Kafka, a replayable pub-sub log)
- Design event schemas with backward compatibility (Avro/Protobuf)
- Implement idempotent consumers (same event processed twice = same result)
- Handle ordering guarantees: partition-level ordering in Kafka
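The idempotent-consumer step can be made concrete with a small sketch: each event carries a unique `event_id`, and the consumer records processed ids so a redelivered event becomes a no-op. The in-memory set is an assumption; in production the processed-id record would live in a database table or Redis set so it survives restarts.

```python
# Idempotency sketch: duplicate deliveries of the same event must not
# produce duplicate side effects.
processed_ids = set()   # assumption: stands in for a durable store
counters = {"embeddings_generated": 0}

def handle_event(event):
    if event["event_id"] in processed_ids:
        return "skipped"      # already handled: same event twice = same result
    counters["embeddings_generated"] += 1   # the actual side effect
    processed_ids.add(event["event_id"])
    return "processed"

evt = {"event_id": "e-1", "type": "DocumentUploaded", "document": "report.pdf"}
first = handle_event(evt)    # does the work
second = handle_event(evt)   # redelivery is now harmless
```

This is what makes at-least-once delivery safe to build on: the broker is free to redeliver, because reprocessing is a no-op.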
The Twist (Failure): The Poison Message. A malformed PDF crashes the text extraction consumer. The broker redelivers the message, the consumer crashes again, and the cycle repeats indefinitely while the messages behind it pile up.
Interview Signal: Can design a dead-letter queue strategy and explain the difference between exactly-once and at-least-once delivery semantics.
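One dead-letter strategy, sketched below under assumptions: cap retries per message, and after the cap is reached, park the message in a dead-letter queue for offline inspection so the main queue keeps draining. The names are illustrative, not any specific broker's API.

```python
# DLQ sketch: bound retries, then shunt the poison message aside.
MAX_ATTEMPTS = 3
dead_letter_queue = []   # assumption: stands in for a real DLQ/exchange

def process(msg):
    # Stand-in for text extraction; one payload is a poison message.
    if msg["payload"] == "malformed.pdf":
        raise ValueError("cannot parse PDF")
    return "ok"

def consume(msg):
    for _ in range(MAX_ATTEMPTS):
        try:
            return process(msg)
        except ValueError:
            continue   # retry (a real system would back off here)
    # Retries exhausted: dead-letter instead of redelivering forever.
    dead_letter_queue.append(msg)
    return "dead-lettered"

ok = consume({"payload": "good.pdf"})
bad = consume({"payload": "malformed.pdf"})  # queue keeps moving afterwards
```

Crucially, the poison message is preserved rather than dropped: an operator can inspect the DLQ, fix the extraction bug, and replay the message.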
🧠 Queue vs Stream Decision
| Factor | Message Queue (RabbitMQ) | Event Stream (Kafka) |
|---|---|---|
| Pattern | Task distribution | Event log (replayable) |
| Consumption | Message deleted after processing | Consumer tracks offset |
| Use case | "Process this document" | "Document was uploaded" (multiple consumers) |
| Ordering | Per-queue FIFO | Per-partition ordering |
| Pipeline Context | Background job processing | Event sourcing document revision history |
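The Consumption row of the table can be illustrated with a toy in-memory contrast (illustration only, no broker semantics beyond the basic model): a stream consumer keeps its own offset into an immutable log, so it can rewind and replay, while a queue removes a message once it is processed.

```python
from collections import deque

# Event stream model: the log is immutable; the consumer owns an offset.
log = ["DocumentUploaded:a", "DocumentUploaded:b", "DocumentUploaded:c"]
offset = 0

def stream_poll():
    global offset
    if offset < len(log):
        event = log[offset]
        offset += 1          # consumer advances its own position
        return event
    return None

first = stream_poll()
stream_poll()
offset = 0                   # rewind: the log is still intact
replayed = stream_poll()     # same first event again, replayable

# Message queue model: processing removes the message; no replay.
tasks = deque(["process:a", "process:b"])
task = tasks.popleft()       # gone from the queue once processed
```

This is why the table pairs Kafka with event sourcing: a new consumer can be pointed at offset 0 and rebuild a document's revision history from the log, which a delete-on-ack queue cannot offer.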