Module 8: Message Queues & Event Streaming
🚀 Problem Statement
When a document is uploaded, the system must:
- Extract text
- Chunk the document
- Generate embeddings
- Store in vector DB
- Notify reviewers
- Update the search index
Performing all six steps synchronously can take 45 seconds, during which the user stares at a loading spinner.
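The synchronous version might look like the sketch below. Every function is a hypothetical stand-in for a real pipeline step; the point is the shape of the handler, where the request cannot return until all six steps have finished.

```python
# Hypothetical stand-ins for the real pipeline steps (illustration only).
def extract_text(doc):          return f"text({doc})"
def chunk(text):                return [text[i:i + 16] for i in range(0, len(text), 16)]
def embed(chunks):              return [hash(c) for c in chunks]
def store_vectors(vectors):     return len(vectors)
def notify_reviewers(doc):      return f"notified:{doc}"
def update_search_index(doc):   return f"indexed:{doc}"

def handle_upload(doc):
    # The user waits on this entire chain: a slow embed() stalls the
    # request, and a down notification service fails the whole upload.
    text = extract_text(doc)
    chunks = chunk(text)
    vectors = embed(chunks)
    store_vectors(vectors)
    notify_reviewers(doc)
    update_search_index(doc)
    return "ok"
```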
🧠 The Engineering Story
The Villain: "The Synchronous Chain." Each step waits for the previous one. If embedding generation is slow, the user waits. If the notification service is down, the entire upload fails.
The Hero: "The Event Pipeline." Upload returns immediately, emitting a DocumentUploaded event. Independent consumers handle each step at their own pace with their own retry logic.
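The event-pipeline version of the handler can be sketched as follows. An in-memory `queue.Queue` stands in for the broker (in production this would be a Kafka producer or a RabbitMQ channel; that substitution is an assumption for illustration):

```python
import json
import queue
import uuid

# In-memory stand-in for the message broker (assumption for illustration).
broker = queue.Queue()

def handle_upload(doc_name):
    # Emit a DocumentUploaded event and return immediately; independent
    # consumers (extraction, chunking, embedding, ...) do the work later.
    event = {
        "type": "DocumentUploaded",
        "event_id": str(uuid.uuid4()),
        "document": doc_name,
    }
    broker.put(json.dumps(event))
    return {"status": "accepted", "event_id": event["event_id"]}

resp = handle_upload("report.pdf")   # returns before any processing happens
evt = json.loads(broker.get_nowait())  # a consumer picks the event up later
```

The handler's only job is to durably record that the upload happened; everything downstream becomes a consumer's problem, with its own pacing and retry logic.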
The Plot:
- Understand the difference: Message Queue (RabbitMQ, point-to-point delivery) vs Event Stream (Kafka, a replayable pub-sub log)
- Design event schemas with backward compatibility (Avro/Protobuf)
- Implement idempotent consumers (same event processed twice = same result)
- Handle ordering guarantees: partition-level ordering in Kafka
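The idempotent-consumer step can be made concrete with a small sketch: each event carries a unique `event_id`, and the consumer records processed ids so a redelivered event becomes a no-op. The in-memory set is an assumption; in production the processed-id record would live in a database table or Redis set so it survives restarts.

```python
# Idempotency sketch: duplicate deliveries of the same event must not
# produce duplicate side effects.
processed_ids = set()   # assumption: stands in for a durable store
counters = {"embeddings_generated": 0}

def handle_event(event):
    if event["event_id"] in processed_ids:
        return "skipped"      # already handled: same event twice = same result
    counters["embeddings_generated"] += 1   # the actual side effect
    processed_ids.add(event["event_id"])
    return "processed"

evt = {"event_id": "e-1", "type": "DocumentUploaded", "document": "report.pdf"}
first = handle_event(evt)    # does the work
second = handle_event(evt)   # redelivery is now harmless
```

This is what makes at-least-once delivery safe to build on: the broker is free to redeliver, because reprocessing is a no-op.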
The Twist (Failure): The Poison Message. A malformed PDF crashes the text extraction consumer. The broker redelivers the message, the consumer crashes again, and the cycle repeats indefinitely while the messages behind it pile up.
Interview Signal: Can design a dead-letter queue strategy and explain the difference between exactly-once and at-least-once delivery semantics.
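One dead-letter strategy, sketched below under assumptions: cap retries per message, and after the cap is reached, park the message in a dead-letter queue for offline inspection so the main queue keeps draining. The names are illustrative, not any specific broker's API.

```python
# DLQ sketch: bound retries, then shunt the poison message aside.
MAX_ATTEMPTS = 3
dead_letter_queue = []   # assumption: stands in for a real DLQ/exchange

def process(msg):
    # Stand-in for text extraction; one payload is a poison message.
    if msg["payload"] == "malformed.pdf":
        raise ValueError("cannot parse PDF")
    return "ok"

def consume(msg):
    for _ in range(MAX_ATTEMPTS):
        try:
            return process(msg)
        except ValueError:
            continue   # retry (a real system would back off here)
    # Retries exhausted: dead-letter instead of redelivering forever.
    dead_letter_queue.append(msg)
    return "dead-lettered"

ok = consume({"payload": "good.pdf"})
bad = consume({"payload": "malformed.pdf"})  # queue keeps moving afterwards
```

Crucially, the poison message is preserved rather than dropped: an operator can inspect the DLQ, fix the extraction bug, and replay the message.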
🧠 Queue vs Stream Decision
| Factor | Message Queue (RabbitMQ) | Event Stream (Kafka) |
|---|---|---|
| Pattern | Task distribution | Event log (replayable) |
| Consumption | Message deleted after processing | Consumer tracks offset |
| Use case | "Process this document" | "Document was uploaded" (multiple consumers) |
| Ordering | Per-queue FIFO | Per-partition ordering |
| Pipeline Context | Background job processing | Event sourcing document revision history |
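The Consumption row of the table can be illustrated with a toy in-memory contrast (illustration only, no broker semantics beyond the basic model): a stream consumer keeps its own offset into an immutable log, so it can rewind and replay, while a queue removes a message once it is processed.

```python
from collections import deque

# Event stream model: the log is immutable; the consumer owns an offset.
log = ["DocumentUploaded:a", "DocumentUploaded:b", "DocumentUploaded:c"]
offset = 0

def stream_poll():
    global offset
    if offset < len(log):
        event = log[offset]
        offset += 1          # consumer advances its own position
        return event
    return None

first = stream_poll()
stream_poll()
offset = 0                   # rewind: the log is still intact
replayed = stream_poll()     # same first event again, replayable

# Message queue model: processing removes the message; no replay.
tasks = deque(["process:a", "process:b"])
task = tasks.popleft()       # gone from the queue once processed
```

This is why the table pairs Kafka with event sourcing: a new consumer can be pointed at offset 0 and rebuild a document's revision history from the log, which a delete-on-ack queue cannot offer.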