Module 9: Microservices, Service Mesh & Communication
🚀 Problem Statement
A GenAI monolith might handle document management, Core Engine logic, embedding generation, LLM orchestration, user auth, notifications, and analytics. Deploying a fix to the document revision logic in such a system requires redeploying everything, including GPU-intensive services.
🧠The Engineering Story
The Villain: "The Monolith That Knew Too Much." Every component shares the same database, the same deployment pipeline, and the same Python process. A memory leak in the analytics module crashes the LLM inference.
The Hero: "The Bounded Context." Each business domain becomes an independent service with its own data store, deployment lifecycle, and scaling characteristics.
The Plot:
- Decompose by business capability (DDD bounded contexts)
- Choose communication: sync (gRPC) vs async (events) per boundary
- Implement service discovery and circuit breakers
- Deploy a service mesh (Istio/Linkerd) for observability and traffic control
The Twist (Failure): Distributed Monolith. Splitting into 15 services that all share the same database and must be deployed together results in the operational complexity of microservices with none of the benefits.
Interview Signal: Can identify when NOT to use microservices — and explain the "modular monolith" alternative.
🧠Service Decomposition for the technical stack
| Service | Responsibility | Communication | Scaling Profile |
|---|---|---|---|
| Document Service | CRUD, versions, revisions | REST/gRPC (sync) | CPU-light, I/O bound |
| Core Engine | data record logic | gRPC (sync) | CPU-medium |
| Embedding Service | Text → vector embedding | gRPC (sync, batch) | GPU-heavy, bursty |
| LLM Orchestrator | Prompt → response | Async queue + SSE | GPU-heavy, long-running |
| Comment Service | Comments, threads | REST (sync) | CPU-light |
| Search Service | Hybrid vector + keyword | gRPC (sync) | Memory-heavy |
| Notification Service | Email, Teams, push | Async events | I/O bound |