1. Scalability bottlenecks
Symptoms
- Response times spike under predictable load (Monday 9am, payroll day, sale launch)
- One slow query takes the database — and the rest of the app — down
- Adding more EC2 instances doesn't help, or costs more than it saves
- Background jobs and user requests share the same connection pool
Root causes
- The system was designed for a user base it never reached, or for one it has long since outgrown
- 3–5 hot endpoints carry 80%+ of load, but everything is scaled together
- Caching, queueing, and read replicas added late and inconsistently
Short-term fix (this week)
- Identify the top 5 endpoints by latency × volume (first sketch below); optimise queries on those only
- Add Redis (or an in-process cache) for the 3 most-read entities (cache-aside sketch below)
- Move expensive reports to a background queue with cached results (queue sketch below)
- Add CloudFront in front of any GET-heavy public API
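A minimal sketch of the "latency × volume" ranking, assuming an access log with one `METHOD path status duration_ms` entry per line; the file name `access.log` and the field layout are placeholders to adapt to your own logging format:

```python
from collections import defaultdict

def top_endpoints(log_lines, n=5):
    """Rank endpoints by total time spent serving them (mean latency x volume)."""
    totals = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for line in log_lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip malformed lines
        method, path, _status, duration_ms = parts[:4]
        try:
            ms = float(duration_ms)
        except ValueError:
            continue
        key = f"{method} {path}"
        totals[key]["count"] += 1
        totals[key]["total_ms"] += ms
    # Endpoints that consume the most total server time rise to the top,
    # whether they are slow-and-rare or fast-and-constant.
    return sorted(totals.items(), key=lambda kv: kv[1]["total_ms"], reverse=True)[:n]

if __name__ == "__main__":
    with open("access.log") as f:
        for endpoint, stats in top_endpoints(f):
            print(f"{endpoint}: {stats['count']} reqs, "
                  f"{stats['total_ms'] / stats['count']:.0f} ms avg")
```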
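A cache-aside sketch for one hot entity using redis-py; `load_product_from_db`, the key format, and the 5-minute TTL are illustrative stand-ins for whatever your existing read path looks like:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def load_product_from_db(product_id: int) -> dict:
    # Placeholder: replace with your existing DB read
    return {"id": product_id}

def get_product(product_id: int, ttl_seconds: int = 300) -> dict:
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: no DB round trip
    product = load_product_from_db(product_id)
    r.set(key, json.dumps(product), ex=ttl_seconds)  # expire so stale data ages out
    return product
```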
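One way to push report generation off the request path, sketched with Celery plus a Redis result cache; the broker URL, `build_report`, and the polling response are assumptions, not prescriptions:

```python
import json
import redis
from celery import Celery

app = Celery("reports", broker="redis://localhost:6379/0")
cache = redis.Redis(host="localhost", port=6379)

@app.task
def generate_report(report_id: int) -> None:
    data = build_report(report_id)          # your existing (slow) report code
    cache.set(f"report:{report_id}", json.dumps(data), ex=3600)

def report_view(report_id: int) -> dict:
    cached = cache.get(f"report:{report_id}")
    if cached is not None:
        return json.loads(cached)           # serve the last rendered report
    generate_report.delay(report_id)        # rebuild in the background
    return {"status": "generating"}         # respond immediately; poll or notify later

def build_report(report_id: int) -> dict:
    raise NotImplementedError("replace with the existing report query")
```

The request never waits on the report; at worst a user sees "generating" once, and every later view hits the cached result.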
Long-term solution
- Profile real usage; right-size to actual users, not hypothetical 10×
- Identify which 3–5 services need horizontal scale; keep the rest as a tight monolith
- Read replica + connection-pool segregation (jobs vs requests; sketched below)
- Auto-scaling with explicit ceilings to prevent runaway costs (sketched below)
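A sketch of connection-pool segregation with SQLAlchemy: one engine (and pool) for interactive requests on the primary, and a separately sized engine for background jobs pointed at the read replica. The URLs and pool sizes are placeholders to tune against your actual workload:

```python
from sqlalchemy import create_engine

# Primary, sized for interactive request traffic
request_engine = create_engine(
    "postgresql://app@db-primary/app",
    pool_size=20,
    max_overflow=5,
)

# Read replica, sized separately so a burst of jobs can't starve user requests
job_engine = create_engine(
    "postgresql://app@db-replica/app",
    pool_size=10,
    max_overflow=0,
)
```

The point is isolation: a flood of jobs can only exhaust its own pool, so user requests keep their connections.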
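And a sketch of capping scale-out with boto3, assuming an existing Auto Scaling group; the group name, sizes, and CPU target are illustrative:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# MaxSize is the explicit ceiling: the group scales between 2 and 12 instances,
# never beyond, regardless of what the scaling policy asks for.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-api",
    MinSize=2,
    MaxSize=12,
    DesiredCapacity=4,
)

# Target-tracking policy: add instances when average CPU exceeds ~60%
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-api",
    PolicyName="cpu-target-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)
```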
→ See: Inara — 40 microservices collapsed into 1 modular monolith