Inara AI — collapsing 40 microservices into one right-sized modular monolith.
85–90% AWS cost reduction. Engineering headcount needed to run it cut by 75%. Deployment time: hours → minutes. HIPAA Phase 7 plan ready.
$3.5–5.5K/mo → $340–545/mo AWS spend
40 microservices → single modular FastAPI monolith
4–8 → 1–2 engineers needed to run the platform
The situation (why they called)
A 40-microservice platform — Java, Go, and Node — built for a "millions of users" design point but actually serving roughly 5,000 real users. The AWS bill hovered between $3,500 and $5,500 per month. Deployments took half a day. The test pyramid was impossible to keep green. Each of several managed services added its own integration tax: Auth0 for identity, Elasticsearch for search, a Redis cluster for caching, Kafka for events, Neo4j for relationship traversal, Ghost CMS for content, and both MySQL and PostgreSQL for persistence. The HIPAA story had real gaps the team knew about and dreaded auditing.
The diagnosis
Three observations changed the conversation:
- The 5K user base was 200× below the architecture's design point. Most of the cost was overhead per service, not load.
- 7 of the 40 services together carried more than 80% of the actual traffic; the rest mostly talked to each other.
- The HIPAA gap list was real: no field-level PHI encryption, no row-level security, audit logs scattered across services with no unified retrieval.
The decision
The "in-process equivalents" decision was the most contentious. Here's the swap:
| Was | Became | Why it works at this scale |
|---|---|---|
| Auth0 (identity) | JWT + bcrypt + pyotp | 5K users; we don't need an OIDC provider for that. |
| Elasticsearch cluster | PostgreSQL tsvector + pg_trgm | Both Postgres extensions are mature; "good enough" search at this corpus size. |
| Redis cluster | In-process TTLCache | Single-process app; no need for a distributed cache. |
| Kafka | asyncio.Queue + APScheduler | Event volume measured in hundreds/sec, not millions/sec. |
| Neo4j (graph) | PostgreSQL recursive CTEs | Care-network graph is <100K nodes; CTEs perform fine. |
| Ghost CMS | S3 + CloudFront | CMS pages are static; no editorial team needs Ghost's UI. |
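Two of those swaps, sketched. First, the Redis and Kafka replacements together: a process-local TTLCache plus an asyncio.Queue event bus, with APScheduler covering the cron-style jobs that used to live in Kafka consumers. A minimal sketch, not the client's code; the cache sizes, event shape, and job are illustrative.

```python
import asyncio
from cachetools import TTLCache
from apscheduler.schedulers.asyncio import AsyncIOScheduler

# Process-local cache: safe because the monolith runs as one process.
cache: TTLCache = TTLCache(maxsize=10_000, ttl=300)  # 5-min TTL, illustrative

# In-process event bus: hundreds of events/sec sits comfortably
# inside a single asyncio.Queue.
events: asyncio.Queue = asyncio.Queue()

async def consume_events() -> None:
    while True:
        event = await events.get()
        # ...dispatch to in-process handlers here...
        events.task_done()

scheduler = AsyncIOScheduler()

async def nightly_digest() -> None:
    await events.put({"type": "digest.requested"})

# Replaces a cron-style Kafka consumer; at startup the app also runs
# scheduler.start() and asyncio.create_task(consume_events()).
scheduler.add_job(nightly_digest, "cron", hour=2)
```

Second, the Elasticsearch replacement: ranked full-text search on a tsvector column, with a pg_trgm trigram fallback for misspellings. The `documents` table, its `tsv` column, and the asyncpg driver are assumptions for the sketch.

```python
import asyncpg

async def search(pool: asyncpg.Pool, q: str, limit: int = 20) -> list:
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """
            SELECT id, title,
                   ts_rank(tsv, websearch_to_tsquery('english', $1)) AS rank
              FROM documents
             WHERE tsv @@ websearch_to_tsquery('english', $1)
             ORDER BY rank DESC
             LIMIT $2
            """,
            q, limit,
        )
        if rows:
            return rows
        # Trigram fallback catches typos the full-text parser misses
        # (requires CREATE EXTENSION pg_trgm).
        return await conn.fetch(
            """
            SELECT id, title, similarity(title, $1) AS rank
              FROM documents
             WHERE title % $1
             ORDER BY rank DESC
             LIMIT $2
            """,
            q, limit,
        )
```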
The delivery
Six weeks of AI-accelerated, spec-based ("vibe-coding") development. The old stack ran in parallel for the final week.
- Week 1–2: discovery, modular monolith design, router map. Every old endpoint mapped to a new module + path (router sketch after this list).
- Week 3–4: domain rewrite. In-process replacements for managed services landed. Test coverage on the critical paths.
- Week 5: data migration scripts. RLS scaffolding for HIPAA Phase 7. Audit log unified into a single append-only table (DDL sketch after this list).
- Week 6: production cutover behind a feature flag. One week of parallel run. Old stack decommissioned.
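Two artifacts from those weeks, sketched under assumptions. The Week 1–2 router map reduces to FastAPI router includes, one module per retired service cluster; the module and route names here are hypothetical:

```python
from fastapi import APIRouter, FastAPI

# Each retired service cluster becomes one module exposing an APIRouter.
auth_router = APIRouter()

@auth_router.post("/login")
async def login() -> dict:
    return {"token": "..."}  # stub; the real handler does JWT + bcrypt + pyotp

app = FastAPI()
# The router map lives here: every old endpoint → new module + path.
app.include_router(auth_router, prefix="/api/v1/auth", tags=["auth"])
```

And the Week 5 scaffolding boils down to two pieces of DDL. Table, column, and role names are illustrative, not the client's schema; the app sets `app.tenant_id` per request (e.g. via `set_config(..., true)` inside the transaction) so the policy can see it:

```python
RLS_DDL = """
ALTER TABLE patient_records ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON patient_records
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""

AUDIT_DDL = """
CREATE TABLE IF NOT EXISTS audit_log (
    id       bigserial   PRIMARY KEY,
    actor_id uuid        NOT NULL,
    action   text        NOT NULL,
    entity   text        NOT NULL,
    detail   jsonb       NOT NULL DEFAULT '{}'::jsonb,
    at       timestamptz NOT NULL DEFAULT now()
);
-- Append-only: the app role may INSERT and SELECT, never rewrite history.
REVOKE UPDATE, DELETE ON audit_log FROM app_role;
"""
```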
The outcome
Engineering team needed to run it: 4–8 → 1–2.
Deployment time: half a day → ~15 minutes.
HIPAA Phase 7 plan ready: PostgreSQL RLS, KMS field-level encryption, SAML/SCIM federation, external pen-test scheduled.
Platform now sellable as B2B: 45 feature flags, 7-role / 58-permission RBAC, per-tenant theming, white-label, Arabic RTL i18n.
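The 58 permissions are enforced the boring way: one FastAPI dependency, checked per route. A hedged sketch; the `User` shape, the permission strings, and the stubbed token lookup are illustrative, not the shipped code:

```python
from dataclasses import dataclass
from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()

@dataclass
class User:
    id: str
    permissions: frozenset[str]

async def get_current_user() -> User:
    # Stub: the real version decodes the JWT from the Authorization header.
    return User(id="demo", permissions=frozenset({"care_plans:read"}))

def require(permission: str):
    """Dependency factory: 403 unless the caller holds the permission."""
    async def checker(user: User = Depends(get_current_user)) -> User:
        if permission not in user.permissions:
            raise HTTPException(status_code=403, detail="forbidden")
        return user
    return checker

@app.get("/care-plans", dependencies=[Depends(require("care_plans:read"))])
async def list_care_plans() -> list[dict]:
    return []
```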
What I'd do differently
The rewrite proved fast because we accepted "good enough" on test parity and ran old and new in parallel for a week as the real validation. If the team had been less trusting, we'd have needed a 3-week parity-test phase first, adding half again to the six-week timeline. Worth knowing for the next engagement: parallel-run trust is itself a project asset, not just a technical safeguard.
Recognise any of this in your platform?