Case Study · Healthcare SaaS

Inara AI — collapsing 40 microservices into one right-sized modular monolith.

85–90% AWS cost reduction. Engineering headcount needed to run it cut by 75%. Deployment time: hours → minutes. HIPAA Phase 7 plan ready.

85–90% · AWS cost reduction: $3.5–5.5K/mo → $340–545/mo
40 → 1 · Services collapsed into a single modular FastAPI monolith
75% · Smaller engineering team needed to run the platform

The situation (why they called)

A 40-microservice platform — Java, Go, and Node — built for a "millions of users" design point but actually serving roughly 5,000 real users. AWS bill hovered between $3,500 and $5,500 per month. Deployments took half a day. The test pyramid was impossible to keep green. Multiple managed services were each adding integration tax: Auth0 for identity, Elasticsearch for search, a Redis cluster for caching, Kafka for events, Neo4j for relationship traversal, Ghost CMS for content, and both MySQL and PostgreSQL for persistence. The HIPAA story had real gaps the team knew about and dreaded auditing.

The diagnosis

Three observations changed the conversation:

  1. The 5K user base was 200× below the architecture's design point. Most of the cost was overhead per service, not load.
  2. 7 of the 40 services together carried more than 80% of the actual traffic; the rest mostly talked to each other.
  3. The HIPAA gap list was real: no field-level PHI encryption, no row-level security, audit logs scattered across services with no unified retrieval.

The decision

Decision: Right-size first, comply second. Collapse the 40 services into a single FastAPI modular monolith with router-level boundaries preserved (so individual services can be re-extracted later if usage justifies). Replace heavy managed components with in-process equivalents wherever the trade-off makes sense.

The "in-process equivalents" decision was the most contentious. Here's the swap:

| Was | Became | Why it works at this scale |
| --- | --- | --- |
| Auth0 (identity) | JWT + bcrypt + pyotp | 5K users; we don't need an OIDC provider for that. |
| Elasticsearch cluster | PostgreSQL tsvector + pg_trgm | Both Postgres extensions are mature; "good enough" search at this corpus size. |
| Redis cluster | In-process TTLCache | Single-process app; no need for a distributed cache. |
| Kafka | asyncio.Queue + APScheduler | Event volume measured in hundreds/sec, not millions/sec. |
| Neo4j (graph) | PostgreSQL recursive CTEs | Care-network graph is <100K nodes; CTEs perform fine. |
| Ghost CMS | S3 + CloudFront | CMS pages are static; no editorial team needs Ghost's UI. |
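The Redis swap amounts to a dictionary with expiry. A minimal stdlib sketch of the TTLCache idea (production code would use a library such as cachetools rather than hand-rolling this; key names are illustrative):

```python
import time

class TTLCache:
    """Minimal in-process cache with per-entry time-to-live."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (value, monotonic expiry time)

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires = item
        if time.monotonic() >= expires:
            del self._data[key]  # lazy eviction on read
            return default
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set("patient:42", {"name": "A."})
assert cache.get("patient:42") == {"name": "A."}
time.sleep(0.06)
assert cache.get("patient:42") is None  # entry expired
```

In a single-process app this gives the same hit-rate benefit as the Redis cluster did, minus the network hop and the cluster bill.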
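The Kafka swap can be sketched with nothing but the standard library; the real build also uses APScheduler for cron-style jobs, omitted here. Topic names and payloads are invented:

```python
import asyncio

async def main():
    # In-process event bus: at hundreds of events/sec, an
    # asyncio.Queue inside the monolith replaces the broker.
    events: asyncio.Queue = asyncio.Queue()
    handled = []

    async def consumer():
        while True:
            event = await events.get()
            if event is None:  # shutdown sentinel
                break
            handled.append(event)  # real code would dispatch by topic

    task = asyncio.create_task(consumer())
    for i in range(3):
        await events.put({"topic": "appointment.created", "id": i})
    await events.put(None)
    await task
    return handled

handled = asyncio.run(main())
```

The trade-off is explicit: events don't survive a process restart. At this volume the team treated the database, not the queue, as the durable record.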

The delivery

Six weeks of AI-accelerated, spec-based ("vibe-coding") development. The old stack ran in parallel for the final week.

The outcome

Numbers:
- AWS bill: $3.5–5.5K/mo → $340–545/mo (85–90% reduction).
- Engineering team needed to run it: 4–8 → 1–2.
- Deployment time: half a day → ~15 minutes.
- HIPAA Phase 7 plan ready: PostgreSQL RLS, KMS field-level encryption, SAML/SCIM federation, external pen test scheduled.
- Platform now sellable as B2B: 45 feature flags, 7-role / 58-permission RBAC, per-tenant theming, white-label, Arabic RTL i18n.
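The PostgreSQL RLS item in the Phase 7 plan looks roughly like the DDL below; table, column, and setting names are illustrative, not Inara's schema:

```python
# Row-level security sketch for per-tenant PHI isolation.
RLS_DDL = """
ALTER TABLE patient_records ENABLE ROW LEVEL SECURITY;

-- The app sets app.tenant_id per request; Postgres then filters
-- every query on this table down to that tenant's rows.
CREATE POLICY tenant_isolation ON patient_records
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""

def set_tenant_sql(tenant_id: str) -> str:
    """SQL the request middleware would run before tenant queries.
    In production use a parameterised set_config() call, not string
    interpolation."""
    return f"SET app.tenant_id = '{tenant_id}'"
```

The appeal for the audit story is that isolation is enforced in the database, not re-implemented per endpoint.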
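The 7-role / 58-permission RBAC is, structurally, a lookup table plus a membership check. A sketch with invented roles and permission strings (not the real matrix):

```python
# role -> set of granted permission strings
ROLE_PERMISSIONS = {
    "clinician": {"patient.read", "patient.write", "note.write"},
    "billing_admin": {"invoice.read", "invoice.write"},
    "auditor": {"patient.read", "invoice.read", "audit.read"},
}

def has_permission(role: str, permission: str) -> bool:
    """Check a single role against a single permission string."""
    return permission in ROLE_PERMISSIONS.get(role, set())

assert has_permission("auditor", "audit.read")
assert not has_permission("billing_admin", "patient.write")
```

Keeping permissions as flat strings rather than code branches is what makes the matrix tenant-configurable for B2B resale.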

What I'd do differently

The rewrite proved fast because we accepted "good enough" on test parity and used a week of running old and new in parallel as the real validation. Had the team been less trusting, we'd have needed a 3-week parity-test phase first, adding half again to the six-week timeline. Worth knowing for the next engagement: parallel-run trust is itself a project asset, not just a technical setting.


Recognise any of this in your platform?