Cloud cost · 12 min read · 2026-05-07

Why your AWS bill exploded after going microservices — and how to undo it.

TL;DR: Splitting a monolith into microservices adds at least seven categories of cost that the original architecture didn't have, and most teams don't cost them out beforehand. Bringing the bill back down doesn't require a rewrite; it requires collapsing services that don't have independent scale needs and replacing managed components with in-process equivalents where the trade-off makes sense. One engagement: $3.5–5.5K/mo → $340–545/mo.

The situation you're in

Two years ago, a senior engineer pitched microservices. Splitting the monolith into 40 services would let teams move independently, scale specific paths horizontally, and adopt different stacks where they made sense. Leadership signed off. The migration took six months. Everyone agreed it was the right call.

Now the AWS bill is climbing 8% per quarter and nobody knows why. User count is flat. Feature shipping is slower than before the migration, not faster. Finance is asking pointed questions. Engineering is annoyed because they're being asked to defend something the whole company applauded two years ago.

This is normal. Almost every microservice migration I get called in to look at hits exactly this point around month 18–24. The fix isn't a rewrite — it's surgical de-fragmentation.

The seven hidden cost drivers

1. Per-service overhead, multiplied

Every service has fixed overhead that didn't exist in the monolith: a load balancer slice, a Kubernetes pod (or two for HA), an EKS / ECS task slot, a CloudWatch log stream, a deployment pipeline, a separate metric publisher. None of these scale with traffic — they're paid whether the service is busy or idle. Multiply by 40 services and the floor cost is significant.

Diagnostic: add up the cost of running each service at zero traffic. If that floor is more than 30% of your total bill, you've over-fragmented.
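
A back-of-the-envelope version of that check; every figure below is an illustrative placeholder to swap for your own numbers:

# Estimate the "zero-traffic floor": fixed monthly cost per service that is
# paid even when it handles no requests. All numbers here are illustrative.
FIXED_MONTHLY_PER_SERVICE = {
    "alb_slice": 5.0,         # share of a load balancer / target group
    "idle_pods": 30.0,        # two small pods reserved for HA
    "log_stream": 3.0,        # CloudWatch ingestion + storage at idle
    "pipeline_and_misc": 4.0, # deploy pipeline, metric publisher, etc.
}

service_count = 40
total_monthly_bill = 4500.0  # whatever the last invoice said

floor = service_count * sum(FIXED_MONTHLY_PER_SERVICE.values())
print(f"zero-traffic floor: ${floor:,.0f}/mo "
      f"({floor / total_monthly_bill:.0%} of the bill)")
# If that ratio is above ~30%, the platform is over-fragmented.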

2. Inter-service network traffic

What was a function call in the monolith is now an HTTP / gRPC round trip across the cluster: NLB cost, NAT gateway cost, cross-AZ data transfer cost. Most teams don't realise how much money flows through their NAT gateway, because the AWS bill files it under "data transfer", nowhere near "EKS".

Diagnostic: AWS Cost Explorer → group by usage type → look for DataTransfer-Regional-Bytes and NatGateway-Bytes. If they're more than 5% of your bill and you didn't expect that, this is your driver.
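
The same diagnostic, scripted. A minimal sketch with boto3, assuming credentials that allow ce:GetCostAndUsage and a date range you'd adjust:

import boto3

# Pull last month's spend grouped by usage type and surface the transfer lines.
ce = boto3.client("ce")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-04-01", "End": "2026-05-01"},  # placeholder dates
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if "DataTransfer-Regional-Bytes" in usage_type or "NatGateway" in usage_type:
        print(f"{usage_type:45s} ${cost:,.2f}")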

3. Managed services bought "for the architecture", not for the load

Auth0 was added because microservices need a centralised identity provider. An Elasticsearch cluster was added because each service "needs to be searchable". A Redis cluster was added because services share cache. Kafka was added because events need to be reliable. None of these decisions was wrong in principle, but each one assumes a load profile that may not match yours.

I've seen $1K/mo go to a Redis cluster handling fewer reads per second than a single-process LRU could absorb.

Diagnostic: for each managed service, measure peak requests per second and compare it against what an in-process equivalent could handle. If you're at 5% of capacity, the trade-off has flipped.
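
For Redis, one way to get that number, sketched with redis-py against a placeholder endpoint:

import time
import redis  # pip install redis

# Sample the managed Redis cluster's actual throughput for a minute.
# The hostname is a placeholder; point it at your ElastiCache primary.
r = redis.Redis(host="my-cache.example.internal", port=6379)

samples = []
for _ in range(60):
    stats = r.info("stats")
    samples.append(stats["instantaneous_ops_per_sec"])
    time.sleep(1)

print(f"peak ops/sec over the window: {max(samples)}")
# An in-process dict or LRU absorbs hundreds of thousands of ops/sec; if the
# peak here is in the low thousands, the cluster is mostly paying for idleness.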

4. Idle dev / staging environments running 24/7

Microservices encourage "every team gets their own environment". That's good for velocity if you remember to scale them down at night. Most teams don't. A 40-service staging environment sitting idle from 7pm to 9am the next day is a real number on your bill, every day, weekends included.

Fix today: an auto-shutdown rule that scales staging EKS node groups to zero between 8pm and 8am local time. It reverses with a button if needed.
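
A sketch of the shutdown half, assuming a staging cluster named "staging" and a scheduled job (cron, EventBridge) that runs it at 20:00; the morning job would read the saved desired sizes back from the tag and restore them:

import boto3

# "Lights out" for staging: scale every managed node group to zero and
# remember the previous desired size in a tag for the wake-up job.
CLUSTER = "staging"  # placeholder cluster name
eks = boto3.client("eks")

for ng in eks.list_nodegroups(clusterName=CLUSTER)["nodegroups"]:
    desc = eks.describe_nodegroup(clusterName=CLUSTER, nodegroupName=ng)["nodegroup"]
    current = desc["scalingConfig"]
    eks.tag_resource(
        resourceArn=desc["nodegroupArn"],
        tags={"wakeup-desired-size": str(current["desiredSize"])},
    )
    eks.update_nodegroup_config(
        clusterName=CLUSTER,
        nodegroupName=ng,
        scalingConfig={
            "minSize": 0,
            "desiredSize": 0,
            "maxSize": current["maxSize"],  # keep the ceiling for the morning
        },
    )
    print(f"scaled {ng} to zero")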

5. Log volume from chatty service-to-service traffic

The monolith logged a request once at the entry point. The 40-service version logs the same request 6–10 times as it traverses services, plus tracing spans, plus health-check noise. CloudWatch ingestion is per-GB. Add unbounded retention and you get a bill that compounds.

Fix today: log-retention rule (30 days for application logs, 7 for traces, 1 for health checks). Sample tracing at 1% in production unless investigating something specific. Move long-term logs to S3 (Glacier-tier) for compliance retention.
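
A sketch of applying those tiers in one pass; the log-group name prefixes below are hypothetical and would need to match your own naming scheme:

import boto3

# Set CloudWatch Logs retention by log-group prefix: 30 days for app logs,
# 7 for traces, 1 for health checks.
logs = boto3.client("logs")
RETENTION = {"/app/": 30, "/traces/": 7, "/health/": 1}  # placeholder prefixes

paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        name = group["logGroupName"]
        for prefix, days in RETENTION.items():
            if name.startswith(prefix):
                logs.put_retention_policy(logGroupName=name, retentionInDays=days)
                print(f"{name}: retention set to {days} days")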

6. EKS / Fargate / RDS overprovisioned because "we might need it"

The original sizing was a guess at growth that didn't materialise. Each service has a node group with min=2 (for HA), and most are running at that minimum in production. Each RDS instance is a db.t3.large because "we'll grow into it". You're paying for headroom you don't use.

Diagnostic: AWS Compute Optimizer (free). Run for two weeks. Trust its recommendations on EC2 / RDS. Save 15–30% in one click.
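
If you'd rather read the findings programmatically than click through the console, a minimal pull of the over-provisioned EC2 list looks roughly like this:

import boto3

# List over-provisioned instances with their top-ranked replacement.
co = boto3.client("compute-optimizer")
resp = co.get_ec2_instance_recommendations()

for rec in resp["instanceRecommendations"]:
    # Normalise the finding string rather than relying on its exact casing.
    if rec["finding"].replace("_", "").lower() == "overprovisioned":
        best = min(rec["recommendationOptions"], key=lambda o: o["rank"])
        instance_id = rec["instanceArn"].split("/")[-1]
        print(f"{instance_id}: {rec['currentInstanceType']} -> {best['instanceType']}")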

7. Two databases doing the same job

I've seen platforms with PostgreSQL and MySQL and MongoDB and Elasticsearch and Redis. Each was added because one team needed one feature. Each costs RDS / managed-cluster money. None of them is being used for what it's uniquely good at — Postgres alone could have served all five workloads at this scale.

Diagnostic: for each non-Postgres datastore, ask: can Postgres do this? tsvector handles Elasticsearch's job at <1M docs. pg_trgm handles fuzzy match. JSONB handles document storage. Recursive CTEs handle the relationship traversal that pulled in Neo4j. LISTEN/NOTIFY handles light pub/sub. Redis-as-cache can be replaced with in-process LRU at this scale.
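
Two of those swaps sketched as SQL run from Python; the table names, columns and DSN are hypothetical:

import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("postgresql://app@localhost/appdb")  # placeholder DSN
cur = conn.cursor()

# Full-text search (the Elasticsearch job, at small corpus sizes):
cur.execute("""
    SELECT id, title
    FROM articles
    WHERE to_tsvector('english', title || ' ' || body)
          @@ plainto_tsquery('english', %s)
    LIMIT 20
""", ("kubernetes cost",))

# Fuzzy matching via pg_trgm (requires CREATE EXTENSION pg_trgm;
# the %% is the trgm similarity operator, doubled for psycopg2):
cur.execute("""
    SELECT name, similarity(name, %s) AS score
    FROM customers
    WHERE name %% %s
    ORDER BY score DESC
    LIMIT 10
""", ("acme corp", "acme corp"))
print(cur.fetchall())

In production you'd keep the tsvector in a generated column behind a GIN index; the point is that the feature is one extension away, not one cluster away.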

The collapse playbook

You don't fix this with a rewrite. You fix it with a de-fragmentation phase whose steps are independently shippable.

Step 1 — Profile the actual usage

For two weeks, instrument. Capture: requests per second per service (peak and steady state); CPU + memory utilisation per pod; cross-service call frequency (which services talk to which, how often); managed-service utilisation (Redis cache hit rate, Elasticsearch QPS, Kafka throughput).

Output: a one-page report ranking services by cost-vs-load. The bottom 50% are candidates for collapse.
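
A sketch of the request-rate piece of that capture, reading per-target-group metrics from CloudWatch; the load balancer and target-group identifiers are placeholders in the form CloudWatch expects:

import boto3
from datetime import datetime, timedelta, timezone

# Peak requests/second per service over the last two weeks, from ALB metrics.
cw = boto3.client("cloudwatch")
LB = "app/platform-alb/0123456789abcdef"  # placeholder
TARGET_GROUPS = {
    "checkout": "targetgroup/checkout/aaaa1111",  # placeholder
    "search": "targetgroup/search/bbbb2222",      # placeholder
}

end = datetime.now(timezone.utc)
start = end - timedelta(days=14)

for service, tg in TARGET_GROUPS.items():
    resp = cw.get_metric_statistics(
        Namespace="AWS/ApplicationELB",
        MetricName="RequestCount",
        Dimensions=[{"Name": "LoadBalancer", "Value": LB},
                    {"Name": "TargetGroup", "Value": tg}],
        StartTime=start, EndTime=end,
        Period=900,  # 15-minute buckets keep 14 days under the datapoint limit
        Statistics=["Sum"],
    )
    peak_rps = max((p["Sum"] / 900 for p in resp["Datapoints"]), default=0)
    print(f"{service}: peak ~{peak_rps:.1f} req/s")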

Step 2 — Identify collapse candidates

A service is a collapse candidate if all of these are true:

- it has no independent scale need (its traffic rises and falls with its nearest neighbour);
- it's owned by the same team as that neighbour, so no autonomy is lost by merging;
- no security boundary between the two has to be enforced at the network level;
- it runs on the same stack, so no load-bearing heterogeneity is given up.

If the answer is yes four times, collapse it back into the nearest neighbour.

Step 3 — Replace managed services where the trade-off has flipped

This is the contentious one. The replacements I've used in production:

Auth0           → JWT + bcrypt + pyotp / passlib
Elasticsearch   → PostgreSQL tsvector + pg_trgm
Redis cluster   → In-process TTLCache (single-process app only)
Kafka           → asyncio.Queue + APScheduler / DB outbox
Neo4j           → PostgreSQL recursive CTEs
Ghost CMS       → S3 + CloudFront with Markdown source

Each swap is a multi-day project, not a multi-month one. Each swap saves 10–25% of the bill independently.
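
For scale, here's what the Redis swap looks like once the app is a single process again; the function and its data source are hypothetical stand-ins:

from cachetools import TTLCache, cached  # pip install cachetools

def load_profile_from_postgres(user_id: int) -> dict:
    return {"id": user_id, "plan": "pro"}  # stand-in for the real query

# Same read-through caching pattern as before, minus the network hop and the
# cluster bill. Only valid for a single process: each replica would otherwise
# hold its own copy of the cache.
profile_cache = TTLCache(maxsize=10_000, ttl=300)  # 5-minute TTL

@cached(profile_cache)
def get_user_profile(user_id: int) -> dict:
    # Previously: Redis GET / SET around this call. Now the decorator
    # memoises the result in memory for the TTL window.
    return load_profile_from_postgres(user_id)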

Step 4 — Right-size what's left

After collapse, run AWS Compute Optimizer. Apply the recommendations. Switch stable workloads to Reserved Instances or Savings Plans. Set auto-scaling maximums (not just minimums) so a runaway loop can't burn cash.
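
A sketch of the maximums half, capping every auto scaling group at a placeholder ceiling:

import boto3

# Cap production auto scaling groups so a runaway retry loop can't scale the
# cluster into a surprise invoice. MAX_NODES is an illustrative ceiling.
autoscaling = boto3.client("autoscaling")
MAX_NODES = 6

paginator = autoscaling.get_paginator("describe_auto_scaling_groups")
for page in paginator.paginate():
    for asg in page["AutoScalingGroups"]:
        if asg["MaxSize"] > MAX_NODES:
            autoscaling.update_auto_scaling_group(
                AutoScalingGroupName=asg["AutoScalingGroupName"],
                MaxSize=MAX_NODES,
            )
            print(f"{asg['AutoScalingGroupName']}: MaxSize capped at {MAX_NODES}")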

Step 5 — Cost as a metric

Add cost-per-user and cost-per-request as monthly engineering metrics. Review them every quarter. The forcing function is what keeps the bill from creeping back up.

The result, from one engagement

A 40-service Java/Go/Node platform serving ~5K real users. Original bill: $3,500–5,500/mo. After six weeks of collapse and replacement: $340–545/mo. Same product, same features, same uptime SLO. The engineering headcount needed to run it dropped from 4–8 to 1–2.

Full case study: Inara — 40 microservices → 1 modular monolith.

When this fix doesn't apply

Microservices are the right answer when you have actual independent scale needs, multiple teams that need autonomy, security boundaries that have to be enforced at the network level, or stack heterogeneity that's load-bearing. If those are true at your shop, don't collapse — invest in the missing observability and accept the bill.

The playbook above is for teams that adopted microservices for organisational fashion rather than load profile. That's most teams I see.


Recognise this in your platform?