Circuit Breaker · Chaitanya.Dev

When to reach for it

You make synchronous calls to a dependency whose slow failure (timeout, 5xx storms, retry amplification) can take your service down with it
A downstream is occasionally overwhelmed and the kindest thing is to stop calling for a minute so it can recover
You have a meaningful fallback — cached data, a degraded response, "service temporarily unavailable" — that's better than blocking threads on a dying dependency
Your service is part of a fan-out where one slow dependency stalls the whole response
You've already been on an incident where a downstream got slow and your thread pool filled with retries

What it actually costs

Tuning is a perpetual chore: failure threshold, window, half-open trial count, cooldown. Too sensitive and it flaps under normal spikes; too lax and it's window dressing. You need a fallback that's actually useful and tested, or the breaker just swaps one error for another. Per-instance state means each pod in a fleet of 50 learns about the failure independently. Centralised state (Redis-backed) introduces a new failure surface.
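
To make those four knobs concrete, here is a minimal sketch of a per-instance breaker in Python. It is an illustration, not a production implementation: the class name, parameter names, and defaults are all assumptions, it counts failures in a sliding time window, and it is not thread-safe. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from collections import deque


class CircuitBreaker:
    """Illustrative single-threaded breaker; all names and defaults are assumptions."""

    def __init__(self, failure_threshold=5, window_seconds=10.0,
                 cooldown_seconds=30.0, half_open_trials=2,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures in window that trip it
        self.window_seconds = window_seconds        # how far back failures count
        self.cooldown_seconds = cooldown_seconds    # how long to stay open
        self.half_open_trials = half_open_trials    # successes needed to close again
        self._clock = clock
        self._failures = deque()                    # timestamps of recent failures
        self._opened_at = None
        self._trial_successes = 0
        self.state = "closed"

    def call(self, fn, fallback):
        now = self._clock()
        if self.state == "open":
            if now - self._opened_at < self.cooldown_seconds:
                return fallback()                   # short-circuit: do not touch the dependency
            self.state = "half_open"                # cooldown elapsed: allow trial calls
            self._trial_successes = 0
        try:
            result = fn()
        except Exception:
            self._on_failure(now)
            return fallback()
        self._on_success()
        return result

    def _on_failure(self, now):
        if self.state == "half_open":
            self._trip(now)                         # one failed trial reopens the circuit
            return
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.failure_threshold:
            self._trip(now)

    def _on_success(self):
        if self.state == "half_open":
            self._trial_successes += 1
            if self._trial_successes >= self.half_open_trials:
                self.state = "closed"
                self._failures.clear()

    def _trip(self, now):
        self.state = "open"
        self._opened_at = now
        self._failures.clear()
```

Every instance of this object keeps its own failure history, which is exactly the per-pod learning cost described above. Moving `_failures` and `state` into a shared store would remove that cost and add the new failure surface.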

The failure mode nobody mentions

False sense of safety. The breaker is configured, the dashboard has a 'circuit state' panel, the team feels good — and then the downstream returns 200 OK with garbage JSON, or 200 OK after 28 seconds, and the breaker never trips because nothing 'failed'. Breakers protect against the failures you defined them around. Make sure latency-as-failure and contract-violation-as-failure both trip it, not just exceptions.
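
One way to get there, sketched in Python under stated assumptions: wrap the call so that a slow response or an invalid payload raises an exception, which any breaker that counts exceptions will then record. The names `checked`, `TooSlow`, `ContractViolation`, `is_valid`, and `max_seconds` are hypothetical. Note that this measures latency after the call returns; it does not cancel a slow call, so a real client-side timeout is still needed.

```python
import time


class TooSlow(Exception):
    """The call succeeded but blew its latency budget."""


class ContractViolation(Exception):
    """The call returned, but the payload failed validation."""


def checked(fn, is_valid, max_seconds, clock=time.monotonic):
    """Wrap fn so slow or malformed responses surface as exceptions."""
    def wrapper():
        start = clock()
        result = fn()
        elapsed = clock() - start
        if elapsed > max_seconds:
            # A 200 OK after 28 seconds is a failure as far as the breaker is concerned.
            raise TooSlow(f"call took {elapsed:.1f}s, budget {max_seconds:.1f}s")
        if not is_valid(result):
            # A 200 OK with garbage JSON is also a failure.
            raise ContractViolation("response failed validation")
        return result
    return wrapper
```

The wrapped function is what you hand to the breaker instead of the raw call, with `is_valid` checking whatever the contract really requires (required fields, types, a schema).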

When not to use it

An async call you can retry off the critical path, or a single-tenant tool with no failure cascades — the breaker adds machinery for a problem that doesn't exist there.
