Change failure rate, explained
Change failure rate (CFR) is one of the four DORA metrics. It measures the percentage of deployments that result in a failure requiring remediation: a rollback, hotfix, or patch. A lower CFR means deployments are safer; a higher CFR means the team is shipping unstable changes or has insufficient pre-deploy validation.
How to calculate CFR
The formula is straightforward:
CFR (%) = (deployments causing a production failure / total deployments) × 100
Measure over a rolling 30-day window to smooth noise from one-off incidents or unusually high-volume deploy periods. A single bad deploy week should not define your team's performance, but a sustained pattern above the DORA thresholds is a signal worth acting on.
Count a deployment as a failure if it triggers any remediation action: a full rollback, a hotfix pushed the same day, or an emergency patch within a defined window (typically 24-48 hours). Teams that rely on informal fixes without tracking them tend to undercount CFR significantly.
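The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Deployment` record, its field names, and the 48-hour default remediation window are assumptions chosen to mirror the description, and real data would come from your deploy and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt to whatever your deploy log stores.
    deployed_at: datetime
    # Time of the first rollback, hotfix, or patch tied to this deploy, if any.
    remediated_at: Optional[datetime] = None


def change_failure_rate(
    deployments: list[Deployment],
    now: datetime,
    window: timedelta = timedelta(days=30),
    remediation_window: timedelta = timedelta(hours=48),
) -> float:
    """Return CFR as a percentage over the rolling window ending at `now`."""
    recent = [d for d in deployments if now - window <= d.deployed_at <= now]
    if not recent:
        return 0.0  # No deployments in the window, so nothing could fail.
    failed = sum(
        1
        for d in recent
        if d.remediated_at is not None
        and d.remediated_at - d.deployed_at <= remediation_window
    )
    return 100.0 * failed / len(recent)


# Example: four deploys fall inside the 30-day window, one of them failed.
now = datetime(2024, 6, 30)
history = [
    Deployment(datetime(2024, 6, 1)),
    Deployment(datetime(2024, 6, 10), remediated_at=datetime(2024, 6, 10, 3)),
    Deployment(datetime(2024, 6, 20)),
    # Patched four days later: outside the 48-hour remediation window.
    Deployment(datetime(2024, 6, 25), remediated_at=datetime(2024, 6, 29)),
    # Failed, but deployed before the 30-day window began.
    Deployment(datetime(2024, 4, 1), remediated_at=datetime(2024, 4, 1, 1)),
]
print(change_failure_rate(history, now))  # 25.0
```

Note that the remediation window cuts both ways: a fix that lands after the cutoff is not attributed to the deploy, so the window you choose directly shapes the number you report.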
DORA benchmarks
DORA groups teams into performance tiers. The exact CFR thresholds have shifted between its annual reports; a commonly cited breakdown is:
| Tier | CFR Range |
|---|---|
| Elite | 0-15% |
| High | 16-30% |
| Medium | 31-45% |
| Low | Above 45% |
Most engineering teams aspire to the elite tier, but the practical target depends on deployment frequency, service criticality, and rollback cost. A team deploying ten times per day on a payment service has a very different risk calculus than one deploying weekly on an internal tool.
The DORA thresholds are a reference point, not a guarantee. CFR is a ratio, so a team in the high tier can still see a large absolute number of incidents if it deploys often: at a 20% CFR and ten deploys per day, that is about two failed deployments every day.
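To make the interaction between tier and deployment frequency concrete, here is a small sketch. The function names are hypothetical, and treating each upper bound as inclusive for fractional values (so 15.5% lands in the high tier) is an assumption, since the table only lists whole percentages.

```python
def cfr_tier(cfr_percent: float) -> str:
    """Map a CFR percentage onto the tiers in the table above."""
    if not 0 <= cfr_percent <= 100:
        raise ValueError("CFR must be between 0 and 100")
    if cfr_percent <= 15:
        return "Elite"
    if cfr_percent <= 30:
        return "High"
    if cfr_percent <= 45:
        return "Medium"
    return "Low"


def expected_failed_deploys(cfr_percent: float, deploy_count: int) -> float:
    """Estimate how many of `deploy_count` deployments fail at a given CFR."""
    return cfr_percent * deploy_count / 100


# A high-tier team shipping 70 times a week vs. a medium-tier team shipping once.
print(cfr_tier(20), expected_failed_deploys(20, 70))  # High 14.0
print(cfr_tier(40), expected_failed_deploys(40, 1))   # Medium 0.4
```

The better-ranked team in this example handles far more failed deployments per week, which is why the tier alone is a poor proxy for incident load.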
What drives a high CFR
Several root causes account for most CFR increases:
Insufficient test coverage. Unit and integration tests that do not reflect production load patterns or edge-case dependency behavior leave gaps that only surface under real traffic.
No blast-radius analysis before full rollout. Shipping a change to 100% of traffic without understanding which downstream services, queues, or datastores it touches means a single bad deploy can cascade before anyone has time to intervene.
No canary deployment strategy. Canary and progressive delivery patterns limit exposure by routing a small percentage of traffic to the new version first. Without them, every deploy is an all-or-nothing bet.
Dependency changes not surfaced before deploy. A library version bump, a transitive dependency update, or an API contract change in a shared service can cause failures that look unrelated to the deploy in question. If dependency drift is not tracked continuously, these surprises compound CFR.
AI-generated code not validated against the production dependency graph. As teams adopt AI coding assistants, the volume of code changes increases faster than manual review capacity. Changes generated without awareness of how production services are wired together introduce a new category of hard-to-predict failures.
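Of the causes above, the missing canary strategy is the easiest to illustrate in code. The sketch below shows one common way to split traffic: hash a stable request key into a bucket and send the lowest buckets to the new version. The function name and the choice of key are assumptions for illustration; in practice this logic usually lives in a load balancer, service mesh, or feature-flag system rather than application code.

```python
import hashlib


def routes_to_canary(request_key: str, canary_percent: float) -> bool:
    """Deterministically send roughly `canary_percent` of keys to the canary.

    `request_key` is any stable identifier (for example a user or session ID),
    so the same caller keeps hitting the same version during the rollout.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Map the hash onto 10,000 buckets to allow 0.01% granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < canary_percent * 100


# Roughly 5% of keys should land on the canary.
keys = [f"user-{i}" for i in range(10_000)]
share = sum(routes_to_canary(k, 5) for k in keys) / len(keys)
print(f"{share:.1%} of keys routed to the canary")
```

Hashing rather than sampling at random keeps routing sticky per key, which makes it easier to attribute errors to the new version while it is still exposed to only a small slice of traffic.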
Reducing CFR with pre-deploy analysis
The most direct lever for lowering CFR is catching risky changes before they reach production. Three techniques have the highest leverage:
Pre-deploy blast radius analysis maps which services, jobs, and data flows a proposed change touches. A change to a shared authentication library might affect twenty downstream services. Knowing that before deploy determines whether a canary rollout or a coordinated deployment window is warranted.
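One way to picture blast radius analysis is as a graph traversal. The sketch below assumes you already have a map from each component to the services that depend on it directly; the service names and the `blast_radius` helper are hypothetical, and building an accurate map is the hard part in practice.

```python
from collections import deque


def blast_radius(dependents: dict[str, set[str]], changed: str) -> set[str]:
    """Return everything that depends on `changed`, directly or transitively.

    `dependents` maps a component to the set of components that call or
    import it. Breadth-first search; cycles are handled via the visited set.
    """
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service != changed and service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


# Hypothetical dependency map for illustration.
graph = {
    "auth-lib": {"checkout", "profile"},
    "checkout": {"orders"},
    "orders": {"notifications"},
    "search": set(),
}
print(sorted(blast_radius(graph, "auth-lib")))
# ['checkout', 'notifications', 'orders', 'profile']
```

The size of the returned set is a reasonable first input to the rollout decision: a change touching one leaf service can ship normally, while one that reaches most of the graph is a candidate for a canary or a coordinated window.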
Schema migration checks catch database changes that are backward-incompatible with the current deployed version, preventing the class of failures where a new schema breaks the old application code during a rolling deploy.
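A simplified version of such a check compares the current schema with the proposed one and flags changes the old application code could not tolerate. The schema representation and the three rules below are assumptions for illustration; a real check would read the schemas from migration files or the database catalog and cover more cases, such as renames, constraints, and indexes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    type: str
    nullable: bool = True
    has_default: bool = False


# Hypothetical schema shape: table name -> column name -> Column.
Schema = dict[str, dict[str, Column]]


def breaking_changes(current: Schema, proposed: Schema) -> list[str]:
    """List changes that would break code still written against `current`."""
    problems: list[str] = []
    for table, columns in current.items():
        if table not in proposed:
            problems.append(f"table dropped: {table}")
            continue
        new_columns = proposed[table]
        for name, col in columns.items():
            if name not in new_columns:
                problems.append(f"column dropped: {table}.{name}")
            elif new_columns[name].type != col.type:
                problems.append(
                    f"type changed: {table}.{name} "
                    f"{col.type} -> {new_columns[name].type}"
                )
        for name, col in new_columns.items():
            # Old code does not know this column, so its INSERTs would fail.
            if name not in columns and not col.nullable and not col.has_default:
                problems.append(
                    f"required column added without default: {table}.{name}"
                )
    return problems


current = {"users": {"id": Column("bigint", nullable=False),
                     "email": Column("text"),
                     "nickname": Column("text")}}
proposed = {"users": {"id": Column("bigint", nullable=False),
                      "email": Column("varchar"),
                      "tenant_id": Column("bigint", nullable=False)}}
for problem in breaking_changes(current, proposed):
    print(problem)
```

An empty result does not prove a migration is safe; it only means none of these specific patterns were found.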
Dependency drift detection compares the dependency versions in a candidate deploy against what is running in production, flagging divergence that might indicate a breaking change introduced indirectly.
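In its simplest form, drift detection is a diff between two name-to-version maps. The sketch below assumes both maps have already been extracted (for example from lockfiles or a running service's manifest) and that versions follow a `major.minor.patch` pattern; the package names, versions, and helper functions are made up for illustration.

```python
from typing import Optional

Drift = dict[str, tuple[Optional[str], Optional[str]]]


def dependency_drift(production: dict[str, str], candidate: dict[str, str]) -> Drift:
    """Return {name: (production_version, candidate_version)} for every mismatch.

    A version of None means the dependency is absent on that side.
    """
    drift: Drift = {}
    for name in sorted(production.keys() | candidate.keys()):
        before, after = production.get(name), candidate.get(name)
        if before != after:
            drift[name] = (before, after)
    return drift


def major_bumps(drift: Drift) -> list[str]:
    """Names whose leading version component changed (assumes 'X.Y.Z' strings)."""
    return [
        name
        for name, (before, after) in drift.items()
        if before is not None
        and after is not None
        and before.split(".")[0] != after.split(".")[0]
    ]


production = {"http-client": "2.4.1", "json-parser": "1.9.0", "metrics": "0.7.2"}
candidate = {"http-client": "3.0.0", "json-parser": "1.9.0", "tracing": "1.2.0"}

drift = dependency_drift(production, candidate)
print(drift)
# {'http-client': ('2.4.1', '3.0.0'), 'metrics': ('0.7.2', None), 'tracing': (None, '1.2.0')}
print(major_bumps(drift))  # ['http-client']
```

Major-version changes are singled out here because they are the most likely to carry breaking changes under semantic versioning, but removed and newly added packages deserve a look as well.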
Each of these reduces the percentage of changes that fail in production by making the risk visible before the change ships. For teams measuring causal change analysis accuracy, see the AI SRE Benchmark, which covers how NOFire AI approaches root-cause attribution in production environments.
CFR vs MTTR
CFR and mean time to recovery (MTTR) measure different dimensions of deployment health. CFR measures how often you fail. MTTR measures how fast you recover when you do.
Both matter. A team with low CFR and high MTTR is shipping stable changes but struggling to respond when incidents do occur. A team with high CFR and low MTTR is failing often but recovering quickly. The DORA model treats both as independent performance indicators, and elite teams optimize for both.
The relationship between them matters for prioritization. If CFR is high and MTTR is also high, start with CFR: fewer failures means fewer recoveries to manage. If CFR is within the elite range but MTTR is high, the investment should go into detection and response tooling rather than pre-deploy validation.
For a deeper look at how causal analysis accuracy affects both metrics, see the AI SRE Benchmark.
Frequently asked questions
- What is an acceptable change failure rate?
  - The DORA elite tier is 0-15%. If your CFR is above 30%, pre-deploy validation and deployment strategy are the first places to look.
- Does CFR apply to configuration changes?
  - Yes. Any change that can cause a production failure counts: code deploys, configuration updates, infrastructure changes, and dependency version bumps.
- How is CFR related to deployment frequency?
  - DORA research shows that elite teams combine high deployment frequency with low CFR. Frequent small deploys are easier to validate and roll back than large, infrequent ones.