
SLA Uptime & Allowed Downtime Calculator (Error Budget)

Convert downtime into uptime percentages, calculate allowed downtime for various SLA targets, and compute error budgets. Understand your service availability and SLA compliance at a glance.

Reviewed by Waqar Kaleem Khan, Founder & Lead AI Engineer

Your vendor contract says 99.9% SLA uptime. Sounds nearly perfect. Then you convert it: 43 minutes of allowed downtime per month. A single bad deploy on a Friday afternoon can blow the budget before anyone pages on-call. Most teams sign SLAs without doing the per-month math, then act surprised when a 50-minute outage triggers service credits.

Enter your actual downtime and measurement period. The output shows uptime percentage, comparison against common SLA tiers, and how much error budget remains.

The Gap Between Uptime Percentage and Real Availability

99.9% and 99.99% look almost identical on paper. In practice it is 43 minutes of allowed downtime per month versus about 4 minutes. Four minutes barely covers detection and escalation, let alone a rollback. Any team promising four nines needs automated failover, not a human paging tree.

The percentage also hides what went down. A login endpoint returning 503 for two minutes hits every user. An internal batch job failing for an hour hits nobody externally. The Google SRE Book frames this as “not all minutes are equal,” which is worth reading before you set a target.

Error Budgets: Spending Downtime Like Currency

An error budget flips the conversation. Instead of “avoid all downtime” you ask “how much can we spend this month and still hit the target?” A 99.95% SLA over 30 days gives about 21 minutes. Every deploy, config push, and maintenance window draws from that pool.
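A minimal sketch of the budget-as-currency idea, assuming a 30-day window and that every draw counts in full against the budget. The draw names and durations are hypothetical, chosen only to illustrate the bookkeeping.

```python
# Error-budget ledger sketch. Assumptions: 30-day window, and every draw
# (deploy, config push, maintenance) counts in full against the budget.
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def error_budget_minutes(sla_percent, period_minutes=MINUTES_PER_30_DAYS):
    """Allowed downtime in minutes for an SLA target over the period."""
    return period_minutes * (1 - sla_percent / 100)


def remaining_budget(sla_percent, draws):
    """Budget left after subtracting each named draw (minutes). Negative = breached."""
    return error_budget_minutes(sla_percent) - sum(draws.values())


# Hypothetical draws for one month
draws = {
    "bad deploy rollback": 6.0,
    "config push": 2.5,
    "maintenance window": 8.0,
}
budget = error_budget_minutes(99.95)
left = remaining_budget(99.95, draws)
print(f"budget {budget:.1f} min, spent {sum(draws.values()):.1f}, left {left:.1f}")
```

With these numbers the 99.95% budget of about 21.6 minutes is down to roughly 5 minutes, which is the point where the "budget low" behavior below would kick in.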

Budget low? Fewer risky deploys, more bake time on canary releases. Budget flush? Ship faster; you can absorb a brief regression. That loop is the core of Google-style SRE practice, and it only works if you actually track the number.

Per-Month Windows vs Annual Averages: Why the Math Diverges

A 99.9% annual SLA allows roughly 8.7 hours across 365 days. You could burn all of it in January and still pass, but January users wouldn't care about your annual average. Most SaaS contracts measure monthly because billing and credits land monthly. A 99.9% monthly target allows only about 43 minutes, with no carryover.
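A small sketch of the divergence, assuming a 365-day year, a 31-day January, and an otherwise clean year with 8 hours of downtime landing entirely in January. Note that a 31-day month allows about 44.6 minutes rather than the 43 minutes quoted for a 30-day month.

```python
# Same 99.9% target judged over a year vs. over one month.
# Assumptions: 365-day year, 31-day January, no carryover between months.
MIN_PER_DAY = 24 * 60


def allowed_downtime(sla_percent, days):
    return days * MIN_PER_DAY * (1 - sla_percent / 100)


def uptime_percent(downtime_min, days):
    total = days * MIN_PER_DAY
    return (total - downtime_min) / total * 100


january_downtime = 8 * 60  # 8 hours, all in January; rest of the year clean

annual_budget = allowed_downtime(99.9, 365)  # ~525.6 min (~8.76 h)
monthly_budget = allowed_downtime(99.9, 31)  # ~44.6 min for a 31-day month

annual_uptime = uptime_percent(january_downtime, 365)
january_uptime = uptime_percent(january_downtime, 31)

print(f"annual:  {annual_uptime:.3f}% (budget {annual_budget:.1f} min)")
print(f"january: {january_uptime:.3f}% (budget {monthly_budget:.1f} min)")
```

The annual figure still clears 99.9% while January alone drops below 99%, which is exactly why monthly measurement matters to the customer.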

Some providers measure a rolling 720-hour window instead of the calendar month, so a late-month outage can count against two billing cycles. Before negotiating, confirm whether the window is calendar, rolling, or billing-cycle aligned.
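A sketch of how one incident can count twice under a rolling 720-hour (30-day) window. It assumes the provider evaluates the window at each billing-cycle end, here the 1st of the month at midnight; the incident time is made up for illustration. February 2023 has only 28 days, so the window ending March 1 still reaches back to January 30.

```python
from datetime import datetime, timedelta

# Rolling-window sketch. Assumption: the window is evaluated at each
# billing-cycle end (the 1st of the month, 00:00, naive datetimes).
WINDOW = timedelta(hours=720)


def downtime_in_window(incidents, window_end, window=WINDOW):
    """Minutes of downtime overlapping [window_end - window, window_end)."""
    window_start = window_end - window
    total = timedelta()
    for start, end in incidents:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > timedelta():
            total += overlap
    return total.total_seconds() / 60


# Hypothetical 30-minute outage on January 30, 2023
incidents = [(datetime(2023, 1, 30, 12, 0), datetime(2023, 1, 30, 12, 30))]

for cycle_end in (datetime(2023, 2, 1), datetime(2023, 3, 1)):
    print(cycle_end.date(), downtime_in_window(incidents, cycle_end))
```

Both evaluations report 30 minutes, while a calendar-month window would charge the outage only to January.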

At-a-Glance Output: What Each Nines Tier Actually Buys You

Allowed downtime by SLA tier per 30-day month
| SLA Tier | Downtime / Month | Realistic For |
|---|---|---|
| 99% (two nines) | ~7.2 hours | Internal tools, staging |
| 99.9% (three nines) | ~43 minutes | Most SaaS products |
| 99.99% (four nines) | ~4.3 minutes | Payment, auth services |
| 99.999% (five nines) | ~26 seconds | Telecom, 911 systems |

If your result lands between tiers, you're either over-promising (four nines without the redundancy) or under-selling (running three nines while advertising two).
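A sketch that regenerates the tier table for a 30-day month and reports the highest tier a measured uptime still meets. The tier list mirrors the table above; the function names are illustrative.

```python
# Tier-table sketch for a 30-day month (43,200 minutes).
MONTH_MINUTES = 30 * 24 * 60
TIERS = [99.0, 99.9, 99.99, 99.999]


def allowed_minutes(sla_percent, period_minutes=MONTH_MINUTES):
    """Allowed downtime in minutes for a tier over the period."""
    return period_minutes * (1 - sla_percent / 100)


def highest_tier_met(uptime_percent):
    """Highest tier whose target the measured uptime meets, or None."""
    met = [t for t in TIERS if uptime_percent >= t]
    return max(met) if met else None


for tier in TIERS:
    mins = allowed_minutes(tier)
    print(f"{tier}%: {mins:.1f} min ({mins * 60:.0f} s)")

print(highest_tier_met(99.95))  # 99.9
```

A result of 99.95% meets three nines but not four, which is the "between tiers" case described above.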

Troubleshooting Notes for SLA Negotiations

  • Planned maintenance exclusions. Confirm maximum hours per month and whether the provider can declare maintenance retroactively.
  • Partial degradation. A latency spike from 200ms to 3 seconds isn't an “outage” by most definitions, but it is one for your users. Push for latency thresholds alongside up/down metrics.
  • Compound SLA. Service A at 99.9% depending on Service B at 99.9% yields about 99.8% combined (0.999 × 0.999 ≈ 0.998), not 99.9%. Every serial dependency multiplies in another factor below 100%.
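The compound-SLA arithmetic can be sketched as a product of availabilities, assuming failures are independent and every dependency is required (no fallback or caching that masks a dependency outage).

```python
from math import prod


# Serial-dependency sketch. Assumptions: independent failures, and the
# request fails if any dependency in the chain is down.
def serial_availability(*sla_percents):
    """Combined availability (%) of services that must all be up."""
    return prod(p / 100 for p in sla_percents) * 100


print(f"{serial_availability(99.9, 99.9):.4f}%")  # 99.8001%
```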

Mistakes that blow SLA reviews: comparing monthly targets against annual totals, forgetting that rolling windows can double-count one incident, and assuming the provider’s monitoring agrees with yours on what counts as “down.”

Related on EverydayBudd's developer utilities hub: the API Rate Limit Planner for the production-reliability decisions that interact with availability targets, and the File Transfer Time Calculator for the network-side reliability math.

Uptime percentages and error budgets from this tool are planning estimates. They don't replace contractual SLA definitions, provider-side monitoring data, or legal review of service-credit terms.

Frequently Asked Questions

How is uptime percentage actually calculated when downtime spans multiple incidents?

Uptime % = (Total Period - Sum of Downtime) / Total Period × 100. Multiple incidents add together. If you had a 12-minute outage on the 3rd and a 31-minute outage on the 17th, the calculation treats them as 43 minutes of total downtime against a 30-day (43,200-minute) window: (43,200 - 43) / 43,200 × 100 = 99.9005%. SLA contracts almost always sum incidents this way. The question of whether each incident has to exceed a minimum duration to count is contract-specific.
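A sketch of the same formula with incidents summed, using the two outages from the example. The `min_incident_minutes` parameter is hypothetical, standing in for contracts that ignore incidents below some duration; 0 counts everything.

```python
# Uptime sketch: incidents are summed against the period.
# min_incident_minutes is a hypothetical contract threshold (0 = count all).
def uptime_percent(incident_minutes, period_minutes=30 * 24 * 60, min_incident_minutes=0):
    counted = sum(m for m in incident_minutes if m >= min_incident_minutes)
    return (period_minutes - counted) / period_minutes * 100


incidents = [12, 31]  # outage on the 3rd and on the 17th
print(f"{uptime_percent(incidents):.4f}%")                           # 99.9005%
print(f"{uptime_percent(incidents, min_incident_minutes=15):.4f}%")  # 99.9282%
```

With a 15-minute minimum, the 12-minute outage drops out and only the 31-minute one counts.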

What does each tier of nines actually buy you in monthly downtime?

Two nines (99%) gives you 7.2 hours per month. Three nines (99.9%) gives 43 minutes per month. Four nines (99.99%) gives 4.3 minutes per month. Five nines (99.999%) gives 26 seconds per month. Each nine cuts allowed downtime by 10x. The engineering cost of each step up usually grows by more than 10x. Going from three to four nines typically requires multi-AZ redundancy, automated failover, and chaos engineering practice. Going from four to five usually requires multi-region active-active and is rarely worth the operational complexity outside of payments and ad-tech.

What's an error budget and how does it change the on-call conversation?

An error budget reframes availability from "avoid all downtime" to "you have X minutes of downtime to spend this month, how do you want to spend them?" If your SLA is 99.9% over 30 days, the budget is 43 minutes. Spend 20 minutes on a planned migration and you have 23 left. The conversation shifts from "we had an outage" to "we burned 12 minutes of budget on the bad deploy." That framing makes deploy velocity a deliberate tradeoff rather than a guilty secret. The Google SRE book chapters "Embracing Risk" (which introduces error budgets) and "Service Level Objectives" are the canonical references.

Does planned maintenance count against my SLA?

Contract-dependent. AWS and most major cloud providers exclude scheduled maintenance from SLA calculations if proper advance notice is given, typically 24 to 72 hours. Smaller vendors and internal SLAs sometimes include all downtime regardless of cause, which makes the metric harder to hit but more reflective of customer experience. Read the contract for "exclusions," "force majeure," and "scheduled maintenance" definitions. If you're writing the SLA, decide upfront. "Downtime is downtime to the user" is the more honest framing, but it changes how aggressive your maintenance schedule can be.
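A sketch comparing uptime with and without a scheduled-maintenance exclusion. It assumes excluded downtime simply does not count as downtime while the measurement period stays the same; real contracts define exclusions in their own ways. The event durations are hypothetical.

```python
# Maintenance-exclusion sketch. Assumption: excluded minutes are not counted
# as downtime; the 30-day measurement period is unchanged.
PERIOD_MINUTES = 30 * 24 * 60


def uptime_percent(events, exclude_scheduled):
    """events: list of (minutes, is_scheduled_maintenance) tuples."""
    counted = sum(
        minutes for minutes, scheduled in events
        if not (exclude_scheduled and scheduled)
    )
    return (PERIOD_MINUTES - counted) / PERIOD_MINUTES * 100


events = [
    (25, False),  # unplanned outage
    (60, True),   # announced maintenance window
]
print(f"excluded: {uptime_percent(events, True):.3f}%")
print(f"included: {uptime_percent(events, False):.3f}%")
```

The same month passes a 99.9% target when maintenance is excluded and fails it when "downtime is downtime to the user."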

How do service credits typically work, and are they actually worth chasing?

Most providers tier credits by how badly they missed the target. A common pattern: 10% credit if uptime fell below 99.99% but stayed above 99.0%, 30% if it fell below 99.0%, 100% if it fell below 95%. Numbers vary by vendor. Credits are usually capped at one month's bill and require you to file a claim within a window (often 30 days). Whether they're worth chasing depends on the dollar value. $50 of AWS credit isn't worth a half-hour of paperwork. $50,000 of enterprise platform credit obviously is. Track your provider's uptime independently. Don't trust their dashboard.
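A sketch of a tiered credit schedule using the example thresholds from the paragraph above (10% / 30% / 100%). The `cap_fraction` parameter is a hypothetical way to express the one-month-bill cap; actual vendor schedules and caps differ.

```python
# Tiered service-credit sketch using the example thresholds above.
# Tiers are checked from the worst outcome upward.
CREDIT_TIERS = [  # (uptime below this %, credit as a fraction of the monthly bill)
    (95.0, 1.00),
    (99.0, 0.30),
    (99.99, 0.10),
]


def service_credit(uptime_percent, monthly_bill, cap_fraction=1.0):
    """Credit owed for the month; cap_fraction=1.0 caps it at one month's bill."""
    for threshold, credit in CREDIT_TIERS:
        if uptime_percent < threshold:
            return min(credit, cap_fraction) * monthly_bill
    return 0.0


print(service_credit(99.95, 500.0))  # 50.0
print(service_credit(98.7, 500.0))   # 150.0
```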

Why does the measurement period change my allowed downtime so dramatically?

Because the same percentage is applied to a shorter or longer window, and the absolute allowance scales with that window more than intuition suggests. 99.9% over a month allows 43 minutes. 99.9% over a year allows 8.76 hours. You could absorb 8 hours of downtime in January and still be at 99.9% for the year, provided the other eleven months were clean. Monthly windows punish a single bad incident; annual windows only look at the cumulative total. Most contracts use monthly windows because they reset the budget faster, which favors the customer: a single bad month triggers credits instead of being averaged away across the year.

My error budget went negative. What does that practically mean?

You've burned more downtime than the SLA allowed and the customer relationship is now in remediation mode. Practically: you're probably owed service credits per the contract. Operationally: the next on-call shift should be conservative, deploys should slow down, and the postmortem culture should kick in. Strategically: a single negative-budget month is recoverable. Three in a quarter usually triggers a contract renegotiation. The SRE practice is to treat budget burn as a leading indicator. When you've burned 70% of the monthly budget by week three, freeze risky deploys.

Is this calculator usable for SLO planning, or is it just for SLAs?

Both. The math is identical. The difference is contractual. SLAs are external commitments to customers with credits attached. SLOs are internal targets that drive engineering decisions. SLOs should typically be set tighter than the SLA. If you commit 99.9% to customers, you might target 99.95% internally so a bad month doesn't immediately breach the contract. Use the calculator to model both and see how much headroom your SLO buys you against your SLA.
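A sketch of the SLO-versus-SLA headroom calculation over a 30-day window, using the 99.9% commitment and 99.95% internal target from the example.

```python
# SLO headroom sketch over a 30-day window.
PERIOD_MINUTES = 30 * 24 * 60


def allowed_minutes(target_percent):
    """Allowed downtime in minutes for a target over the period."""
    return PERIOD_MINUTES * (1 - target_percent / 100)


sla, slo = 99.9, 99.95
headroom = allowed_minutes(sla) - allowed_minutes(slo)
print(f"SLA budget {allowed_minutes(sla):.1f} min, "
      f"SLO budget {allowed_minutes(slo):.1f} min, "
      f"headroom {headroom:.1f} min")
```

Here the tighter SLO leaves about 21.6 minutes of slack before the contractual budget is touched.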

What's the difference between uptime and availability?

They're often used interchangeably, but they're not the same. Uptime is binary: the service is either up or down, and partial degradation is ignored. Availability is broader and can include latency thresholds (the service is "available" only if 99% of requests complete in under 500ms), feature completeness (the search endpoint is up but image upload is broken), and geographic considerations (the US region is healthy but EU is degraded). For consumer SaaS, uptime is usually fine. For B2B platforms with latency-sensitive integrations, availability with quality-of-service metrics tells the truer story.
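A sketch contrasting a binary uptime view with a latency-aware, request-based availability measure. The 500 ms threshold and the rule that only 5xx responses are failures are illustrative assumptions, and the sample requests are made up.

```python
from typing import NamedTuple


# Request-based availability sketch. Assumptions: 5xx = failure, and a
# "good" request must also finish within the latency threshold.
class Request(NamedTuple):
    status: int
    latency_ms: float


def availability(requests, latency_threshold_ms=500.0):
    """Share of requests that succeeded AND were fast enough (%)."""
    if not requests:
        return 100.0
    good = sum(1 for r in requests
               if r.status < 500 and r.latency_ms <= latency_threshold_ms)
    return good / len(requests) * 100


def uptime_style(requests):
    """Binary view: only hard failures (5xx) count against the service."""
    if not requests:
        return 100.0
    ok = sum(1 for r in requests if r.status < 500)
    return ok / len(requests) * 100


sample = [Request(200, 120), Request(200, 3000), Request(503, 5), Request(200, 180)]
print(uptime_style(sample), availability(sample))  # 75.0 50.0
```

The 3-second response counts as "up" in the binary view but as a failure once latency is part of the definition.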

How do I calculate allowed downtime for an unusual SLA target like 99.95%?

Allowed Downtime = Total Period × (1 - SLA Target / 100). For 99.95% over a 30-day month (43,200 minutes): 43,200 × 0.0005 = 21.6 minutes. The tool calculates this for any target. The trickier question is what unusual targets are appropriate. 99.95% (about 21 minutes/month) is a common compromise between three-nines (43 minutes) and four-nines (4 minutes) when the service is too important for the looser tier but multi-region active-active isn't justified.
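A sketch applying the same formula for 99.95% across several windows. The 30-day month, 91-day quarter, and 365-day year are simplifying assumptions; contracts may define periods differently.

```python
# Allowed Downtime = Total Period × (1 - SLA Target / 100)
# Assumed window lengths: 30-day month, 91-day quarter, 365-day year.
PERIODS_MINUTES = {
    "day": 24 * 60,
    "week": 7 * 24 * 60,
    "30-day month": 30 * 24 * 60,
    "91-day quarter": 91 * 24 * 60,
    "365-day year": 365 * 24 * 60,
}


def allowed_downtime_minutes(sla_target_percent, period_minutes):
    return period_minutes * (1 - sla_target_percent / 100)


for name, minutes in PERIODS_MINUTES.items():
    print(f"99.95% per {name}: {allowed_downtime_minutes(99.95, minutes):.1f} min")
```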

How do I track error budget burn across a long measurement period?

Sample downtime daily, sum it weekly, project against the period budget. If you're at week 2 of a 4-week measurement period and you've burned 60% of budget, project linearly: 60% × (4 / 2) = 120%. You're on pace to breach. The leading-indicator framing matters more than the trailing math. Budget that's burning faster than expected is a deploy-freeze trigger, not just a metric to track. Tools like Honeycomb, Datadog, and Grafana have built-in SLO panels that automate this.
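A sketch of the linear burn projection from the example, where a projection above 1.0 (100%) means the current pace breaches the budget. Linear extrapolation is a simplifying assumption; real burn is often bursty.

```python
# Linear error-budget burn projection sketch.
def projected_burn(burned_fraction, elapsed_weeks, total_weeks):
    """Fraction of the period budget burned by period end at the current pace."""
    if elapsed_weeks <= 0:
        raise ValueError("elapsed_weeks must be positive")
    return burned_fraction * (total_weeks / elapsed_weeks)


def on_pace_to_breach(burned_fraction, elapsed_weeks, total_weeks):
    return projected_burn(burned_fraction, elapsed_weeks, total_weeks) > 1.0


print(f"{projected_burn(0.60, 2, 4):.0%}")  # 120%
print(on_pace_to_breach(0.60, 2, 4))        # True
```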

Is 99.99% uptime realistic for a single-region service without redundancy?

No. Four nines allows about 52 minutes of downtime per year. A single OS patch reboot eats most of that. Reaching four nines typically requires multi-AZ or multi-region active load balancing, automated failover, and routine chaos-engineering practice. Single-region production services without HA generally land between 99.5% and 99.9% in practice. If your contract demands 99.99% on a single VM, the contract is wrong, not the engineering. Push back during contract negotiation, or quietly architect for what the contract actually demands.

