after we build · part seven · the sla
The SLA — the operational contract
An SLO is an internal promise the team makes to itself. An SLA is an external promise the team makes to the client — contractual, measurable, with consequences for missing it.
The SLA is where every practice defined in the previous four volumes converges into a single enforceable commitment: this is what we promise the system will do, and this is what happens if it doesn't.
How SLAs derive from SLOs
The SLO from Volume IV says: 99% of submissions in under 2 seconds. The SLA promises the client 95%. The gap of 4 percentage points is the team's operational safety buffer.
- If the SLO is breached, the team has time to fix the problem before the SLA is breached.
- If the SLA is breached, the client relationship absorbs the cost.
A team with no margin between SLO and SLA is a team where every operational hiccup becomes a contractual breach.
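The relationship between the two promises can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the names LatencyTarget and classify are hypothetical, and it assumes the SLA uses the same 2-second threshold as the SLO, which the text implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyTarget:
    """Share of requests that must finish in under `threshold_s` seconds."""
    share: float        # 0.99 means 99% of requests
    threshold_s: float  # latency limit in seconds


def classify(observed_share: float, slo: LatencyTarget, sla: LatencyTarget) -> str:
    """Return which promise, if any, the observed compliance share breaks."""
    if observed_share < sla.share:
        return "sla_breach"   # contractual breach: the client is affected
    if observed_share < slo.share:
        return "slo_breach"   # internal breach: still inside the buffer
    return "ok"


# Figures from the text; the shared 2-second threshold is an assumption.
slo = LatencyTarget(share=0.99, threshold_s=2.0)
sla = LatencyTarget(share=0.95, threshold_s=2.0)

# Safety buffer in percentage points (rounded to hide float noise).
margin_points = round((slo.share - sla.share) * 100, 6)
```

With these values, an observed share of 97% breaks the SLO but not the SLA. That is the window in which the team can act without contractual consequences.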
SLA categories
Availability — the percentage of time the system is operational. "99.5% uptime per month" means no more than ~3.6 hours of downtime in a 30-day month. Measured by monitoring, not by user reports.
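The downtime figure follows directly from the uptime percentage. The sketch below shows the arithmetic. The function name downtime_budget_hours is hypothetical, and the 30-day default is an assumption, since the budget shifts slightly with the length of the month.

```python
def downtime_budget_hours(uptime_pct: float, days_in_period: int = 30) -> float:
    """Maximum downtime, in hours, that still satisfies `uptime_pct`."""
    total_hours = days_in_period * 24
    return total_hours * (1 - uptime_pct / 100)


# 99.5% over a 30-day month: 720 hours * 0.005, roughly 3.6 hours.
budget_30 = downtime_budget_hours(99.5)

# The same percentage over a 31-day month allows slightly more.
budget_31 = downtime_budget_hours(99.5, days_in_period=31)
```

Whether the contract counts calendar months, rolling 30-day windows, or excludes planned maintenance changes the result, so the measurement window belongs in the SLA text itself.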
Response time — how quickly the system responds to a request. Defined per critical flow: submission under 2 seconds, page load under 1 second, report generation under 10 seconds.
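One way to make the per-flow limits checkable is to keep them in a single table and measure the share of requests that stay under each limit. The sketch below is illustrative only: the flow keys and the function share_within_limit are hypothetical names, and in practice the latency samples would come from the team's monitoring system.

```python
# Per-flow limits in seconds, taken from the text above.
RESPONSE_TIME_LIMITS_S = {
    "submission": 2.0,
    "page_load": 1.0,
    "report_generation": 10.0,
}


def share_within_limit(flow: str, latencies_s: list[float]) -> float:
    """Fraction of samples that finished strictly under the flow's limit."""
    if not latencies_s:
        raise ValueError("no latency samples to evaluate")
    limit = RESPONSE_TIME_LIMITS_S[flow]
    within = sum(1 for seconds in latencies_s if seconds < limit)
    return within / len(latencies_s)


# Four submissions: two are under 2 seconds, one is exactly 2, one is slower.
sample_share = share_within_limit("submission", [0.5, 1.9, 2.0, 3.0])
```

The strict comparison reflects the wording "under 2 seconds": a request that takes exactly 2 seconds does not count as compliant here.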
Support response time — how quickly L1 acknowledges a ticket, how quickly L2 investigates, how quickly L3 resolves. Defined per priority level.
Resolution time — how quickly a reported issue is resolved.
- P0: 4 hours.
- P1: 1 business day.
- P2: 5 business days.
These are commitments, not aspirations: they are contractual.
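The resolution targets above can be turned into concrete deadlines per ticket. This is a sketch under stated assumptions: P0 is counted in wall-clock hours, business days are Monday to Friday with no holiday calendar, and a deadline falls at the same time of day as the report. The names RESOLUTION_TARGETS, add_business_days and resolution_deadline are hypothetical.

```python
from datetime import datetime, timedelta

# Targets from the text: (unit, amount) per priority level.
RESOLUTION_TARGETS = {
    "P0": ("hours", 4),
    "P1": ("business_days", 1),
    "P2": ("business_days", 5),
}


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance `start` by `days` business days (Mon-Fri, holidays ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


def resolution_deadline(priority: str, reported_at: datetime) -> datetime:
    """Latest moment at which the issue can be resolved without a breach."""
    unit, amount = RESOLUTION_TARGETS[priority]
    if unit == "hours":
        return reported_at + timedelta(hours=amount)
    return add_business_days(reported_at, amount)


# A ticket reported on Friday 1 March 2024 at 10:00.
reported = datetime(2024, 3, 1, 10, 0)
p0_due = resolution_deadline("P0", reported)  # same day, 14:00
p1_due = resolution_deadline("P1", reported)  # Monday 4 March, 10:00
p2_due = resolution_deadline("P2", reported)  # Friday 8 March, 10:00
```

A real contract would also need to define working hours, time zones and public holidays, all of which this sketch leaves out.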
Data integrity — the commitment that data entered by the user is stored correctly and not lost. The hardest SLA to recover from when breached — data loss erodes trust faster than any other failure.
SLA monitoring and breach protocol
SLAs are monitored by the same dashboards that monitor SLOs — but with an additional layer: when the SLA threshold is approached (not crossed), an alert fires. This is the early warning. The team has time to act before the breach.
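The early-warning layer amounts to a second threshold placed just above the contractual one. The sketch below shows the idea. The function sla_alert_state and the 1-point warning margin are assumptions chosen for illustration, not values from the text.

```python
def sla_alert_state(observed_pct: float, sla_pct: float, warning_margin_pct: float) -> str:
    """Classify a measured compliance percentage against the SLA threshold."""
    if observed_pct < sla_pct:
        return "breached"      # the contractual threshold has been crossed
    if observed_pct < sla_pct + warning_margin_pct:
        return "approaching"   # early warning: still compliant, but close
    return "healthy"


# SLA of 95% with a hypothetical 1-point warning band (95% up to 96%).
state_ok = sla_alert_state(97.0, sla_pct=95.0, warning_margin_pct=1.0)
state_warn = sla_alert_state(95.5, sla_pct=95.0, warning_margin_pct=1.0)
state_breach = sla_alert_state(94.9, sla_pct=95.0, warning_margin_pct=1.0)
```

The width of the warning band is a judgment call: too narrow and the alert leaves no time to act, too wide and it fires so often that it gets ignored.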
If the SLA is breached, the protocol mirrors the incident process: contain, communicate, resolve. The difference: the client is the first person notified, not the last.
A team that reports its own SLA breach before the client notices builds trust. A team that hopes the client didn't notice erodes it irreversibly.