The Release

Runbooks rehearsed before the incident. The release gate — every item checked, no exceptions. Rollback discipline at four levels. The release moment. Client and CS communication before the feature is live.

Events in this phase. Runbook rehearsal — scheduled in staging before each release. Release gate review — short meeting, checklist walked through, go/no-go decided. Both at the slice boundary, not during regular flow.

Runbooks — written before the incident

This is where meaning meets operational reality. A runbook is written before any feature that touches a critical path goes live. Any flow where a failure has significant user or financial consequences needs a runbook before the release gate passes.

  • Trigger — the exact monitoring condition. Not "error rate is high" but "the exam-submit error rate exceeds 5% for 5 consecutive minutes."
  • Steps — numbered, specific, timed. Not "investigate" but "check the sync error rate dashboard; if above threshold, proceed to step 2."
  • Rollback — almost always: disable the feature flag. The flag name, who has access, confirmed rollback time from rehearsal.
  • Communication template — pre-written message for the client if the incident exceeds 15 minutes.
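
A trigger phrased this precisely can be checked mechanically. The sketch below is one way to express "exceeds 5% for 5 consecutive minutes" in Python, assuming the monitoring system can supply one error-rate sample per minute. The names `Trigger`, `trigger_fired`, and the metric name `exam_submit_error_rate` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trigger:
    """A runbook trigger: a metric above a threshold for N consecutive minutes."""
    metric: str               # hypothetical metric name, not from a real dashboard
    threshold: float          # 0.05 means 5%
    consecutive_minutes: int  # how many minutes in a row the threshold must be exceeded


def trigger_fired(trigger: Trigger, per_minute_rates: Sequence[float]) -> bool:
    """Return True if the most recent N per-minute samples all exceed the threshold.

    `per_minute_rates` is ordered oldest to newest, one sample per minute.
    """
    n = trigger.consecutive_minutes
    if n <= 0:
        raise ValueError("consecutive_minutes must be positive")
    if len(per_minute_rates) < n:
        return False  # not enough history to say the condition held for N minutes
    return all(rate > trigger.threshold for rate in per_minute_rates[-n:])


# The example trigger from the text: above 5% for 5 consecutive minutes.
EXAM_SUBMIT = Trigger(
    metric="exam_submit_error_rate",
    threshold=0.05,
    consecutive_minutes=5,
)

if __name__ == "__main__":
    samples = [0.02, 0.06, 0.07, 0.08, 0.06, 0.09]
    print(trigger_fired(EXAM_SUBMIT, samples))
```

The comparison is strict, so a rate sitting at exactly 5% does not fire. Whether "exceeds" should include the boundary is a decision for the runbook author to state in the trigger itself.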

Runbooks live in the repository, versioned alongside the code. Before every release, the runbook is rehearsed in staging — someone runs through the steps, the rollback is executed, the time is recorded.

Rollback rehearsed: confirmed 6 minutes. Run by: Maya + Ran. Date: 14 April. Flag disabled, staging back to baseline.

This note is a release gate condition. "Rollback possible" is not. A confirmed time is.
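
The difference between "rollback possible" and a confirmed time can be enforced in a gate check. The following is a minimal sketch, assuming the rehearsal note is stored as structured data next to the runbook. The `RehearsalNote` fields, the `gate_problems` function, and the flag name `example_feature_flag` are hypothetical, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RehearsalNote:
    """Structured form of the rehearsal note (field names are illustrative)."""
    flag_name: str
    run_by: List[str]
    date: Optional[str]                          # kept as free text, e.g. "14 April"
    confirmed_rollback_minutes: Optional[float]  # None means no time was measured
    staging_back_to_baseline: bool


def gate_problems(note: RehearsalNote) -> List[str]:
    """Return the reasons this note fails the gate; an empty list means it passes."""
    problems: List[str] = []
    if note.confirmed_rollback_minutes is None or note.confirmed_rollback_minutes <= 0:
        problems.append("no confirmed rollback time")
    if not note.run_by:
        problems.append("no one recorded as having run the rehearsal")
    if not note.date:
        problems.append("no rehearsal date")
    if not note.staging_back_to_baseline:
        problems.append("staging not confirmed back to baseline")
    return problems


# The note from the text; the flag name is a placeholder.
note = RehearsalNote(
    flag_name="example_feature_flag",
    run_by=["Maya", "Ran"],
    date="14 April",
    confirmed_rollback_minutes=6,
    staging_back_to_baseline=True,
)

if __name__ == "__main__":
    print(gate_problems(note))
```

A note that only says a rollback is possible would leave `confirmed_rollback_minutes` empty and fail the check, which is the behaviour the gate condition asks for.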

Rollback discipline — four levels

  • Flag rollback — disable the flag. Seconds. Users return to the previous behaviour. No code change needed. This is the primary rollback for flagged features.
  • Deploy rollback — revert to the previous deployment. Minutes. The pipeline redeploys the last known-good build. Used when the issue is not flag-specific.
  • Migration rollback — reverse a schema change. Hours. Only possible if the migration was designed to be reversible. This is why backward-compatible migrations matter.
  • Data rollback — restore data to a previous state. May not be possible if writes have occurred. Plan for this before it happens — the plan is either "we have point-in-time recovery" or "we accept this risk and here's why."
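
The four levels can be written down as data so that the rollback plan for a given situation is explicit before an incident. This is a sketch of one possible reading of the list above, not a prescribed procedure. The `Situation` fields and the `plan_rollback` function are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class RollbackLevel(Enum):
    """The four levels, each with its action and rough timescale from the list above."""
    FLAG = ("disable the feature flag", "seconds")
    DEPLOY = ("redeploy the last known-good build", "minutes")
    MIGRATION = ("reverse the schema change", "hours")
    DATA = ("restore data to a previous state", "may not be possible")

    def __init__(self, action: str, timescale: str) -> None:
        self.action = action
        self.timescale = timescale


@dataclass(frozen=True)
class Situation:
    """Hypothetical description of what went wrong and what was prepared in advance."""
    issue_is_flag_specific: bool
    schema_changed: bool = False
    migration_reversible: bool = False
    data_damaged: bool = False
    point_in_time_recovery: bool = False


def plan_rollback(s: Situation) -> Tuple[List[RollbackLevel], List[str]]:
    """Return (levels to execute in order, blockers that make a level impossible)."""
    steps: List[RollbackLevel] = []
    blockers: List[str] = []
    # Flag rollback is primary for flagged features; otherwise fall back to a redeploy.
    steps.append(RollbackLevel.FLAG if s.issue_is_flag_specific else RollbackLevel.DEPLOY)
    if s.schema_changed:
        if s.migration_reversible:
            steps.append(RollbackLevel.MIGRATION)
        else:
            blockers.append("migration was not designed to be reversible")
    if s.data_damaged:
        if s.point_in_time_recovery:
            steps.append(RollbackLevel.DATA)
        else:
            blockers.append("no point-in-time recovery; data rollback was an accepted risk")
    return steps, blockers


if __name__ == "__main__":
    steps, blockers = plan_rollback(Situation(issue_is_flag_specific=True))
    for level in steps:
        print(f"{level.name}: {level.action} ({level.timescale})")
```

Returning blockers instead of raising keeps the plan readable during an incident: the team sees both what can be done now and which levels were ruled out by earlier design decisions.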

200apps · How We Work · NWIRE