First 48 hours watch
Forty-eight hours of attention. Watching dashboards, not tickets. Distinguishing first-contact noise from signals worth following up on. The watch is the corpus's discipline against reacting incorrectly to the first hour.
TL;DR
Three roles watch for 48 hours: PO (the prediction's leading signals), TL (system signals), on-call (the bridge if anything fires). Each watches dashboards, not support tickets. The discipline is noting, not acting on, first-hour noise: most first-hour noise is the change being noticed, not the change being broken. At hour 48 the watch closes with a one-page note.
What it is
The first 48-hour watch is described in After We Build · The First 48 Hours. It begins the moment the release gate flips the flag and ends 48 hours later (extended only by named decision). It produces an artefact: a one-page note that feeds the cycle's signal reading.
Distinguish from
- Incident response: the watch waits for something to happen; incident response starts once something has happened.
- Monitoring: continuous; the watch is bounded.
- Soak test: pre-release; the watch is post-flag-flip.
See "Confusable with" at the foot.
Why it matters
Without the 48-hour watch:
- The team learns from tickets, not dashboards. A ticket only exists once a named person has noticed and written in, so the team misses whatever never surfaced as a complaint.
- First-hour noise is mistaken for failure. Teams roll back changes that were working because the first hour looked alarming.
- The signal reading writes itself blind. The PO arrives at the check date with nothing observed first-hand.
- No one is watching the moment. The change shipped and the team moved on. The next cycle inherits no learning.
The watch is the corpus's discipline against the cycle moving on before reality answers.
How to do it
Step 1 — Open the watch at flag-flip
The PO, TL, and on-call are at their desks (or paired remotely) when the flag flips. Not in a meeting; at their dashboards.
Watch opens: 2026-05-23 · 10:00 (flag flipped at 09:58)
Watching: grader.queue.opened, grader.submission.graded,
name.display.fallback rate, queue.render.p95
Roles: Alex (PO), the TL, Maya (QA, joining at H+2),
the senior dev (on-call, primary)
Step 2 — Watch dashboards, not tickets
For the first hour, the discipline is: do not act on tickets. Support tickets in the first hour are a different signal from the dashboard's. They show the named person noticing the change, not the change being broken.
The PO watches the leading product signals — queue render time, the prediction's primary metric. The TL watches system signals — error rates, latency p95, log volume. The on-call watches the bridge — is anything escalating that the runbook covers?
Step 3 — Note, don't act, in the first hour
Take time-stamped notes. Do not page anyone. Do not roll back. The first hour is the chain's noise floor.
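A minimal sketch of how time-stamped notes could be kept, assuming offsets are measured from the moment the watch opens and written as H+HH:MM. The `WatchLog` class and `offset_label` helper are hypothetical names for illustration, not tooling this practice prescribes. The log only records; it deliberately has no way to page or roll back.

```python
from dataclasses import dataclass, field
from datetime import datetime


def offset_label(opened_at: datetime, now: datetime) -> str:
    """Format the time elapsed since watch open as H+HH:MM."""
    minutes = int((now - opened_at).total_seconds() // 60)
    if minutes < 0:
        raise ValueError("note is timestamped before the watch opened")
    return f"H+{minutes // 60:02d}:{minutes % 60:02d}"


@dataclass
class WatchLog:
    """Hypothetical append-only log: it notes, it never acts."""
    opened_at: datetime
    notes: list[tuple[str, str]] = field(default_factory=list)

    def note(self, at: datetime, text: str) -> str:
        label = offset_label(self.opened_at, at)
        self.notes.append((label, text))
        return label


# Watch open time taken from the Step 1 example.
log = WatchLog(opened_at=datetime(2026, 5, 23, 10, 0))
first = log.note(datetime(2026, 5, 23, 10, 15),
                 "grader.queue.opened spike (3x prior baseline)")
late = log.note(datetime(2026, 5, 23, 11, 8),
                "queue.render.p95 at 380ms; budget is 300ms")
```

Here `first` is `"H+00:15"` and `late` is `"H+01:08"`, the same offset style used in the notes that follow.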
H+00:15 — grader.queue.opened spike (3x prior baseline).
Expected — flag enabled for active grader cohort.
H+00:35 — name.display.fallback rate at 0.03; baseline was
0.01. Within noise band.
H+00:50 — first support ticket: "queue looks different,
is this new?" — CS knows; routing fine.
H+01:08 — queue.render.p95 at 380ms; budget is 300ms.
NOTE for action at H+2.
Step 4 — Distinguish first-contact noise from signal at H+2
At hour 2, the team meets at their dashboards (15 minutes). They walk the notes. Each note is classified:
- First-contact noise — the change being noticed; no action.
- Signal worth following up on — a real shift in the data; assign an owner.
- Action now — something is breaking; route to incident response.
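The three classes above could be captured in a small structure like the following. This is a sketch under assumptions: `Kind`, `ClassifiedNote`, `walk_notes`, and `needs_incident_response` are hypothetical names, and the classification itself stays a human judgment made at the H+2 walk. The code only enforces two rules stated in the list: a signal must have an owner, and any "action now" note routes to incident response.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    NOISE = "first-contact noise"
    SIGNAL = "signal worth following up on"
    ACTION = "action now"


@dataclass
class ClassifiedNote:
    offset: str                  # e.g. "H+01:08"
    text: str
    kind: Kind                   # decided by the team, not by code
    owner: Optional[str] = None  # required for signals


def walk_notes(notes):
    """Group notes by class; reject any signal without an owner."""
    grouped = {kind: [] for kind in Kind}
    for n in notes:
        if n.kind is Kind.SIGNAL and not n.owner:
            raise ValueError(f"{n.offset}: a signal needs an owner")
        grouped[n.kind].append(n)
    return grouped


def needs_incident_response(grouped) -> bool:
    """True when at least one note was classified as action now."""
    return bool(grouped[Kind.ACTION])


# First-hour notes from Step 3, classified as a team might at H+2.
notes = [
    ClassifiedNote("H+00:15", "expected spike", Kind.NOISE),
    ClassifiedNote("H+00:35", "fallback rate within band", Kind.NOISE),
    ClassifiedNote("H+00:50", "CS routed correctly", Kind.NOISE),
    ClassifiedNote("H+01:08", "queue.render.p95 over budget",
                   Kind.SIGNAL, owner="TL"),
]
grouped = walk_notes(notes)
```

Making the owner mandatory at grouping time is one way to keep a signal from being recorded without anyone following it up.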
H+02:00 — Walk notes.
Noise: H+00:15 (expected spike), H+00:35 (within band),
H+00:50 (CS routed correctly).
Signal: H+01:08 queue.render.p95 — the TL owns; investigate
cold-cache path within H+8.
Action: none.
Step 5 — Hourly notes for the first 8 hours, then 4-hourly
Tempo is high in the first 8 hours, with a note every hour, then slows to one every 4 hours. By hour 24 the team is taking notes every 6 hours. From hour 36 the cadence is daily until close.
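The tempo can be sketched as a schedule of note times. The cut-over points are an assumption about how to read the cadence: hourly through H+8, every 4 hours through H+24, every 6 hours through H+36, and, since only 12 hours remain after that, a single closing note at H+48. The function names are hypothetical.

```python
from datetime import datetime, timedelta

WATCH_HOURS = 48  # the watch is bounded; extending needs a named decision


def note_interval(hour: int) -> int:
    """Hours until the next note, given hours elapsed since watch open."""
    if hour < 8:
        return 1                   # hourly for the first 8 hours
    if hour < 24:
        return 4                   # then 4-hourly
    if hour < 36:
        return 6                   # every 6 hours from hour 24
    return WATCH_HOURS - hour      # assumed: one final note at close


def note_schedule() -> list[int]:
    """Offsets in hours (H+n) at which a note is due."""
    hours, h = [], 0
    while h < WATCH_HOURS:
        h += note_interval(h)
        hours.append(h)
    return hours


def close_time(opened_at: datetime) -> datetime:
    """The watch closes exactly 48 hours after it opens."""
    return opened_at + timedelta(hours=WATCH_HOURS)


schedule = note_schedule()
closes = close_time(datetime(2026, 5, 23, 10, 0))
```

Under these assumptions `schedule` is `[1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 30, 36, 48]`, and a watch opened on 2026-05-23 at 10:00 closes on 2026-05-25 at 10:00.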
Step 6 — Close the watch at hour 48 with a one-page note
Watch close: 2026-05-25 · 10:00
Summary:
Flag enabled for 23 of 28 active graders. Prediction's
leading signal (focused-grading time) trending toward
target — observed median 22 min on partial data (n=4).
One real signal (queue.render.p95 cold cache) followed
up and resolved in H+6 patch. No incidents.
Notes feeding the cycle:
- name.display.fallback at 0.03 sustained — locale map
needs Russian forms by next cycle. Story drafted.
- Three graders did not enable. CS following up.
Hand off to: PO for signal reading on prediction's check date
(2026-06-15).
The one-page note is the input to the signal reading. It is not the signal reading itself.
Evidence
Across our cycles, watches that produced durable learning shared three properties.
- No action was taken in the first hour. Cycles where the team acted on first-hour data rolled back changes that were working 1.4× more often than cycles that held the first hour to noting, not acting.
- Three roles watched, not one. Cycles where one role watched alone produced thinner notes; specifically, the PO watching alone missed system signals and the TL watching alone missed product signals.
- The 48-hour mark was the close, not the soft middle. Watches that ran open-ended until "it felt fine" produced no artefact for the next cycle to inherit. The bounded close is what produces the learning.
Anti-patterns
| Pattern | What it looks like | Where to fix |
|---|---|---|
| Watching tickets | Team is in the support queue, not the dashboard | Move to the dashboard. Tickets are downstream. |
| Acting in the first hour | Rollback or paging within H+1 | Canon · The First 48 Hours — note, don't act |
| One role watching | Only the PO is there | The watch is three roles' work — PO, TL, on-call |
| Watch open-ended | "We'll watch until it stabilises" | Close at 48h with a note. Extending requires a named decision. |
| Watch becomes incident response | Real incident fires, watch isn't formally handed over | Hand off explicitly; the watch ends and incident response begins |
| No note at close | Team moves to the next cycle | Without the note, the signal reading writes itself blind |
Confusable with
| This | Not this | Difference |
|---|---|---|
| Watch | Incident response | Watch = waiting. Incident response = something has happened. |
| Watch | Monitoring | Watch = bounded, attentive, three roles. Monitoring = continuous, automated. |
| First-contact noise | First-contact failure | Noise = the change being noticed; failure = the change being broken. |
Further reading
- Canon — After We Build · The First 48 Hours · Signal & The Prediction · Bugs and Their Roots
- Practice — Release gate — the predecessor
- Checklist — First 48 hours watch
- Template — Runbook — what the watch leans on if something fires
- Skill path — On-call foundations · Step 5 · PO foundations · Step 7
- Reference — Area · First 48 Hours