First 48 hours watch

Forty-eight hours of attention. Watching dashboards, not tickets. Distinguishing first-contact noise from signals worth following up on. The watch is the corpus's discipline against reacting incorrectly to the first hour.

TL;DR

Three roles watch for 48 hours: PO (the prediction's leading signals), TL (system signals), on-call (the bridge if anything fires). Each watches dashboards, not support tickets. The discipline is noting, not acting, on first-hour noise — most first-hour noise is the change being noticed, not the change being broken. At hour 48 the watch closes with a one-page note.

What it is

The first 48-hour watch is described in After We Build · The First 48 Hours. It begins the moment the release gate flips the flag and ends 48 hours later (extended only by named decision). It produces an artefact: a one-page note that feeds the cycle's signal reading.

Distinguish from

Incident response: the watch waits for something to happen; incident response starts once something has happened.
Monitoring: continuous; the watch is bounded.
Soak test: pre-release; the watch is post-flag-flip.

See Confusable with at the foot.

Why it matters

Without the 48-hour watch:

The team learns from tickets, not dashboards. Support is downstream of the named person noticing — the team misses what didn't surface as a complaint.
First-hour noise is mistaken for failure. Teams roll back changes that were working because the first hour looked alarming.
The signal reading writes itself blind. The PO arrives at the check date with nothing observed first-hand.
No one is watching the moment. The change shipped and the team moved on. The next cycle inherits no learning.

The watch is the corpus's discipline against the cycle moving on before reality answers.

How to do it

Step 1 — Open the watch at flag-flip

The PO, TL, and on-call are at their desks (or paired remotely) when the flag flips. Not in a meeting; at their dashboards.

Watch opens: 2026-05-23 · 10:00 (flag flipped at 09:58)
Watching: grader.queue.opened, grader.submission.graded,
          name.display.fallback rate, queue.render.p95
Roles:    Alex (PO), the TL, Maya (QA, joining at H+2),
          the senior dev (on-call, primary)
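
The opening record above can also be kept as structured data, so the close time is derived rather than remembered. The following is a minimal sketch, not part of the practice itself: the `Watch` class, its field names, and the `extension` field are hypothetical, and it assumes the 48 hours are counted from the watch-open time (10:00 in the example) rather than from the flag-flip time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WATCH_LENGTH = timedelta(hours=48)


@dataclass
class Watch:
    """Hypothetical record of a watch opening (names are illustrative)."""
    opened_at: datetime
    flag_flipped_at: datetime
    signals: list[str]
    roles: dict[str, str]  # role -> person
    # Stays at zero unless the team extends the watch by named decision.
    extension: timedelta = timedelta(0)

    @property
    def closes_at(self) -> datetime:
        # Assumption: the clock starts at watch open, not at flag-flip.
        return self.opened_at + WATCH_LENGTH + self.extension


watch = Watch(
    opened_at=datetime(2026, 5, 23, 10, 0),
    flag_flipped_at=datetime(2026, 5, 23, 9, 58),
    signals=[
        "grader.queue.opened",
        "grader.submission.graded",
        "name.display.fallback rate",
        "queue.render.p95",
    ],
    roles={"PO": "Alex", "TL": "the TL", "on-call": "the senior dev"},
)
print(watch.closes_at)  # 2026-05-25 10:00:00
```

With these inputs the derived close matches the close time used in the Step 6 example.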

Step 2 — Watch dashboards, not tickets

For the first hour, the discipline is: do not act on tickets. Support tickets in the first hour carry a different signal from the dashboards: they show the named person noticing the change, not the change being broken.

The PO watches the leading product signals — queue render time, the prediction's primary metric. The TL watches system signals — error rates, latency p95, log volume. The on-call watches the bridge — is anything escalating that the runbook covers?

Step 3 — Note, don't act, in the first hour

Take time-stamped notes. Do not page anyone. Do not roll back. The first hour is the chain's noise floor.

H+00:15 — grader.queue.opened spike (3x prior baseline).
          Expected — flag enabled for active grader cohort.
H+00:35 — name.display.fallback rate at 0.03; baseline was
          0.01. Within noise band.
H+00:50 — first support ticket: "queue looks different,
          is this new?" — CS knows; routing fine.
H+01:08 — queue.render.p95 at 380ms; budget is 300ms.
          NOTE for action at H+2.
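
The `H+hh:mm` stamps in the notes above are offsets from watch open. As a rough sketch, they can be computed instead of worked out by hand. The helper names `h_plus` and `over_budget` are hypothetical, and the sketch assumes offsets are measured from the watch-open time.

```python
from datetime import datetime


def h_plus(opened_at: datetime, observed_at: datetime) -> str:
    """Format an observation time as an offset from watch open, e.g. 'H+01:08'."""
    minutes = int((observed_at - opened_at).total_seconds() // 60)
    if minutes < 0:
        raise ValueError("observation precedes watch open")
    return f"H+{minutes // 60:02d}:{minutes % 60:02d}"


def over_budget(observed: float, budget: float) -> bool:
    """True when a reading exceeds its stated budget (a note, not an action)."""
    return observed > budget


opened = datetime(2026, 5, 23, 10, 0)
stamp = h_plus(opened, datetime(2026, 5, 23, 11, 8))
print(stamp)                        # H+01:08
print(over_budget(380.0, 300.0))    # True: queue.render.p95 vs its 300ms budget
```

An over-budget reading in the first hour still only produces a note; classification waits for the H+2 walk.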

Step 4 — Distinguish first-contact noise from signal at H+2

At hour 2, the team meets at their dashboards (15 minutes). They walk the notes. Each note is classified:

First-contact noise — the change being noticed; no action.
Signal worth following up on — a real shift in the data; assign an owner.
Action now — something is breaking; route to incident response.

H+02:00 — Walk notes.
  Noise:  H+00:15 (expected spike), H+00:35 (within band),
          H+00:50 (CS routed correctly).
  Signal: H+01:08 queue.render.p95 — the TL owns; investigate
          cold-cache path within H+8.
  Action: none.
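
The H+2 walk can be sketched as a small grouping step. The classification itself remains a human judgment; the code only checks that every note was classified and that every signal has an owner. The `Kind`, `Note`, and `walk` names are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    NOISE = "first-contact noise"
    SIGNAL = "signal worth following up on"
    ACTION = "action now"


@dataclass
class Note:
    stamp: str
    text: str
    kind: Optional[Kind] = None   # set by the team during the walk
    owner: Optional[str] = None   # required for signals


def walk(notes: list[Note]) -> dict[Kind, list[Note]]:
    """Group classified notes; reject unclassified notes and ownerless signals."""
    groups: dict[Kind, list[Note]] = defaultdict(list)
    for note in notes:
        if note.kind is None:
            raise ValueError(f"{note.stamp} has not been classified")
        if note.kind is Kind.SIGNAL and not note.owner:
            raise ValueError(f"{note.stamp} is a signal without an owner")
        groups[note.kind].append(note)
    return groups


notes = [
    Note("H+00:15", "grader.queue.opened spike", Kind.NOISE),
    Note("H+00:35", "name.display.fallback at 0.03", Kind.NOISE),
    Note("H+00:50", "first support ticket", Kind.NOISE),
    Note("H+01:08", "queue.render.p95 at 380ms", Kind.SIGNAL, owner="the TL"),
]
groups = walk(notes)
print(len(groups[Kind.NOISE]), len(groups[Kind.SIGNAL]), len(groups[Kind.ACTION]))
```

Anything that lands in `Kind.ACTION` leaves the watch and goes to incident response.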

Step 5 — Hourly notes for the first 8 hours, then 4-hourly

Notes are hourly for the first 8 hours, then every 4 hours. By hour 24 the team is taking notes every 6 hours. From hour 36 the cadence is daily until close.
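
One way to turn this cadence into a concrete list of note times is sketched below. The phase boundaries are one reading of the text and are an assumption: hourly to H+8, every 4 hours to H+24, every 6 hours to H+36, then daily, with a final note at close. The `note_hours` function is hypothetical.

```python
def note_hours(close_hour: int = 48) -> list[int]:
    """Return the hours (offsets from watch open) at which a note is due."""
    # (end of phase, step in hours); an assumed reading of the cadence.
    phases = [(8, 1), (24, 4), (36, 6)]
    hours: list[int] = []
    current = 0
    for end, step in phases:
        while current + step <= end:
            current += step
            hours.append(current)
    # Daily notes after H+36, stopping before the close.
    while current + 24 < close_hour:
        current += 24
        hours.append(current)
    # The closing note is always the last entry.
    hours.append(close_hour)
    return hours


print(note_hours())
# [1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 30, 36, 48]
```

Under this reading a standard 48-hour watch has no separate daily note: the only entry after H+36 is the close. A watch extended by named decision, for example `note_hours(96)`, would pick up daily notes at H+60 and H+84.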

Step 6 — Close the watch at hour 48 with a one-page note

Watch close: 2026-05-25 · 10:00

Summary:
  Flag enabled for 23 of 28 active graders. Prediction's
  leading signal (focused-grading time) trending toward
  target — observed median 22 min on partial data (n=4).
  One real signal (queue.render.p95 cold cache) followed
  up and resolved in H+6 patch. No incidents.

Notes feeding the cycle:
  - name.display.fallback at 0.03 sustained; locale map
    needs Russian forms by next cycle. Story drafted.
  - Five graders (28 active, 23 enabled) did not enable. CS following up.

Hand off to: PO for signal reading on prediction's check date
(2026-06-15).

The one-page note is the input to the signal reading. It is not the signal reading itself.

Evidence

Across our cycles, watches that produced durable learning shared three properties.

No action was taken in the first hour. Cycles where the team acted on first-hour data rolled back working changes 1.4× more often than cycles that held the first hour to noting, not acting.
Three roles watched, not one. Cycles where one role watched alone produced thinner notes; specifically, the PO watching alone missed system signals and the TL watching alone missed product signals.
The 48-hour mark was the close, not the soft middle. Watches that ran open-ended until "it felt fine" produced no artefact for the next cycle to inherit. The bounded close is what produces the learning.

Anti-patterns

Pattern	What it looks like	Where to fix
Watching tickets	Team is in the support queue, not the dashboard	Move to the dashboard. Tickets are downstream.
Acting in the first hour	Rollback or paging within H+1	Canon · The First 48 Hours — note, don't act
One role watching	Only the PO is there	The watch is three roles' work — PO, TL, on-call
Watch open-ended	"We'll watch until it stabilises"	Close at 48h with a note. Extending requires a named decision.
Watch becomes incident response	Real incident fires, watch isn't formally handed over	Hand off explicitly; the watch ends and incident response begins
No note at close	Team moves to the next cycle	Without the note, the signal reading writes itself blind

Confusable with

This	Not this	Difference
Watch	Incident response	Watch = waiting. Incident response = something has happened.
Watch	Monitoring	Watch = bounded, attentive, three roles. Monitoring = continuous, automated.
First-contact noise	First-contact failure	Noise = the change being noticed; failure = the change being broken.
