session · operate & reflect
First-48-hours watch
48 hours after flag-on. On-call active. PO watches dashboards loose-then-sharp. Dashboards, not tickets — tickets lag reality by hours. Act on three conditions; log everything else.
When
- Begins at flag enablement — that was a release-gate condition.
- Continues for 48 hours — first hour noisiest, settles by hour 24, last 24 hours are normal-cadence.
Who
- On-call — primary. Owns the alerts; pages if needed.
- PO — watches the leading-signal dashboards (adoption, completion, error encounter).
- Tech Lead — available; responds to on-call escalation.
Time-box
The watch is continuous in calendar, not in seat-time. The PO checks the dashboard hourly for the first 4 hours, then every 4 hours until hour 24, then twice daily to hour 48.
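The cadence above can be written out as a list of check times. This is a minimal sketch, not a prescribed tool: the function names are hypothetical, and "twice daily" is assumed to mean every 12 hours (hours 36 and 48), which the text does not state.

```python
from datetime import datetime, timedelta


def watch_check_hours():
    """Hours after flag-on at which the PO checks the dashboard."""
    hourly = list(range(1, 5))           # hours 1, 2, 3, 4
    four_hourly = list(range(8, 25, 4))  # hours 8, 12, 16, 20, 24
    twice_daily = [36, 48]               # assumption: "twice daily" = every 12 hours
    return hourly + four_hourly + twice_daily


def watch_schedule(flag_on):
    """Turn the hour offsets into concrete check times from the flag-on moment."""
    return [flag_on + timedelta(hours=h) for h in watch_check_hours()]


if __name__ == "__main__":
    # Hypothetical flag-on time, for illustration only.
    for check_time in watch_schedule(datetime(2024, 1, 1, 9, 0)):
        print(check_time.isoformat(timespec="minutes"))
```

Under these assumptions the watch comes to eleven checks over 48 hours, which is what "continuous in calendar, not in seat-time" amounts to in practice.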
Inputs
- The release brief (so the watcher knows what was promised).
- The named SLIs and SLO thresholds.
- The runbook for each named alert.
- The leading-signals dashboard (adoption, completion, error encounter, time-on-state).
Agenda
Not a meeting — a discipline. What the PO watches each check:
- Are dashboards within SLO? If yes, log.
- Is the error rate above the SLO threshold and trending up? If yes, open the Incident war room.
- Are leading signals telling a story? Adoption stalling? Completion rate weird? Error encounter at unexpected state? Note for the Signal reading session.
- Are helpdesk tickets correlating with anything visible? Pattern → flag to Helpdesk reading for the week.
Three conditions warrant immediate action:
- SLO threshold crossed for >5 min → open runbook, start from step one.
- Any data integrity concern → disable the flag, investigate in staging.
- Any security-relevant behaviour → disable the flag, full stop.
Everything else: log, prioritise via the bug taxonomy, address in normal flow.
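The three triggers and the log-by-default rule can be sketched as a single decision function. The names here (`Check`, `Action`, `triage`) are hypothetical. The order in which the conditions are tested is also an assumption: the text does not rank them, so this sketch checks the flag-disabling conditions first.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    LOG = "log, prioritise via the bug taxonomy, address in normal flow"
    OPEN_RUNBOOK = "open runbook, start from step one"
    DISABLE_FLAG_AND_INVESTIGATE = "disable the flag, investigate in staging"
    DISABLE_FLAG = "disable the flag, full stop"


@dataclass
class Check:
    """What the watcher saw at one dashboard check (hypothetical shape)."""
    slo_breach_minutes: float = 0.0       # how long the SLO threshold has been crossed
    data_integrity_concern: bool = False
    security_relevant: bool = False


def triage(check, breach_limit_minutes=5.0):
    """Map one check to an action. Only the three named conditions trigger action."""
    if check.security_relevant:
        return Action.DISABLE_FLAG
    if check.data_integrity_concern:
        return Action.DISABLE_FLAG_AND_INVESTIGATE
    if check.slo_breach_minutes > breach_limit_minutes:  # strictly more than 5 min
        return Action.OPEN_RUNBOOK
    return Action.LOG
```

A breach of exactly five minutes still returns `Action.LOG` in this sketch, because the condition reads ">5 min". First-hour noise that matches none of the three conditions also falls through to `Action.LOG`.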
Outputs
- A 48-hour watch note — what was observed, what was acted on, what was logged. Filed alongside the release brief.
- The baseline data that the Signal reading session will draw on.
- Early signals of unexpected patterns that feed the next cycle's brief.
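One possible shape for the watch note is a small record with the three parts named above. This is a sketch under assumptions: the `WatchNote` class, its field names, and the rendered layout are all hypothetical. An empty section is written out explicitly, so a quiet watch still leaves a baseline on file.

```python
from dataclasses import dataclass, field


@dataclass
class WatchNote:
    """48-hour watch note: what was observed, acted on, and logged."""
    release: str
    observed: list = field(default_factory=list)
    acted_on: list = field(default_factory=list)
    logged: list = field(default_factory=list)

    def render(self):
        """Render the note as plain text; empty sections say so explicitly."""
        lines = [f"48-hour watch note: {self.release}"]
        sections = [
            ("Observed", self.observed),
            ("Acted on", self.acted_on),
            ("Logged", self.logged),
        ]
        for title, items in sections:
            lines.append(title)
            for item in (items or ["nothing happened"]):
                lines.append(f"- {item}")
        return "\n".join(lines)
```

Calling `WatchNote(release="example-release").render()` with no entries still produces all three sections, each recording that nothing happened.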
What good looks like
The PO does not act on first-hour noise. People click things in unexpected orders, and errors that are not bugs appear; the discipline is to not react to them. Acting early is not a sign of control; acting correctly is.
By hour 48, the noisy first-contact patterns settle. The team has a first honest picture — not yet the prediction check, but the data the check will draw from.
Anti-patterns
Watching tickets, not dashboards. Tickets are a lagging, narrative-shaped signal; they tell the team what users complained about, hours after the metric showed the same thing. Fix: dashboards first; tickets second; the helpdesk reading later in the week interprets the ticket layer.
A second anti-pattern: acting on every signal. The team panics at first-hour noise, disables the flag, re-enables, disables again. Fix: the three named conditions are the only triggers for action. Everything else is logged for triage.
A third: the watch ends quietly because nothing happened. No baseline captured, no first-impression note, no input to the signal reading. Fix: "nothing happened" is itself the observation — capture it. The signal reading will reference the baseline.
See also
- Canon — After We Build · The First 48 Hours
- Area — First 48 Hours
- Practice — First 48 hours watch
- Checklist — First 48 hours · agenda
- Next session — Signal reading session (when the check date arrives)