session · operate & reflect

First-48-hours watch

The first 48 hours after flag-on. On-call is active. The PO watches the dashboards, loosely at first and more sharply as the first-hour noise settles. Dashboards, not tickets: tickets lag reality by hours. Act on three conditions; log everything else.

When

Begins at flag enablement — that was a release-gate condition.
Continues for 48 hours — first hour noisiest, settles by hour 24, last 24 hours are normal-cadence.

Who

On-call — primary. Owns the alerts; pages if needed.
PO — watches the leading-signal dashboards (adoption, completion, error encounter).
Tech Lead — available; responds to on-call escalation.

Time-box

The watch is continuous in calendar, not in seat-time. The PO checks the dashboard hourly for the first 4 hours, then every 4 hours until hour 24, then twice daily to hour 48.
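The cadence above can be written out as a list of check times. This is only a sketch: the function name is hypothetical, the hours are counted from flag-on, and "twice daily" is assumed to mean every 12 hours, which the text does not state.

```python
def check_hours() -> list[int]:
    """Hours after flag-on at which the PO looks at the dashboard."""
    hourly = list(range(1, 5))           # hours 1, 2, 3, 4
    four_hourly = list(range(8, 25, 4))  # hours 8, 12, 16, 20, 24
    # Assumption: "twice daily" is read as every 12 hours.
    twice_daily = list(range(36, 49, 12))  # hours 36, 48
    return hourly + four_hourly + twice_daily


if __name__ == "__main__":
    print(check_hours())  # [1, 2, 3, 4, 8, 12, 16, 20, 24, 36, 48]
```

Under these assumptions the watch comes to eleven short checks across the two days, which is what "continuous in calendar, not in seat-time" implies in practice.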

Inputs

The release brief (so the watcher knows what was promised).
The named SLIs and SLO thresholds.
The runbook for each named alert.
The leading-signals dashboard (adoption, completion, error encounter, time-on-state).

Agenda

Not a meeting — a discipline. What the PO watches each check:

Are dashboards within SLO? If yes, log.
Is the error rate above the SLO threshold and trending up? If yes, go to the Incident war room.
Are the leading signals telling a story? Adoption stalling? Completion rate unusual? Errors encountered at an unexpected state? Note it for the Signal reading session.
Are helpdesk tickets correlating with anything visible? Pattern → flag to Helpdesk reading for the week.
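The second check, "above SLO threshold and trending up", could be made mechanical along the following lines. This is an illustrative sketch, not the team's definition: the function name, the three-reading window, and the reading of "trending up" as strictly increasing recent samples are all assumptions.

```python
def above_and_rising(samples: list[float], threshold: float, window: int = 3) -> bool:
    """True when the latest reading exceeds the threshold and the last
    `window` readings are strictly increasing.

    `samples` is ordered oldest to newest. The window size and the
    strict-increase rule are assumptions made for this sketch.
    """
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough readings to call a trend
    rising = all(a < b for a, b in zip(recent, recent[1:]))
    return recent[-1] > threshold and rising


if __name__ == "__main__":
    print(above_and_rising([0.01, 0.02, 0.03], threshold=0.02))  # True
    print(above_and_rising([0.05, 0.04, 0.03], threshold=0.02))  # False: above, but falling
```

A falling error rate that is still above threshold returns False here; whether that case should still escalate is a judgement the runbook, not this sketch, should settle.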

Three conditions warrant immediate action:

SLO threshold crossed for >5 min → open runbook, start from step one.
Any data integrity concern → disable the flag, investigate in staging.
Any security-relevant behaviour → disable the flag, full stop.

Everything else: log, prioritise via the bug taxonomy, address in normal flow.
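The three triggers and the default can be sketched as a single decision function. The type and field names below are hypothetical, and the order in which the conditions are checked (security first, then data integrity, then SLO) is an assumption; the text names the conditions but does not rank them.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OPEN_RUNBOOK = "open runbook, start from step one"
    DISABLE_FLAG_INVESTIGATE = "disable the flag, investigate in staging"
    DISABLE_FLAG = "disable the flag, full stop"
    LOG = "log, prioritise via the bug taxonomy"


@dataclass
class Observation:
    slo_breach_minutes: float = 0.0      # how long the SLO threshold has been crossed
    data_integrity_concern: bool = False
    security_relevant: bool = False


def triage(obs: Observation) -> Action:
    """Map one dashboard observation to one of the four outcomes."""
    # Assumed precedence: the flag-disabling conditions are checked first.
    if obs.security_relevant:
        return Action.DISABLE_FLAG
    if obs.data_integrity_concern:
        return Action.DISABLE_FLAG_INVESTIGATE
    if obs.slo_breach_minutes > 5:       # ">5 min" from the text
        return Action.OPEN_RUNBOOK
    return Action.LOG                    # everything else is logged


if __name__ == "__main__":
    print(triage(Observation(slo_breach_minutes=7)).value)
    print(triage(Observation()).value)
```

Keeping the function this small is the point: anything that does not match one of the three branches falls through to logging, so first-hour noise cannot trigger a flag flip.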

Outputs

A 48-hour watch note — what was observed, what was acted on, what was logged. Filed alongside the release brief.
The baseline data that the Signal reading session will draw on.
Early signals of unexpected patterns that feed the next cycle's brief.

What good looks like

The PO does not act on first-hour noise. People click things in unexpected orders, and errors appear that are not bugs; the discipline lies in not overreacting to them. Acting early is not a sign of control; acting correctly is.

By hour 48, the noisy first-contact patterns settle. The team has a first honest picture — not yet the prediction check, but the data the check will draw from.

Anti-pattern

Watching tickets, not dashboards. Tickets are a lagging, narrative-shaped signal; they tell the team what users complained about, hours after the metric showed the same thing. Fix: dashboards first; tickets second; the helpdesk reading later in the week interprets the ticket layer.

A second anti-pattern: acting on every signal. The team panics at first-hour noise, disables the flag, re-enables, disables again. Fix: the three named conditions are the only triggers for action. Everything else is logged for triage.

A third: the watch ends quietly because nothing happened. No baseline captured, no first-impression note, no input to the signal reading. Fix: "nothing happened" is itself the observation — capture it. The signal reading will reference the baseline.
