checklist · postmortem timeline during not after
Postmortem · timeline-during-not-after
Read before the postmortem session. The postmortem is bound to one incident, blameless, and produces one chain-level fix. More monitoring is rarely the right level.
TIP
The first gate is the timeline. If the timeline was not written during the incident, that is itself the postmortem's most important finding — the during-the-incident discipline is broken.
Before the postmortem
[ ] Schedule within 72 hours of the incident.
because:postmortems within 72h produced sticking fixes in 90% of cases vs 50% later — Practice · Step 1[ ] The during-the-incident timeline is in the doc, pasted not rewritten. If missing — that is the postmortem's first finding.
because:the chain-level walk has no anchor without the timeline — Clinic[ ] Attendees: TL + on-call who held the bridge + relevant trio. Plus leadership if SEV-1.
In the postmortem session
[ ] Read the timeline aloud at the start. No interpretation yet. Everyone hears what happened.
[ ] Walk the chain levels, in order: L1 → L2 → L3 → L4 → L5. For each: did this level produce or fail to catch the incident? Be honest. No is a valid answer.
because:postmortems without the walk fixed at L5 ("more monitoring") in 70% of cases — Practice · Evidence[ ] Name the structural cause in one paragraph. Specific. Not we need to be more careful. Bound to the chain level.
The one fix
[ ] Exactly one chain-level fix. Not three. Not five. One. The rest goes to retro or to a future postmortem if recurring.
[ ] Owned by a named person. the TL not engineering.
[ ] Dated specifically.By next cycle is a wish. 2026-05-30 is a commitment.
[ ] Testable. An observable that confirms the fix landed.
[ ] At the level the cause traces to, not at L5.More monitoring is the L5 reflex. Push one level up.
because:the level above is usually where the fix belongs — Practice · Step 4
Updates that happen the same day
[ ] The runbook used is edited today. What worked. What didn't. What would be faster next time. Today, not next cycle.
[ ] ADRs implicated are named. Status checked: still standing · drifting · re-open?
[ ] The what we are NOT doing section is filled. Pre-empts the "we should also…" thread. Names the symptom fixes the team will NOT do.
Blamelessness
[ ] Names of people appear only in the timeline. The fix is at a level, not a person.
because:the corpus's discipline against blame is the chain-level frame — Principle[ ] Leadership reads the postmortem directly. Not summarised. The artefact is the medium.
After the session
[ ] Publish in the repo's postmortem directory. Numbered, searchable by date / severity / chain level.
[ ] Linked from the runbook(s) used and from any ADRs implicated.
[ ] Followed up in the next retro: did the chain-level fix land on its date?
If the fix is more monitoring or we will be more vigilant, the postmortem is not done. Push one level up.
See also
- Practice — Postmortem
- Template — Postmortem
- Canon — After We Build · Incidents & Postmortems
- Clinic — A postmortem that produced a feeling