clinic · a brief written from the solution backwards
A brief written from the solution backwards
The Feature Brief looks correct. Every section is filled in. The trio signed. But the brief was written after the team had already decided what to build. The prediction was reverse-engineered to fit the chosen feature, not the witnessed problem.
The artefact
Excerpt — Feature Brief, "AI-Assisted Feedback", April 2026
Experience snapshot: Graders spend significant time writing feedback for student submissions. They often write similar feedback across multiple students, especially for common mistakes. The new AI-Assisted Feedback capability will suggest feedback based on the submission content, allowing graders to accept or edit suggestions.
Purpose: Enable graders to grade more submissions per session by reducing feedback-writing time.
In scope:
- AI-generated feedback suggestions per submission
- Accept / edit / reject UX on suggestions
- Feedback model trained on prior approved feedback
- Admin dashboard for AI feedback quality
Out of scope: Use of AI for grade assignment.
Prediction:
- Baseline: ~20% of grading time spent on feedback writing
- Target: 50% reduction in feedback-writing time
- Check date: 2026-09-15
- Check method: Survey graders post-rollout; instrument feedback acceptance rate
- Owner: Alex (PO)
Sign-off: PO ✅ · Designer ✅ · Tech Lead ✅
The brief is structurally complete. Each section is filled. The trio signed. The cycle is about to start.
What's wrong?
Stop. Read the artefact again. Find at least three things wrong before reading the diagnosis.
Diagnosis (open when ready)
1. The brief begins with the solution, not the moment
Graders spend significant time writing feedback…
This is the solution's frame, not the named person's. There is no Wednesday morning, no Gal, no Hebrew name, no alt-tab. There is a category (graders), an activity (writing feedback), and a because-why (often similar feedback). All three are pre-shaped by the chosen solution — AI suggestions.
A brief that began with the moment would have started: Gal sits down at 08:50 to grade CS101 finals. She is on submission three. She has written almost the same paragraph of feedback for two students who made the same mistake about pointer arithmetic. She copies the paragraph from the previous submission…
That brief might have ended with AI suggestions. It might also have ended with a "duplicate this feedback to the next N similar submissions" button — a far cheaper change that solves the same friction. The team will never know, because the brief committed to AI before it witnessed.
2. The prediction's baseline is a guess in the shape of a number
Baseline: ~20% of grading time spent on feedback writing
The tilde is the warning sign. ~20% is a feeling. There is no sample, no date, no method. Compare to a witnessed baseline: 47 min mean per cycle, n=12, captured 2026-04-22 by direct observation of three named graders. Of that, 18 min on feedback typing (95% CI 14–22 min).
The brief's ~20% is a number the team needed in order to claim 50% reduction. The target was the destination; the baseline was reverse-engineered to make the target legible. This is the chain failure that costs most: the cycle will measure against a baseline that was never real.
3. The check method is a survey
Survey graders post-rollout; instrument feedback acceptance rate
The survey is the warning. The corpus rule from Signal & The Prediction: the check method has the same shape as the baseline method. If the baseline was witnessed by sitting next to the named person, the check is witnessed by sitting next to the named person. A survey-based check on a baseline that was never witnessed is unfalsifiable — the team can ship anything and the survey can claim success.
Feedback acceptance rate is the wrong metric too. It measures whether graders click accept; it does not measure whether grading is faster. A grader who clicks accept and then rewrites the suggestion has saved nothing — but the acceptance rate counts the click.
4. Out-of-scope is a single line
Out of scope: Use of AI for grade assignment.
One line. That is not out-of-scope; that is a disclaimer. A real out-of-scope explores what the team is not doing this cycle even though it would be tempting:
Out of scope (this cycle):
- Personalised AI per grader. Reason: training-data scope explodes.*
- Feedback model fine-tuning per course. Reason: defer until baseline cycle.*
- Auto-pre-fill on queue open. Reason: changes the journey step we are measuring.*
A one-line out-of-scope is a brief that has not been fought. When scope is not fought, it leaks.
The fix
The team paused the cycle. They held a second observation session — three named graders, two hours. They learned three things their AI brief had not.
- Graders did not experience feedback writing as the slow step. The slow step was Hebrew-name copying (which the brief above does not mention).
- When asked, graders said feedback was "fine, I have my own paragraphs". The "AI suggestions" feature would compete with a workaround graders had already built — and won. Graders would have rejected most suggestions.
- The real compound friction was the alt-tab between LMS and spreadsheet for Hebrew names — which is what Clinic — A brief that didn't witness describes.
The team killed the AI feedback initiative. They wrote a new Feature Brief, this time starting from the moment. It shipped a Hebrew-name flow. Grading time fell from 47 min to 13 min. The team's calibration improved more than the metric.
Experience snapshot:
It is Wednesday morning. Gal sits at 08:50 with coffee, opens
the LMS, and starts the morning's CS101 finals. Of the seven
in front of her, four have Hebrew names. Each one she opens,
she reads, then alt-tabs to a spreadsheet to copy the name
with diacritics, then back to the LMS. By 10:00 she has
graded four. By 10:30 she stops for water.
Purpose: Cut Gal's grading-cycle time so she can grade an
afternoon batch without losing the day to it.
Prediction:
Baseline: 47 min mean, n=12, observed 2026-04-22
Target: <15 min mean, n>=8 observed cycles
Check date: 2026-06-15
Check method: Three observation sessions, three named graders,
in the field, stopwatch + time-on-task cross-check
Owner: Alex (PO)Where this comes from in the chain
This failure traces to Discovery (Level 2) — but its symptom presents at Scope (Level 3). The trio signed off without catching that the prediction was reverse-engineered. The structural fix is at Discovery — observation before the solution is chosen. The trio's discipline at Level 3 catches it second.
Tracing this defect: the failure was a missing observation session. The story that surfaced it was a support ticket from a grader six weeks into the cycle saying I never use the AI suggestions. The team's first instinct was we'll tune the model. The structural fix was kill the initiative and re-witness.
See also
- Practice — Writing feature briefs · Writing predictions
- Canon — Before We Build · Person & Moment · Observation · Feature Brief
- Clinic — A brief that didn't witness (the failure mode this clinic is a sibling to)
- Canon — After We Build · The Model Update (where the team would amend the model after this failure)
- Principle — Witnessed, not described