Blue Sage Data Systems
methodology · November 25, 2025

Most AI Training Is a One-Hour Zoom and a Recording Nobody Watches

Most "AI training" is a one-hour zoom and a recording nobody watches. Here's what training that compounds looks like.


At a professional services firm in Lincoln, a new AI tool went live in February. The vendor ran a 90-minute onboarding session over Zoom. They shared a recording and a link to the help documentation. By April, two of the seven people who attended were using the tool regularly. The other five had reverted to their prior workflows within three weeks of the session. Nobody said the tool was bad. They said they couldn’t remember how to use it when they needed it, and by the time they figured it out, it was faster to do it the old way.

One-shot training — a single onboarding session, a recording, maybe a written guide — fails at roughly the same rate across industries and tool types because it confuses exposure with capability. A person who has watched a demo knows what the tool is supposed to do. A person who can use a tool confidently under workload pressure — when she’s behind on three deliverables and the client is waiting — has something different. Getting from the first to the second takes more than 90 minutes on a Tuesday afternoon.

The firms that end up with tools that actually run — used daily by the people they were built for, incorporated into how the work is done — get there through a different training structure.

Why one-shot training fails

One-shot training fails for two reasons that compound each other.

The first is timing. Training happens at launch, which is precisely the moment when the tool is least familiar, the workflows aren’t fully settled, and the people being trained are most anxious about whether the tool will actually work for their specific situations. The questions they ask in the training session are often generic — “what can this do?” — rather than the operational questions that would have shown up four weeks in: “why does it pull the wrong project history when I have two active clients with similar names?” One-shot training answers the first kind of question and can’t answer the second, because the second hasn’t been discovered yet.

The second reason is that skills decay without practice. The people who don’t use the tool in the two weeks immediately following training lose most of what they retained. By week three, the training session is a vague memory. When they do need the tool, they’re starting almost from scratch — and the gap between where they are and where they’d need to be to use it confidently is wide enough that the old workflow wins.

These two failures are predictable. They’re not failures of the training session. They’re failures of training architecture.

The shadow-then-solo pattern

The pattern that produces a different outcome starts with two weeks of supervised use, not a training session.

In the shadow phase, the trainer — or the build team, or a designated internal lead who’s been trained more deeply — sits with each user for a real work task. Not a demo scenario. Not a training dataset. The actual work the user would do today: a real document, a real set of inputs, a real output the user will actually review and use.

The trainer runs the tool while the user watches and asks questions. The user takes notes, asks about the steps she doesn’t understand, and identifies the places where the output doesn’t match what she expected. Those gaps — the mismatch between what the tool produced and what the user needed — are the calibration signal. Some of them are training issues: the user’s mental model of what the tool does is wrong. Some of them are tool issues: the tool needs an adjustment to handle this user’s specific workflow correctly.

In the solo phase, the user runs the tool on her own work tasks with the trainer available by message. The trainer isn’t in the room, but the user knows she can ask a question without a formal support ticket. Most users in the solo phase ask between zero and three questions in the first two weeks. Those questions are usually specific: “when I do X, it gives me Y, but I expected Z — is that a training issue or a tool issue?”

By week four, most users have accumulated enough successful repetitions to build confidence. They’ve encountered the tool’s edge cases for their specific workflow. They know when to trust the output and when to scrutinize it. They’re past the threshold where the old workflow wins by default.

The runbook as training deliverable

The runbook is not a training supplement. It is the primary training deliverable.

A well-built runbook for an AI workflow answers the questions a user has when she’s alone at her desk, behind on work, and the tool is doing something she doesn’t recognize. It covers: what the tool does in plain terms, the step sequence for the most common tasks, what the output looks like when it’s working correctly, what the output looks like when something is wrong, and what to do in each case.

The runbook is written for the user who didn’t come to the training session, the user who came but didn’t retain it, and the user who will be explaining the tool to a new hire six months later. It uses the actual field names and output labels from the tool, not generic descriptions. It includes screenshots or recordings for the steps that are hard to describe in text.

The test of a good runbook: can a new employee use the tool correctly on a standard task after reading it, with no additional help? If yes, the runbook is doing its job. If no, it needs more work — because the new-hire scenario will occur, and the alternative is a long call with whoever still remembers how the tool works.

The runbook is a living document. When the tool changes, the runbook changes. When users discover a new edge case, it goes in. The build team delivers a runbook current as of the hand-off date; the client keeps it current after that.

Monthly cadence as continuing education

The monthly check-in isn’t a sales call. It’s the continuing education layer.

At 30 days, the check-in covers what’s working and what isn’t from the users’ perspective, calibration issues that surfaced in real use, and any workflow questions that came up in the solo phase that weren’t fully resolved. The build team leaves the check-in with a short list of adjustments — things to tune, things to document more clearly, things to train on more deeply.

At 60 days, the check-in looks at the metrics: time saved per task, correction rate on AI outputs, adoption across the team. Low adoption usually traces to one of three things: a user who didn’t get enough shadow time, a workflow edge case the tool doesn’t handle well, or a mental model mismatch the training didn’t catch. All are fixable with targeted follow-up, not another all-hands session.

At 90 days, most teams are past the adoption hump. The check-in shifts from troubleshooting to optimization: what’s the next workflow that fits this pattern, what have the power users figured out that the rest of the team hasn’t, what does the runbook need based on three months of real use.

The compounding comes from this sequence. Each check-in makes the tool more calibrated for the team’s actual workflows, adds to the runbook, and surfaces a user who’s become a power user and can start training new hires. By month six, most of what the build team knows about how the tool works is in the documentation and the team’s own institutional knowledge — not locked in the build team’s heads.

What it looks like at the 90-day mark

At a mid-market accounting firm in Omaha, a tax workpaper tool went live in mid-January, timed for busy season. The shadow phase happened with two senior associates in the first two weeks: real returns, real workpapers, the trainer running the tool while each associate watched and asked questions. By week three, both associates were running the tool solo on their own client workpapers.

By March — six weeks into busy season — four of the five people on the tax team were using the tool on their highest-volume return types. The one holdout was a senior manager who had missed the shadow session due to a client trip. She got a one-on-one shadow session in week five, ran the tool solo in week six, and was on board by week seven.

At 90 days, the check-in metrics looked like this: median time for a complex return workpaper review was down from about three and a half hours to just under two hours. The correction rate on AI output was running at roughly 25% — meaning on one in four returns, an associate made a substantive correction to the AI-flagged exception report. The build team treated that as a calibration signal and adjusted two of the flag thresholds. The runbook had been updated three times based on questions from the team. Two associates had each trained one new staff person using the runbook without help from the build team.

That’s what training that sticks looks like at 90 days. Not a team that attended a Zoom and mostly went back to their old workflows. A team that owns the tool.

For more on how Blue Sage structures the training and hand-off phase of every engagement, see how we work.

→ Start here

Text Rosey to begin.

Rosey is our executive-assistant bot. Text the number below — she'll ask two questions, offer three calendar slots, and put a 30-minute call on Jim's calendar.

Text Rosey · Schedule a call →

or call (415) 481-2629