TL;DR. Completion rate, NPS, hours consumed, courses launched: every standard L&D dashboard metric measures activity, not development. The number that actually matters, behavior change under pressure, is invisible in most enterprises because nobody built the instrumentation for it. Completion is the vanity metric of L&D, defensible only because the alternative did not exist until recently. It does now. This piece names the broken stack, sketches what an honest development metric stack looks like, and explains why the first quarter under honest metrics always looks worse than the quarter before it. That drop is the test.

Key Takeaways

87 percent of training content is forgotten 30 days after delivery (Brandon Hall Group), yet completion rates above 90 percent are routinely reported as evidence of effective training.
A 98 percent completion rate measures one thing: that 98 percent of seats clicked through to the end. It does not measure retention, behavior change, or business outcome.
The honest L&D metric stack has three layers: capability movement, behavior trend, and revenue link. None of them appears on a standard LMS dashboard.
The first quarter after switching to capability metrics always looks worse than the quarter before. That drop is the signal that the new numbers are real.
AI-driven evaluation of real work against a defined rubric, at scale, has been viable in production since roughly 2024. The instrumentation gap is now a leadership choice, not a technology constraint.

Completion is not competence: why every L&D metric you report is wrong

The L&D function in most enterprises lives on a small set of numbers that everyone in the building has agreed to find reassuring. Completion rate. Satisfaction score. Hours consumed. Courses launched. The numbers go up and to the right, the slide gets approved, the budget gets renewed. Outside the room, nobody is asking the obvious question, because the answer is uncomfortable and the dashboard makes it easy to keep not asking it. The question is whether anything has actually changed in how the team works.

This piece argues that the standard L&D metric stack is structurally wrong. Not slightly miscalibrated. Wrong in the sense that the numbers being reported, by design, cannot tell you whether training produced any development at all. Completion rate is the vanity metric of L&D the way page views were the vanity metric of digital media in 2009. It feels like data. It moves predictably. It correlates with effort. And it has almost no relationship to the outcome the function was built to produce.

The frame is harsh on purpose. The intent is not to embarrass L&D teams, most of whom know the metrics are weak and have been asking for better instruments for years. The intent is to make explicit what every L&D leader already suspects, which is that the dashboard they present quarterly has very little to do with whether their organization is getting better at the work, and to lay out what a real development metric stack looks like once the technology to build it has arrived.

Why is completion rate the wrong number to report?

Completion rate is the wrong number to report because it measures seat behavior, not skill behavior. A 98 percent completion rate means 98 percent of seats clicked through to the end. It says nothing about retention, transfer, or change in how the person performs in front of a customer. The metric is a faithful record of one fact and one fact only: that the LMS was used. Everything else is inference, and the inference is weak.

The evidence on this is decades old and depressing in its consistency. Brandon Hall Group's retention research puts 30-day retention from classroom-style training at roughly 10 to 30 percent, with 90-day retention often below 15 percent for content delivered without follow-up practice. The Ebbinghaus forgetting curve, refined since 1885, predicts the same shape across every replication. Anecdotally, every sales manager who has watched a rep complete an objection-handling module on Monday and lose the same objection on Wednesday already has the lived data. The classroom number was never going to predict the customer-conversation number, because the two activities are functionally unrelated. Reading about how to handle pricing pressure is not the same skill as holding the price when a procurement director starts an aggressive comparison.

There is a quieter problem under the headline number. Completion rate flatters the most disengaged behavior. A rep who turns the LMS on, mutes the audio, opens a second tab, and clicks Next every two minutes gets the same completion credit as a rep who took notes and rewatched the hard sections. The metric cannot distinguish presence from absence, let alone learning from compliance. As an instrument it has the same fidelity as counting the number of cars that drove past a billboard. You know that an event occurred. You do not know whether it produced any of the outcomes the budget assumed.

The reason completion rate persists, despite being widely understood as weak, is structural. Until very recently it was the only number the existing stack could produce automatically and at scale. The LMS could log clicks. It could not evaluate behavior. Reporting what the system could measure, even if the measurement did not answer the question, was preferable to reporting nothing. CFOs accepted it because the alternative was unmeasurable spend.

What does an honest L&D metric stack actually look like?

An honest L&D metric stack has three layers: capability movement, behavior trend, and revenue link. None of them is produced by an LMS. All three depend on a behavioral measurement engine that scores real work against a defined excellence model, with evidence cited from the actual interaction. The completion-rate dashboard is replaced, not augmented. The new stack reports on what the team can demonstrably do, how that has moved over time, and how that movement correlates with the business outcomes leadership cares about.

The three layers, in the order they get built:

Layer	What it measures	What it replaces
Capability movement	The team's average score against a defined excellence rubric, over time, per role and conversation type	Completion rate, courses launched, hours consumed
Behavior trend	The frequency and quality of specific observable behaviors in real customer or coaching interactions	Satisfaction score, NPS on training, self-reported confidence
Revenue link	The correlation between scorecard movement and downstream business outcomes (win rate, deal size, retention, ramp time)	Attendance reports, training spend per head, anecdotal manager testimonials

Capability movement is the foundation. Before any other metric makes sense, the organization has to define what excellent looks like for the conversations and decisions that matter, in observable, evidence-citable form. A scorecard for discovery quality. A rubric for pricing conversations. A standard of excellence for first-time manager 1:1s. Without that artifact, every downstream number is opinion in a spreadsheet. With it, the team has a number that moves, an artifact that justifies the score, and a target that everyone can see.
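As a rough illustration of what "a number that moves" could look like in practice, here is a minimal sketch that averages rubric scores per role and conversation type and reports the change between two quarters. The record layout, the 0 to 100 scale, the quarter labels, and the function names are all assumptions made for the example, not a description of any particular platform.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scored interactions:
# (quarter, role, conversation_type, rubric_score on an assumed 0-100 scale)
scored_interactions = [
    ("2025-Q3", "AE", "discovery", 55),
    ("2025-Q3", "AE", "discovery", 61),
    ("2025-Q3", "AE", "pricing", 48),
    ("2025-Q4", "AE", "discovery", 64),
    ("2025-Q4", "AE", "discovery", 68),
    ("2025-Q4", "AE", "pricing", 52),
]


def capability_by_quarter(rows):
    """Average rubric score for each (role, conversation_type, quarter)."""
    buckets = defaultdict(list)
    for quarter, role, conv_type, score in rows:
        buckets[(role, conv_type, quarter)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def capability_movement(rows, start, end):
    """Change in average score from `start` to `end` per (role, conversation_type)."""
    averages = capability_by_quarter(rows)
    movement = {}
    for (role, conv_type, quarter), avg in averages.items():
        if quarter != end:
            continue
        baseline = averages.get((role, conv_type, start))
        if baseline is not None:  # skip groups with no baseline quarter
            movement[(role, conv_type)] = avg - baseline
    return movement


movement = capability_movement(scored_interactions, "2025-Q3", "2025-Q4")
for (role, conv_type), delta in sorted(movement.items()):
    print(f"{role} / {conv_type}: {delta:+.1f} points")
```

The sketch deliberately keeps the evidence out of scope. In a real stack each score would carry the cited moments from the interaction that justify it.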

Behavior trend is the connective tissue between the rubric and the business. Capability movement on its own is one step removed from the work. Behavior trend asks the next question: are the specific behaviors that the rubric measures actually showing up in real interactions, more often and at higher quality, over the program window? Frequency of business-impact questions in discovery calls. Use of price-anchoring language under pressure. Number of feedback moments per manager 1:1. These are the behaviors the rubric is trying to produce, and they need to be tracked in the wild, not just in practice.
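A behavior trend can be as simple as a per-interaction rate tracked over time. The sketch below assumes each real call has already been tagged with the target behaviors observed in it. The tagging step, the behavior labels, and the monthly grouping are hypothetical, and only frequency is covered here, not quality.

```python
from collections import Counter, defaultdict

# Hypothetical tagged calls: one record per real interaction, listing every
# observed occurrence of a target behavior (labels are illustrative only).
tagged_calls = [
    {"month": "2025-10", "behaviors": ["business_impact_question"]},
    {"month": "2025-10", "behaviors": []},
    {"month": "2025-11", "behaviors": ["business_impact_question", "price_anchoring"]},
    {"month": "2025-11", "behaviors": ["business_impact_question", "business_impact_question"]},
]


def behavior_rate_per_call(calls):
    """Average occurrences of each behavior per call, grouped by month."""
    counts = defaultdict(Counter)  # month -> behavior -> occurrences
    totals = Counter()             # month -> number of calls observed
    for call in calls:
        totals[call["month"]] += 1
        counts[call["month"]].update(call["behaviors"])
    return {
        month: {b: n / totals[month] for b, n in counts[month].items()}
        for month in totals
    }


rates = behavior_rate_per_call(tagged_calls)
for month in sorted(rates):
    print(month, rates[month])
```

Dividing by the number of calls, rather than reporting raw counts, keeps the trend comparable when call volume changes from month to month.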

Revenue link is the metric that ends the budget argument. Once capability movement and behavior trend are in place, the correlation with downstream outcome stops being a faith claim and becomes a falsifiable analysis. The findings take a concrete form, for example: reps in the top quartile of scorecard improvement closed at a 17 percent higher rate than reps in the bottom quartile, or managers whose feedback frequency moved up saw 11 percent lower attrition on their teams. The numbers are not always flattering, which is the point. A real revenue link will sometimes show that training had no effect, or that the effect was concentrated in a specific subgroup, or that the intervention worked but the rollout did not. Each of those findings is more useful than another quarter of 98 percent completion.
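To make "falsifiable analysis" concrete, the sketch below compares win rates between the top and bottom quartiles of scorecard improvement and computes a Pearson correlation across all reps. The per-rep figures are invented for illustration, the function names are hypothetical, and a correlation on its own does not establish that the training caused the outcome.

```python
from math import sqrt

# Hypothetical per-rep records: (rep_id, scorecard_improvement_points, win_rate)
reps = [
    ("r1", 2, 0.20), ("r2", 4, 0.22), ("r3", 5, 0.21), ("r4", 7, 0.25),
    ("r5", 9, 0.24), ("r6", 11, 0.28), ("r7", 14, 0.30), ("r8", 16, 0.31),
]


def quartile_gap(records):
    """Mean win rate of the top quartile minus the bottom quartile,
    where quartiles are formed by ranking reps on scorecard improvement."""
    ordered = sorted(records, key=lambda r: r[1])
    k = max(1, len(ordered) // 4)
    bottom = [r[2] for r in ordered[:k]]
    top = [r[2] for r in ordered[-k:]]
    return sum(top) / len(top) - sum(bottom) / len(bottom)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


gap = quartile_gap(reps)
r = pearson([rep[1] for rep in reps], [rep[2] for rep in reps])
print(f"Top vs bottom quartile win-rate gap: {gap:.3f}")
print(f"Correlation between improvement and win rate: {r:.2f}")
```

The same two functions would report a gap near zero or a weak correlation just as readily, which is what makes the revenue link an honest metric rather than a flattering one.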

Why do CFOs accept the broken metrics?

CFOs accept the broken L&D metrics for the same reason they accept any weak instrument: the alternative used to be no instrument at all. Faced with a choice between a faith-based number and no number, every finance team in the world will take the faith-based number and ask politely for it to get better. The technology constraint that made the faith-based number the only option has now lifted, and the polite request is about to become a sharper one. This is not a story about CFOs being naive. It is a story about CFOs being patient with a function that, until recently, genuinely could not produce a better answer.

The patience has limits. Three forces are converging that change the conversation. The first is that other functions in the organization have moved past the same problem. Marketing used to report on impressions and click-throughs. It now reports on pipeline contribution, with attribution models and cohort analyses that hold up to scrutiny. Customer success used to report on satisfaction surveys. It now reports on net revenue retention by segment, with cohort retention curves that finance can model. L&D is the last function in the executive team whose primary metric is essentially activity tracking, and the contrast on the slide deck has become hard to miss.

The second force is that the cost of capturing and evaluating real work has fallen by something like two orders of magnitude in the last 24 months. A language model can now transcribe a customer call, score it against a multi-dimensional rubric, cite evidence from the transcript, and produce a defensible report in minutes, at a per-interaction cost that makes 100 percent coverage feasible for the first time. The Brandon Hall Group reports that more than 70 percent of enterprises are now experimenting with AI in L&D contexts in some form. What was a budget-prohibitive aspiration in 2022 is now a deployment problem.

The third force is regulatory. In financial services, insurance, healthcare, and increasingly procurement, the question of whether an employee was actually competent at a given moment is moving from a soft expectation to a hard requirement. "We trained the team" is no longer a defense when advice goes wrong. Regulators want evidence of competence. Customers in regulated industries are starting to ask, in security questionnaires, what proof of behavioral consistency the vendor can offer. Completion rate does not survive that question. Capability movement does.

The combined effect on the CFO conversation is straightforward. The patience that funded faith-based metrics is running out, the technology that justified the patience is no longer the constraint, and the consequences of holding onto the old dashboard are starting to show up in places the L&D budget cannot defend: lost deals, regulatory exposure, attrition data, stalled rollouts. CFOs are not going to start refusing the L&D budget. They are going to start asking what specific behavioral instrument the spend is producing, and the room is going to get quiet.

What happens in the first quarter after switching metrics?

The first quarter after switching from completion-based metrics to capability-based metrics always looks worse than the quarter before. That is not a sign that the transition failed. It is a sign that the new metric is real and the old metric was not. The completion-rate dashboard was producing flattering numbers because it was measuring the wrong thing. The capability dashboard produces honest numbers because it is measuring the right one. The drop between them is the gap between the story L&D was telling and the reality the team was living.

Three things tend to happen in that first quarter, and they need to be flagged in advance so leadership does not panic at the wrong moment. The first is that the rubric itself will be wrong in places. Excellence definitions are always wrong on first pass, because the tacit knowledge in the top performers' heads is not fully articulated until it is forced into observable criteria. The first 60 days of measurement surface those gaps. The fix is to refine the rubric, not to retreat to the old dashboard.

The second is that the team's scores will be lower than anyone expected. Sales managers who would have predicted their team's discovery quality at 75 out of 100 see a number that lands at 58. Customer success leaders who assumed their renewals conversations were strong see specific dimensions, around proactive risk surfacing, scoring below 50. This is the most important number the function will report in its history. It is the baseline. Every quarter after this is measured against it, and the comparison is the actual story.

The third is that some managers will try to dilute the rubric. Excellence definitions that produce uncomfortable scores invite quiet pressure to soften the criteria. The diluted rubric will produce friendlier numbers, the dashboard will recover, and the function will have spent a quarter building an instrument and then sanding it down to match the old dashboard. The single biggest threat to a capability-metrics rollout is the instinct to flinch at the first honest report. Leadership teams that publish the number, name the gap, and treat the next quarter as the first real baseline come out of the transition with an L&D function that can finally answer the question the CFO has been asking quietly for thirty years. Leadership teams that dilute the rubric come out with a slightly more sophisticated version of the old vanity metric.

How do you start the transition without breaking the existing reporting?

The realistic path is to run both metric stacks in parallel for two quarters, publish capability movement as the secondary metric while completion rate stays primary, and let leadership choose when to flip the priority. The transition does not require dismantling the LMS, killing the satisfaction surveys, or refusing to report completion. It requires adding one layer on top, in a single, well-chosen function, and letting the new numbers earn their position over time. This is how every honest metric in every other function got introduced. The dashboards did not change overnight. The conversation around them did.

A sensible 90-day starting sequence:

Pick the function with the most direct revenue link. In most enterprises this is sales, because the outcome of a discovery call or a pricing conversation is downstream-measurable in pipeline data within weeks. The same approach works for customer success and first-line management later. Starting in sales gets the revenue-link metric on the board fastest.
Define excellence for one conversation type. Not the full stack of seven conversation types the team eventually needs. One. Discovery, pricing, or renewal. Two or three top performers, two or three senior managers, six to eight observable criteria with evidence-citable definitions. The artifact takes about three days of focused work and is the most valuable output of the entire quarter.
Score 50 to 100 real interactions against the rubric. The number does not have to be statistically perfect. It has to be honest. Capture transcripts, run them against the rubric, cite evidence. The output is the baseline.
Publish the baseline alongside the completion-rate dashboard. The slide moves from "98 percent of the team completed the module" to "98 percent of the team completed the module, and discovery quality across 80 measured calls scored 58 out of 100. Target: 70 by end of next quarter." That second sentence is the entire transition.
Repeat measurement and publish movement. The first time the capability number moves in the right direction, the conversation about L&D inside the executive team changes shape. The first time it moves and pipeline conversion moves with it, the budget conversation changes shape.
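The scoring and publishing steps in the sequence above can be sketched in a few lines. The example assumes an evaluator, human or model, has already assigned each call a 0 to 5 score per criterion. The criterion names, the 0 to 5 scale, and the three sample calls are hypothetical, and a real baseline would use the 50 to 100 interactions described above.

```python
from statistics import mean

# Hypothetical discovery rubric: six observable criteria, each scored 0-5.
CRITERIA = [
    "business_impact_question",
    "stakeholder_mapping",
    "quantified_pain",
    "next_step_commitment",
    "active_listening",
    "timeline_discovery",
]
MAX_PER_CRITERION = 5

# Hypothetical per-call criterion scores (three calls shown for brevity).
scored_calls = [
    dict(zip(CRITERIA, [3, 3, 3, 3, 3, 3])),
    dict(zip(CRITERIA, [2, 3, 2, 3, 2, 3])),
    dict(zip(CRITERIA, [4, 3, 4, 3, 4, 3])),
]


def call_score(criterion_scores):
    """Scale one call's criterion scores to a 0-100 rubric score."""
    total = sum(criterion_scores[c] for c in CRITERIA)
    return 100 * total / (MAX_PER_CRITERION * len(CRITERIA))


def baseline_line(calls, completion_pct, target):
    """Build the slide sentence that pairs completion with the capability baseline."""
    baseline = round(mean(call_score(c) for c in calls))
    return (
        f"{completion_pct} percent of the team completed the module, and "
        f"discovery quality across {len(calls)} measured calls scored "
        f"{baseline} out of 100. Target: {target} by end of next quarter."
    )


slide_line = baseline_line(scored_calls, completion_pct=98, target=70)
print(slide_line)
```

Keeping the per-criterion scores, rather than only the total, is what lets the next quarter's report say which dimensions moved and which did not.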

The transition is not technical. The hardest part is institutional. It requires a leadership team that is willing to publish an unflattering number, name the gap, and trust that honest measurement compounds faster than flattering measurement. Every organization that has made the switch reports the same arc: the first quarter is uncomfortable, the second quarter starts to move, and the third quarter rewrites the budget conversation.

What this means for the next executive review

The dashboard the L&D function takes into the next executive review is the artifact that will define the function for the following year. If it leads with completion rate, the function is reporting on activity in a budget environment that increasingly rewards capability. If it leads with capability movement, behavior trend, and revenue link, even at honest, uncomfortable numbers, the function is reporting on development in a budget environment that has waited thirty years for that report. The choice is no longer a technology constraint. The instrumentation exists. The argument inside the room is about whether the leadership team is ready to look at the real number.

The L&D functions that take this seriously in 2026 are not necessarily the ones with the most AI-forward marketing. Many of them are quiet about it. They have started running the new metric stack in one function, alongside the old one, and they are compounding the advantage of an honest measurement loop quarter by quarter. The functions that hold onto completion rate will, predictably, keep producing the same flattering number while the gap between reported development and actual development widens. The CFO will keep paying for it, for a while, until a regulator, a customer, or a sharp board member asks the question completion rate cannot answer.

That question is coming. It is already in some security questionnaires. It is already in some board packs. The L&D function that has an answer ready will spend the next decade as an operating lever. The function that does not will spend the next decade defending a number that everyone in the room has stopped believing.

Frequently asked questions

How do you measure training effectiveness?

By measuring capability movement, behavior trend, and revenue link against a defined excellence model, not by measuring completion, satisfaction, or hours consumed. Capability movement is the team's average score on a documented rubric, over time, with evidence cited from real interactions. Behavior trend is the frequency and quality of the specific behaviors the rubric targets, observed in real customer or coaching conversations. Revenue link is the correlation between scorecard movement and downstream business outcomes such as win rate, deal size, retention, or ramp time. None of the three are produced by a standard LMS dashboard.

What KPIs should L&D track in 2026?

Three KPI families replace the standard activity stack. Capability movement, expressed as scorecard score per role and conversation type, tracked quarter over quarter. Behavior trend, expressed as the frequency of target behaviors in real interactions. Revenue link, expressed as the correlation between top-quartile scorecard movement and downstream business outcome. Completion rate, NPS on training, and hours consumed can remain as input metrics but should never be the headline number in an executive review.

Why does the Brandon Hall Group retention curve matter?

Because it documents, across decades of replication, that the assumption sitting under the entire completion-rate dashboard is false. The assumption is that someone who finishes a training module retains the content. Brandon Hall's figures, consistent with the shape of the underlying Ebbinghaus forgetting curve, put 30-day retention from classroom-style content at roughly 10 to 30 percent, and 90-day retention often below 15 percent. On evidence that dates back to 1885, the completion-rate number is a poor predictor of the outcome the dashboard implies.

Is this just an argument for measuring things L&D cannot control?

No. Capability movement, behavior trend, and revenue link are not outside L&D's influence. They are the outcomes L&D was funded to produce. Reporting on them honestly is the only way the function gets credit when the numbers move and the only way it gets the budget signal when they do not. The objection that L&D should not be held accountable for behavior change is the objection that has kept the function reporting completion rate for thirty years. It has not served the function well.

How does this connect to AI coaching?

AI coaching is the deployment layer that makes the new metric stack practical. Defining excellence, capturing real work, scoring against the rubric with evidence, and producing targeted practice are operations that until 2024 required armies of human coaches and were therefore impossible at enterprise scale. A modern AI coaching platform does these operations continuously, per individual, against a leader-defined rubric, in production, at a per-interaction cost that makes 100 percent coverage feasible. The metric stack is the reporting view; the AI coaching platform is what produces the data the stack reports.
