Field note · July 3, 2026

AI should make people better thinkers, not just faster producers

The risk that keeps me up at night is not bad AI output. Bad output gets caught. The quieter risk is judgment erosion — people producing more while understanding less, until the day the tool is wrong and nobody in the loop can tell.

Two randomized controlled trials now bracket the outcome space with unusual precision. In a PNAS study by Bastani and colleagues, roughly a thousand high school students were given AI assistance for math practice. Students using a standard, answer-first interface scored 17 percent worse on unassisted exams than peers who had no AI at all, while a guardrailed version of the same underlying model, redesigned to make students reason through hints before receiving answers, largely avoided that harm. In a separate trial, Harvard physicists built a reasoning-first AI tutor and tested it not against passive lecturing but against best-practice active learning, the current gold standard of human instruction. The AI-tutored students showed roughly double the learning gains, in less study time.

Same technology. Opposite outcomes. The variable was not the model; it was whether the interaction design required people to think before they received an answer. That finding should unsettle anyone running an enterprise AI program, because almost every workplace deployment ships the answer-first design, and almost every training program teaches the mechanics of prompting rather than the discipline of review.

The habit vacuum

Most workplace AI habits were not taught; they were improvised under deadline pressure, optimized for the only visible goal — get the task done. People arrive with an entrenched default: ask, receive, paste. Enterprises trying to build AI capability are not writing on a blank slate. They are contending with a self-taught framework reverse-engineered from the tool's interface, and the research above shows exactly what that framework produces when the work turns judgment-intensive.

Review is a capability, not a checkbox

The standard corporate answer — "human in the loop" — is weak unless the loop has ownership and skill. A reviewer who cannot tell a sound AI-drafted analysis from a confident wrong one is not a control; they are latency. Building actual review capability means teaching three things the productivity curriculum skips: how to interrogate an output's assumptions, how to challenge it against independent evidence, and how to synthesize it with what the reviewer knows that the model does not. That is training design, and it determines whether the organization's judgment compounds or decays.

How I operationalize this

In my own portfolio work at Doosan GridTech and T-Mobile, the operating rule was consistent: AI proposes, named humans confirm. AI flagged schedule inconsistencies and critical-path risks ahead of governance cadences; it did not make the call. AI proposed reclassifications across hundreds of portfolio records; product managers confirmed, declined, or amended every one. And when I built AI enablement for practitioners, the curriculum was organized around where AI creates real value in delivery work versus where it introduces noise — concrete use cases, not tool features. The point was never to slow people down. Reviewers who interrogate output get faster at it, and they stay capable of working the day the tool fails.
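One way to picture the rule "AI proposes, named humans confirm" is as a small gate in code. The sketch below is illustrative only and is not the tooling used on those portfolios. Every name in it (Proposal, Review, Decision, resolve, apply_reviews) and the idea of a single classification string per record are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    CONFIRM = "confirm"
    DECLINE = "decline"
    AMEND = "amend"


@dataclass(frozen=True)
class Proposal:
    """A hypothetical AI-proposed reclassification of one portfolio record."""
    record_id: str
    current_class: str
    proposed_class: str
    rationale: str  # the model's stated reason, shown to the reviewer


@dataclass(frozen=True)
class Review:
    """A named human's decision on one proposal."""
    proposal: Proposal
    reviewer: str  # must identify a person; blank is rejected
    decision: Decision
    amended_class: Optional[str] = None


def resolve(review: Review) -> str:
    """Return the classification that takes effect after human review."""
    if not review.reviewer.strip():
        raise ValueError("every decision needs a named reviewer")
    if review.decision is Decision.CONFIRM:
        return review.proposal.proposed_class
    if review.decision is Decision.AMEND:
        if not review.amended_class:
            raise ValueError("an amendment must supply a classification")
        return review.amended_class
    # DECLINE: the record keeps its current classification.
    return review.proposal.current_class


def apply_reviews(records: dict[str, str], reviews: list[Review]) -> dict[str, str]:
    """Apply reviewed proposals to a copy; unreviewed records are untouched."""
    updated = dict(records)
    for review in reviews:
        updated[review.proposal.record_id] = resolve(review)
    return updated
```

The design choice worth noticing is that a proposal has no path into the records on its own: only a Review carrying a reviewer's name can change anything, and a record nobody reviewed stays exactly as it was.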

The governance question most AI programs ask is how many people are using the tools. The better question is whether the people using them could still do the thinking without them — because that reserve of judgment is what the organization actually runs on when the outputs are wrong, and some of them will be.

A shorter version of this note circulates on LinkedIn. This is the complete version.