Confidence is not competence: what 698 participants reveal about the limits of self-judgment

The assumption that use sharpens judgment

A common belief in enterprise AI adoption is that people can tell how good they are with AI. Give them the tools, let them work, and the ones who need help will know it and ask. It is an appealing idea, and it is wrong. AI does improve output. What it does not do is improve people's ability to judge their own output — and in many cases it makes that judgment worse.

A new study published in Computers in Human Behavior ran two preregistered experiments (698 participants in total) and found that AI use lifts performance and inflates self-assessment at the same time. The people who used AI got better results and became markedly worse at knowing how well they had actually done. And our own enterprise data, from a different angle, shows the same thing.

What 698 participants show

Daniela Fernandes and colleagues (2026) had participants solve LSAT-style logical reasoning problems with or without an AI assistant, then estimate how well they had performed. The design lets you separate two things organizations usually conflate: how well someone does, and how well they think they did.
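
To make that separation concrete, here is a minimal sketch of one way to score the gap between perceived and actual performance. It is an illustration only: the function names and the sample scores are hypothetical, and the study's own calibration measures may be defined differently.

```python
from statistics import mean

def calibration_bias(estimated: float, actual: float) -> float:
    """Return estimated minus actual score.

    Positive values mean the person overestimated their performance,
    negative values mean they underestimated it.
    """
    return estimated - actual

def mean_bias(records: list[tuple[float, float]]) -> float:
    """Average the bias over (estimated, actual) score pairs."""
    return mean(calibration_bias(est, act) for est, act in records)

# Hypothetical (estimated, actual) scores, NOT data from the study.
records = [(17, 13), (15, 14), (12, 12)]

for est, act in records:
    print(f"estimated={est} actual={act} bias={calibration_bias(est, act):+}")
print(f"mean bias: {mean_bias(records):+.2f}")
```

A group can score well and still carry a large positive mean bias, which is the combination the study reports for AI users.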

Three findings stand out.

AI raises performance and inflates self-judgment together. In Study 1 (N=246), AI use substantially improved task scores — and coincided with a large overestimation of performance. Participants did better and believed they had done far better still. The gap between actual and perceived competence widened precisely when the tool was helping most, because people read the quality of the AI's output as evidence of their own ability.

The "I'd know if I were bad at this" instinct breaks. Normally, the least skilled overestimate themselves the most, which is the Dunning-Kruger pattern. With AI in the loop, that gradient flattened to the point of disappearing in this sample, although it was present in a comparable non-AI sample. When AI homogenizes output quality, even lower performers produce good-looking work and draw confidence from it. The practical implication is uncomfortable: you cannot assume your strongest people are well calibrated and your weakest people know they are struggling. With AI, overestimation shows up across the skill range.

AI literacy tracked with worse calibration, not better. Higher AI literacy correlated with more confidence but less accurate self-assessment — participants who understood AI well were more sure of themselves and less precise about their actual performance. And in Study 2 (N=452), paying participants to judge themselves accurately did not fix it; the pattern replicated. The people who "know AI," and the incentive most managers reach for, are not the safeguard they appear to be.

The paper's framing is in the title: AI makes you smarter but none the wiser. It improves what you produce without improving your ability to evaluate it.

What enterprise data confirms

We saw the same pattern from a different angle.

The AGASI GenAI Capability Pulse is a scenario-based assessment that measures what non-technical teams actually do with GenAI in realistic workplace situations. It tests judgment calls (verification, data handling, prompting, relevance), not self-reported confidence. Our sample (N=153) spans enterprise professionals across HR, Finance, Operations, Sales, and Strategy.

The findings converge with the experimental data:

Confidence does not track competence. Nearly 1 in 4 respondents (23.5%) are overconfident: high self-rated confidence, low scenario-based competence. They are unlikely to self-identify as needing help, because they believe they are already capable.
Miscalibration has a cost. Overconfident users average 7x more verification errors and 5x more data handling errors than their genuinely capable peers (p < 0.0001). The miscalibration is not cosmetic: it maps directly onto the two error categories with the highest operational consequence.
Usage frequency does not predict capability. Training does. Respondents with formal training scored 10.5 SJT points higher than untrained respondents (p = 0.008). Usage frequency was not a significant predictor, and two-thirds of daily users have never been trained.
The people who most need help are the least likely to choose it. Among respondents whose weakest area is data handling, 93% prefer training in other topics. Self-direction systematically routes support away from the people who need it.
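
The "high confidence, low competence" group described above can be sketched as a simple two-by-two classification. This is a hedged illustration, not the Pulse's scoring method: the 0-to-1 scales, the 0.5 cut-offs, the label names, and the sample respondents are all assumptions made for the example.

```python
from collections import Counter

def classify(confidence: float, competence: float,
             conf_cut: float = 0.5, comp_cut: float = 0.5) -> str:
    """Place one respondent in a confidence/competence quadrant.

    Both inputs are assumed to be scaled to 0..1; the cut-offs are
    hypothetical and would need to be set from real assessment data.
    """
    high_conf = confidence >= conf_cut
    high_comp = competence >= comp_cut
    if high_conf and not high_comp:
        return "overconfident"
    if high_conf and high_comp:
        return "capable"
    if high_comp:
        return "underconfident"
    return "aware of gap"

def overconfident_share(respondents: list[tuple[float, float]]) -> float:
    """Fraction of respondents in the overconfident quadrant."""
    labels = Counter(classify(conf, comp) for conf, comp in respondents)
    return labels["overconfident"] / len(respondents)

# Hypothetical (self-rated confidence, scenario-based competence) pairs.
respondents = [(0.9, 0.2), (0.8, 0.9), (0.3, 0.7), (0.2, 0.1)]
print(f"overconfident share: {overconfident_share(respondents):.0%}")
```

The key design point is that the competence axis comes from observed decisions rather than from a second self-report; without that, the overconfident quadrant cannot be identified at all.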

Why it matters

Experimental data and enterprise cross-sectional data converge on the same conclusion: self-assessment is not a reliable signal of AI capability, and neither heavier use nor greater AI literacy makes it more reliable.

This breaks the most common enablement instinct: "the people who need help will ask." They will not, because they do not believe they need it. The Fernandes studies show why at the mechanism level: AI output quality masks the user's own competence, so the internal cue people rely on to gauge their skill is corrupted. The Pulse data shows what it costs at the workflow level: the overconfident quarter makes far more verification and data handling errors, the two error categories with the highest operational consequence.

It also explains why two popular fixes underperform. Asking people how confident they feel surfaces the wrong people, because confidence is highest where calibration is worst. And betting on AI literacy as a safeguard backfires, because literacy raised confidence faster than it raised accuracy. The only thing that reliably separates capable users from confidently incapable ones is measuring what they actually do.

What to do about it

Don't ask how confident people are — measure what they do. Confidence surveys and voluntary training sign-ups will systematically miss the highest-risk group. Use scenario-based assessment like the GenAI Capability Pulse to measure decisions, not self-perception.
Don't treat AI literacy as a safeguard on its own. Knowing about AI raised confidence faster than accuracy. Pair any literacy effort with objective calibration so people see the gap between what they think they did and what they actually did.
Put verification and data handling controls where the errors are. Overconfident users dominate both categories. Add review checkpoints and safe-input rules at the points where AI output enters decisions or customer-facing work.
Build the checks into the workflow, not the willpower. Self-direction routes help away from the people who need it. Structured Playbooks with built-in verification steps and data handling rules catch errors that confident users would never flag themselves.

If you rely on people to tell you how good they are with AI, you will trust the wrong quarter of your workforce the most. Confidence is highest exactly where calibration is worst. Measure capability directly.

These findings synthesize external research (Fernandes et al., 2026, Computers in Human Behavior, 108779) with data from the GenAI Capability Pulse, a scenario-based assessment that measures what non-technical teams actually do with GenAI. If your organization is scaling AI adoption, start with a baseline.

Sources: Fernandes, D., Villa, S., Nicholls, S., Haavisto, O., Buschek, D., Schmidt, A., Kosch, T., Shen, C., & Welsch, R. (2026). AI makes you smarter but none the wiser: The disconnect between performance and metacognition. Computers in Human Behavior, 175, 108779. AGASI GenAI Capability Pulse (N=153).

