Access is not capability: what 12,000 users reveal about the limits of learning by doing

The assumption that use builds capability

A common belief in enterprise AI adoption is that capability develops through use. Give people access, let them experiment, and competence will follow. It is an appealing idea, and it is wrong. Access gets people in the door and adoption gets them using the tool, but neither guarantees they get better at it.

A new study from Cornell and Microsoft Research tracked roughly 12,000 Bing Copilot users over six months and found that individual habits are overwhelmingly sticky: users did not meaningfully change how they interacted with AI over time. Our own enterprise data, viewed from a different angle, shows the same thing.

What 12,000 users show

Rebecca Hicke and Kiran Tomlinson (2026) analyzed the full conversational trajectories of randomly sampled Copilot users active during 2024, measuring prompting complexity, task types, completion rates, and conversational domains across user lifetimes.

Three findings stand out.

Habits are overwhelmingly sticky. Individual users barely changed their behavior over six months. They used the same prompt patterns, pursued the same task types, and achieved similar completion rates from their first conversation to their last. What looks like population-level learning (the aggregate trending toward more complex tasks and higher completion) is actually driven by new users who arrive already different, not by existing users adapting over time.

Power users differ from day one. High-activity users do not learn to become power users through practice. From their very first interactions, they write more complex prompts, attempt more professional and creative tasks, and complete those tasks at higher rates. The gap is present from the start, not built over time.

Users narrow, they do not explore. Over the course of their trajectories, users slightly reduced the variety of task types and conversational domains they attempted. They settle into routines rather than discovering new applications through natural exploration.

The paper's conclusion: "the stickiness of habits means that users may not discover more useful and successful LLM tasks through natural exploration, indicating a need for proactive interventions."

What enterprise data confirms

We saw the same pattern from a different angle.

The AGASI GenAI Capability Pulse is a scenario-based assessment that measures what non-technical teams actually do with GenAI in realistic workplace situations. It tests judgment calls (verification, data handling, prompting, relevance), not self-reported confidence. Our sample (N=153) spans enterprise professionals across HR, Finance, Operations, Sales, and Strategy.

The findings converge with the longitudinal data:

Usage frequency does not predict capability. Training does. Respondents with formal training scored 10.5 points higher on the situational judgment test (SJT) than untrained respondents (p=0.008). Usage frequency was not a significant predictor. Two-thirds of daily users have never been trained, and their capability scores show it.
Self-assessment fails. Nearly 1 in 4 users are confidently wrong: high self-rated confidence, low scenario-based competence. They are unlikely to self-identify as needing help.
Failure modes persist without intervention. Verification and data handling errors account for over half of all mistakes. Among respondents whose weakest area is data handling, 93% prefer training in other topics. The people who most need that training rarely self-select it.
Capability is flat across roles, but failure modes cluster. No function is meaningfully ahead. The differences are in how people fail: verification errors concentrate in some roles, data handling errors in others.
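To make the "confidently wrong" category concrete, here is a minimal Python sketch of how such a flag could be computed from paired self-ratings and scenario scores. The 0-100 scales, the cutoff values, and the names `Respondent`, `is_confidently_wrong`, and `confidently_wrong_share` are all assumptions for illustration. They are not the Pulse's actual scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    self_confidence: float  # self-rated confidence, hypothetical 0-100 scale
    scenario_score: float   # scenario-based score, hypothetical 0-100 scale


def is_confidently_wrong(r, confidence_cutoff=70.0, score_cutoff=50.0):
    """Flag high self-rated confidence combined with low measured competence.

    Both cutoffs are illustrative assumptions, not published thresholds.
    """
    return r.self_confidence >= confidence_cutoff and r.scenario_score < score_cutoff


def confidently_wrong_share(respondents):
    """Fraction of respondents flagged as confidently wrong."""
    flagged = sum(is_confidently_wrong(r) for r in respondents)
    return flagged / len(respondents)


# Made-up sample of four respondents, one of whom is confidently wrong.
sample = [
    Respondent(self_confidence=90, scenario_score=40),  # confident, low score
    Respondent(self_confidence=85, scenario_score=80),  # confident, high score
    Respondent(self_confidence=30, scenario_score=45),  # unsure, low score
    Respondent(self_confidence=40, scenario_score=75),  # unsure, high score
]
share = confidently_wrong_share(sample)
```

The flag needs both measurements. A survey that collects only self-rated confidence cannot separate the first respondent from the second.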

Why it matters

Consumer-scale longitudinal data and enterprise cross-sectional data converge on the same conclusion: use alone does not produce adaptation. The two sources measure different things. The paper tracks how behavior changes over time, while the Pulse measures capability at a single point in time. Both point in the same direction.

Hicke and Tomlinson show this over time: six months of use does not change how people interact with LLMs, even among the most active returning users. The Pulse data shows it in the cross-section: frequent use without training is not associated with higher capability, so the same errors are likely to repeat across workflows. The paper also demonstrates that what looks like learning at the population level is often a compositional effect: new users arriving with different habits, not existing users improving.

This has a direct implication for any enterprise tracking "AI maturity" through adoption dashboards. Rising login counts and query volumes might be measuring turnover and new hires, not capability development. The only way to know is to measure capability directly.
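The compositional effect can be illustrated with a toy calculation. The sketch below uses made-up cohorts, not data from the paper or the Pulse: every user's prompt-complexity score is held constant, and later cohorts simply start at a higher level. The cohort sizes, scores, and function names are assumptions chosen to show the mechanism.

```python
# Hypothetical cohorts. "slope" is the per-month change in each user's score.
# It is 0.0 everywhere, meaning no individual user adapts over time.
COHORTS = [
    {"join": 1, "users": 100, "base": 2.0, "slope": 0.0},
    {"join": 3, "users": 100, "base": 3.0, "slope": 0.0},
    {"join": 5, "users": 100, "base": 4.0, "slope": 0.0},
]


def score(cohort, month):
    """Score of a user in this cohort at the given month."""
    return cohort["base"] + cohort["slope"] * (month - cohort["join"])


def aggregate_mean(month):
    """What a dashboard sees: mean score over all users who have joined so far."""
    active = [c for c in COHORTS if c["join"] <= month]
    total_users = sum(c["users"] for c in active)
    return sum(c["users"] * score(c, month) for c in active) / total_users


def within_user_change(start, end):
    """Mean score change for users who were already present at `start`."""
    present = [c for c in COHORTS if c["join"] <= start]
    total_users = sum(c["users"] for c in present)
    return sum(
        c["users"] * (score(c, end) - score(c, start)) for c in present
    ) / total_users


print(aggregate_mean(1))         # 2.0
print(aggregate_mean(6))         # 3.0
print(within_user_change(1, 6))  # 0.0
```

In this toy setup the aggregate rises from 2.0 to 3.0 while no individual user changes at all. Separating the two requires following the same users over time instead of averaging over whoever is active.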

And the finding that users narrow their use cases over time, rather than discovering new applications through exploration, helps explain why prompt libraries and tip sheets rarely change behavior. Users already know what they use AI for. What they do not do is discover what else it could do.

What to do about it

Stop waiting for natural learning: If 12,000 users over six months barely changed their behavior, "let them figure it out" is not an enablement strategy. The evidence points to proactive, structured intervention as what moves the needle.
Measure capability, not adoption: Adoption metrics track volume. They do not tell you whether anyone is getting better. Use scenario-based assessment like the GenAI Capability Pulse to measure what your teams actually do with AI, not how often they log in.
Invest in structured training, not just access: In our data, formal training is associated with a 10.5-point higher capability score, while usage frequency alone is not a significant predictor. Gate tool provisioning behind baseline training on verification and data handling.
Build structured workflows, not just prompt libraries: Users settle into narrow routines and do not discover new applications on their own. Structured Playbooks that map AI to specific tasks, with verification checks and data handling rules built in, are how new applications get adopted.

Access is necessary. But access alone is not enough. If your organization is scaling GenAI, measure capability directly and invest in the structured training and workflows that actually change behavior.

These findings synthesize external research (Hicke & Tomlinson, 2026, arXiv:2605.29018) with data from the GenAI Capability Pulse, a scenario-based assessment that measures what non-technical teams actually do with GenAI. If your organization is scaling AI adoption, start with a baseline.

Sources: Hicke, R. M. M. & Tomlinson, K. (2026). Adopt ≠ Adapt: Longitudinal Analyses of LLM Conversations in the Wild. arXiv:2605.29018. AGASI GenAI Capability Pulse (N=153).
