- Developer trust in AI output is declining. Over 75% of developers still want human validation when they don’t trust AI answers.
- Debugging AI-generated code takes more time than expected, with “almost right but not quite” solutions being the top frustration.
- Advanced questions on Stack Overflow doubled since 2023, indicating that LLMs may struggle with complex reasoning problems.
- Agentic AI adoption is split: More than half of developers are still sticking to simpler AI tools, but 70% of adopters report reduced time on tasks thanks to agentic workflows.
- Small language models and MCP servers are emerging as cost-effective solutions for enterprise and domain-specific tasks.
The 2025 Stack Overflow Developer Survey gives us a nuanced look at AI adoption among enterprise development teams. AI tools are widely used, but as adoption rises and developers bump into the real-world limits of their shiny new tools, trust declines accordingly. At the same time, the survey underscores the value developers continue to place on human knowledge and experience, especially as AI tools become more unavoidable.
On a recent episode of Leaders of Code, Stack Overflow Senior Product Marketing Manager Natalie Rotnov highlighted what enterprises should take away from these findings, particularly around AI adoption and implementation. Here, we've distilled Natalie's take on the survey findings, laid out some action items for leadership, and dug a little deeper into her recommendations around agentic AI for the enterprise. Spoiler alert: It all comes back to data quality.
Stack Overflow's 2025 survey of nearly 50,000 developers around the world revealed that developer trust in AI tools is declining. This probably doesn't surprise any developers out there, but it might come as a surprise to the C-suite if they've been bullish on AI tools but not necessarily attuned to how their teams work.
According to Rotnov, developers' skepticism of AI is healthy. "Developers are skeptics by trade," she explains. "They have to be critical thinkers, and they're on the front lines, intimately familiar with the nuances of coding, debugging, and problem-solving." Aren't those exactly the people you want working with brand-new AI coding tools?
The survey identified developers' biggest frustrations with AI:
- "Almost right, but not quite" solutions. AI produces code that appears correct but contains subtle errors. These create pitfalls, especially for less seasoned developers, who may not have the experience to identify and correct these issues.
- Time-consuming debugging. Fixing AI-generated code often takes longer than expected, especially without proper context.
- Lack of complex reasoning. Current AI models struggle with advanced problem-solving and higher-order work.
These concerns align with research findings. A study from Apple suggests that LLMs primarily engage in pattern matching and memorization rather than true reasoning. The paper showed that as tasks grew more complex, model performance deteriorated, evidence that reasoning models are still relatively immature.
Key term: Reasoning models are AI models designed to break down problems and think through solutions step by step, mimicking human cognitive processes. OpenAI's o1 is one example.
Despite AI's constantly expanding capabilities, our survey revealed that human knowledge still reigns supreme when it comes to complicated technical problems. More than 80% of developers still visit Stack Overflow regularly, while 75% turn to another person when they don't trust AI-generated answers.
Even more telling: Despite developers tinkering with reasoning models, advanced questions on Stack Overflow have doubled since 2023. Stack Overflow's parent company, Prosus, uses an LLM to categorize questions as "basic" or "advanced." The dramatic increase in questions tagged "advanced" suggests that developers are encountering problems AI tools can't help them with.
Rotnov emphasizes two important conclusions that enterprises should draw from this data:
- LLMs haven't mastered complex reasoning problems. Instead, developers turn to human-centered knowledge communities for help.
- AI is creating new problems that communities have never encountered before.
Not only are human expertise and validation still essential, then, but the new problems cropping up due to AI use, misuse, or overuse require human-driven solutions.
Example: A developer using an AI coding assistant might generate a working application quickly, but when they need to optimize performance, handle edge cases, or integrate with legacy systems, they require human expertise and collaborative problem-solving.
Rotnov outlined two high-level action items enterprise leaders can take to make their AI projects successful while supporting technical teams' preferred tools and workflows: investing in spaces for knowledge curation/validation and doubling down on retrieval-augmented generation (RAG).
What to do: Create internal platforms where developers can document, discuss, and validate new problems and solutions emerging from AI-assisted workflows.
Why it matters: As AI changes how developers work, they need structured spaces to build consensus around new patterns and best practices.
Best practices:
- Choose platforms that support structured formats with metadata (tags, categories, labels).
- Implement quality signals like voting, accepted answers, and expert verification.
- Ensure the format is AI-friendly so this knowledge can feed back into your internal LLMs and agents.
Key term: Metadata refers to information about data (like tags, categories, or timestamps) that helps organize and contextualize content, making it easier for both humans and AI systems to understand and retrieve relevant information.
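As a rough sketch of what an AI-friendly knowledge record could look like, the hypothetical `KnowledgeEntry` below pairs content with the metadata and quality signals described above. The field names, weights, and example data are illustrative, not a real platform schema:

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeEntry:
    """One validated knowledge item with metadata and quality signals."""
    title: str
    body: str
    tags: list[str] = field(default_factory=list)  # metadata: topical tags
    category: str = "general"                      # metadata: broad grouping
    votes: int = 0                                 # quality signal: community voting
    accepted: bool = False                         # quality signal: accepted answer
    expert_verified: bool = False                  # quality signal: SME sign-off

    def quality_score(self) -> float:
        """Naive score combining the quality signals, e.g. for ranking retrieval results."""
        score = float(self.votes)
        if self.accepted:
            score += 5
        if self.expert_verified:
            score += 10
        return score

entry = KnowledgeEntry(
    title="Fixing OOM errors in the nightly ETL job",
    body="Batch sizes above 10k rows exhaust worker memory; cap them at 5k.",
    tags=["etl", "memory"],
    votes=12,
    accepted=True,
)
print(entry.quality_score())  # 17.0
```

The point of the structure is the last bullet above: a record like this can be serialized and fed straight into an internal LLM or agent, signals and all.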
RAG (retrieval-augmented generation) is still "having a moment," Rotnov says, and for good reason. The survey showed:
- 36% of professional developers are learning RAG.
- Searching for answers is where AI adoption is highest in development workflows.
- The "RAG" tag has become one of the most popular new tags on Stack Overflow.
What RAG does: RAG systems summarize internal knowledge sources into concise, relevant answers that surface wherever developers work, whether that's inside IDEs, chat platforms, or documentation.
Important consideration: RAG is only as good as the underlying data. If you're summarizing poorly structured or outdated information, you'll get poor results.
Example: A developer troubleshooting a deployment issue could query an internal RAG system that pulls from documentation, past incident reports, and team wikis to provide a comprehensive answer without manually searching multiple sources.
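A minimal sketch of the retrieval half of that workflow, with a toy keyword-overlap scorer standing in for a real embedding model (the document names and contents are invented):

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: how many query words appear in the document."""
    doc_words = set(doc.lower().split())
    return sum(1 for w in query.lower().split() if w in doc_words)

def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k most relevant documents."""
    ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: dict[str, str]) -> str:
    """Assemble the context the LLM will be asked to answer from."""
    context = "\n\n".join(docs[name] for name in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = {
    "runbook": "deployment failed check the rollout status and pod logs first",
    "wiki": "team lunch schedule and meeting room bookings",
    "incident-42": "past deployment incident bad config map caused crash loop",
}
print(retrieve("why did the deployment fail", docs))  # ['runbook', 'incident-42']
```

In a production system the scorer would be vector similarity over embeddings, and the quality signals and metadata discussed above would feed into the ranking, which is exactly why the underlying data matters so much.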
For organizations building their own models (whether internal tools or products), Rotnov emphasizes two priorities: improving reasoning capabilities and implementing human validation loops.
The challenge: Current reasoning models are immature and struggle with complex tasks.
The solution: Train models on data that demonstrates human thought processes, not just final answers.
Valuable data types include:
- Comment threads showing how humans discuss and evaluate solutions.
- Curated knowledge that shows how understanding evolves over time.
- Decision-making processes that expose the "why" behind conclusions.
Survey insight: For the first time, Stack Overflow asked how people use the platform. The #1 answer? They look at comments. This shows that developers are looking for more than just the accepted solution. They want to see the discussion, the relevant context, and the various perspectives surrounding a question.
The challenge: Model drift, when AI outputs become less accurate as real-world conditions change.
The fix: Build continuous feedback mechanisms where humans evaluate and correct AI outputs to ensure accuracy and alignment with human values.
Example: Stack Overflow is piloting integrations where AI models appear on leaderboards and users can vote on responses from different models, providing real-time feedback on performance.
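One hedged sketch of such a feedback loop: log each human up/down vote on a model's answers and flag the model when its rolling approval rate drops below a threshold. The window size, threshold, and minimum-sample cutoff here are arbitrary placeholders:

```python
from collections import deque

class FeedbackMonitor:
    """Tracks human up/down votes on model outputs over a rolling window."""

    def __init__(self, window: int = 100, threshold: float = 0.7):
        self.votes: deque = deque(maxlen=window)  # True = approved, False = rejected
        self.threshold = threshold

    def record(self, approved: bool) -> None:
        self.votes.append(approved)

    def approval_rate(self) -> float:
        return sum(self.votes) / len(self.votes) if self.votes else 1.0

    def drifting(self) -> bool:
        """Flag possible drift once enough feedback has accumulated."""
        return len(self.votes) >= 20 and self.approval_rate() < self.threshold

monitor = FeedbackMonitor(window=50, threshold=0.7)
for _ in range(15):
    monitor.record(True)   # early answers look good
for _ in range(10):
    monitor.record(False)  # reviewers start rejecting answers
print(monitor.drifting())  # True: approval is 15/25 = 0.6, below threshold
```

The rolling window matters: because old votes age out, a model whose outputs were fine last quarter can still trip the alarm when conditions change, which is the definition of drift.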
Here's a surprising finding: Over a third of developers use 6-10 different tools in the course of their work, but contrary to common assumptions, tool sprawl doesn't correlate with job dissatisfaction.
"It surprised me because everybody's been trying to solve this tool sprawl problem for years," Rotnov notes. "But it seems like developers accept that each tool serves a specific use case and that they need them to do their job." In deciding which AI tools and technologies to invest in, enterprises should remember that developers can tolerate a fair amount of tool sprawl, as long as each one serves a distinct function within their workflows.
And speaking of workflows… Agentic AI refers to autonomous systems that can perform complex tasks across multiple tools and platforms to achieve specific goals without constant human guidance. In theory, agentic AI promises to solve tool sprawl. But adoption of agentic AI systems is limited:
- 52% of developers either don't use agents or stick to simpler AI tools.
- Security and privacy concerns remain significant barriers to agent adoption.
- Reasoning model immaturity limits agents' capabilities.
However, among developers who have started using agentic AI in their workflows, the results are promising:
- 70% report that agents reduced the time they spent on specific tasks.
- 69% agree that agents increased their productivity.
- Younger/less experienced developers are more likely to adopt agents.
As we've seen with the adoption curve of AI tools in general, developers will embrace agentic workflows when they see proof positive that these systems work.
On that note, Rotnov had some recommendations for enterprises rolling out agentic AI systems.
As with any new tool or technology, Rotnov recommends that enterprises pilot low-risk agentic use cases before rolling out broader implementations. Demonstrate value, build consensus, and then roll it out to more users once you understand how things work on a micro scale.
Consider piloting with interns or newer developers on onboarding tasks, where mistakes have lower consequences and feedback loops are clear.
MCP (Model Context Protocol) is a standardized way for LLMs to access and learn from data sources. It's analogous to the International Image Interoperability Framework (IIIF), which standardizes how images are delivered and described over the web.
What MCP servers do:
- Help AI learn implicit knowledge: your organization's language, culture, and way of working.
- Enable faster familiarization with internal systems.
- Provide read-write access and pre-built prompts for dynamic knowledge sharing.
- Connect to existing AI tools and agents for less context switching.
Real-world application: Stack Overflow recently launched a bi-directional MCP server. A developer building an internal app in Cursor can connect to the MCP server and immediately access enterprise knowledge, complete with structure, quality signals (votes, accepted answers), and metadata (tags) to inform their application's outputs.
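Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages. A rough sketch of what a client's tool-call request looks like on the wire; the tool name and arguments are invented, and in practice you would use an official MCP SDK rather than hand-building messages:

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request in the shape MCP uses to invoke a server tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # MCP method for invoking a named tool
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical tool exposed by a knowledge-base MCP server.
request = mcp_tool_call(1, "search_questions",
                        {"query": "kubernetes OOM", "tags": ["deployment"]})
print(request)
```

The standardization is the point: any MCP-aware client (an IDE, a chat agent) can discover and call any server's tools through the same message shapes, which is what makes the "connect Cursor to your knowledge base" scenario above possible without bespoke integrations.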
Why the trend: Small language models (SLMs) are gaining popularity because they're:
- Task-specific: Smaller models can be fine-tuned for particular domains or use cases.
- Cost-effective: As you'd expect, small models are cheaper to build and maintain than large models.
- Better for the environment: Unsurprisingly, SLMs require less computational power.
- Ideal for agents: Smaller models are well suited to specialized agentic tasks.
Example: A healthcare company might deploy an SLM specifically trained on medical coding standards and their internal protocols for processing insurance claims, rather than relying on a general-purpose LLM.
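The routing logic this implies can be sketched in a few lines: send narrow, well-defined tasks to the fine-tuned SLM and escalate everything else to a general-purpose model. The model names and task taxonomy below are hypothetical:

```python
# Tasks the hypothetical domain-tuned small model is known to handle well.
SLM_TASKS = {"claim_coding", "claim_validation", "icd10_lookup"}

def route(task_type: str) -> str:
    """Pick the cheapest model capable of handling the task."""
    if task_type in SLM_TASKS:
        return "claims-slm-v2"  # fine-tuned small model: fast and cheap
    return "general-llm"        # fallback for open-ended or unfamiliar work

print(route("claim_coding"))       # claims-slm-v2
print(route("summarize_meeting"))  # general-llm
```

Even this trivial router captures the economics: the bulk of high-volume, repetitive work never touches the expensive general-purpose model.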
While MCP servers and agents get the attention, APIs remain crucial for reducing context switching and the overall cognitive load on developers. In fact, developers are more likely to endorse and become fans of a technology if it has an easy-to-use and robust API.
What to evaluate:
- Is the API well-documented and supported?
- Does it use a REST architecture or other AI-friendly format?
- Is pricing transparent?
- Is there an SDK available for easier integration?
Example: Stack Overflow recently launched a TypeScript SDK for Stack Internal, making it easier for developers to build integrations and custom workflows.
Rotnov was very clear about the number-one recommendation she has for enterprises contemplating AI projects:
"You really have to be looking long and hard at what internal data sources you have that LLMs and AI can learn from and provide accurate answers to your teams."
Key questions to ask:
- Are you giving developers spaces to create new knowledge and problem-solve collaboratively?
- Is that knowledge well-structured with good metadata and quality signals?
- If you're using third-party data, does it meet the same quality criteria?
- Is your data conducive to AI, i.e., organized in ways that LLMs can effectively learn from?
No matter what you're building (agentic systems, RAG implementations, or custom models), the underlying data quality determines success. Even synthetic data generation requires high-quality source material.
For their AI initiatives to succeed, enterprises must balance the productive potential of AI tools against the need for continuous human validation and community-driven knowledge infrastructure. Thriving developers aren't using AI to replace human judgment or stand in for human experience. They're using it as a force multiplier. In the same way, thriving enterprises are combining AI capabilities with human expertise, leveraging well-structured knowledge systems and thoughtful implementation strategies to ensure AI adds value at every level of the business.

