This list started as an idea for a short LinkedIn post, but ended up as a summary of systemic problems that need to be addressed to make coding LLMs/AI Agents a paradigm shift in software engineering, not a dead end that creates as many problems as it solves. Perhaps this attempt to organize my own thoughts on the topic will be of some use to others.
Misunderstanding of intelligence: LLM-based agents do not reason in the human sense of the word; they are very advanced prediction and pattern-recognition engines, with unavoidable limitations inherent to the transformer architecture. Contrary to the marketing, they are not “intelligent”, defined here as the ability to generalize, abstract, and establish causal relationships between facts. They merely simulate this process very convincingly, using language as a medium: a “stochastic parrot”1, constrained by both training and syntax to the benefit of prediction accuracy, especially in the context of programming.
Misunderstanding of LLM providers’ incentives: A big part of the current hype is effectively marketing on behalf of AI companies’ shareholders. While alternative narratives are becoming louder, we are still far from the moment when the non-technical crowd will recognize blatant lies (LLMs are “intelligent”), scare tactics that raise the profile of LLMs (“AI Agents with agency”), or publicity stunts like the Anthropic C compiler2, which the industry recognized as proof of limitations rather than capability, yet which was sold as a breakthrough.
Misunderstanding of cost structure: Writing new software is cheaper than maintaining it and keeping it running. LLMs allow faster creation of software and features at the cost of either accelerated codebase rot, with all its associated costs, or greatly increased effort to maintain engineering discipline—something most companies are notoriously bad at, while incentivizing “silver bullets” over long-term solutions.
Cognitive debt: The cost of repeatedly onboarding software developers onto the same, yet each time unfamiliar, codebase whenever a human touch is needed. Developers lose familiarity with the code they work in—its quirks, shortcuts, and workarounds. Whether this frees up room for learning the business domain remains to be seen, but a cynical yet realistic expectation is that more will simply be done—not necessarily smarter.
Cognitive overload: The quality of human-in-the-loop control over the process drops because the volume of changes requiring attention (see pt. 1) becomes overwhelming. This is illustrated by the “LGTM effect”3 and the subsequent production issues that recently led Amazon to mandate senior-level review for all LLM output4.
Automation complacency: The natural human tendency to put 100% trust in systems that are correct the majority of the time, but not always (see pts. 4 and 5). Well illustrated by car accidents involving misleadingly named “Full Self-Driving”5 vehicles, which bear a strong resemblance to production issues involving the misleadingly named “Artificial Intelligence”.
Competency gap & cost: Engineers who typically work on low-level problems will suddenly have to gain a bird’s-eye perspective on software—one they were often denied in day-to-day work. Either that, or we will end up with a shortage of pragmatic yet architect-minded individuals, with all the costs of unfulfilled demand. Especially if the junior->mid->senior->staff->principal pipeline remains mostly broken for the foreseeable future.
Quality requirements: To employ agents effectively to write production-grade code, a high-quality codebase with strong guardrails providing feedback to agents is a must. This might be problematic, as it was not an industry priority before and should not be expected to become a priority now. Silicon-based agents require more clarity and structure than protein-based ones to be effective, not less. Yet, organizations lacking in both departments seem to put the most faith in the Agentic AI promised land—based purely on short-term KPIs.
Alignment tradeoff: The promise of LLMs in software is a promise of a less formal way of expressing requirements and processes than an unambiguous programming language itself. The more aligned LLMs are with this expectation, the more gaps they need to fill themselves—often leading to non-deterministic and unpredictable results (see pt. 1). On the other hand, the more formal they become when processing human input, the more of the promised productivity gains we lose.
Groupthink: By definition, LLMs gravitate towards solutions overrepresented in training data. This might be a boon, making experience more transferable as it gets reflected in more and more codebases. But it might also become an industry curse, where problems that would benefit from a screwdriver are always resolved with a hammer—a tendency that may be reinforced with each new generation/training-data refresh of LLMs and by the inertia of “the new normal”.
Model collapse: It is probably unavoidable for coding LLMs to hit the wall of inbreeding at some point, because the model collapse6 phenomenon leads to worsening training outcomes as more LLM-generated data enters the training set (see pt. 1). The more “AI”-written code exists in the wild without a clear indication of its origin, the harder it will be to find and select data offering clear signal instead of noise. We already lack enough quality data, the well we do have is being poisoned as we speak, and usable synthetic data is gated behind the (at this moment) pipe dream of AGI (again, see pt. 1).
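The statistical mechanism behind model collapse can be sketched with a toy simulation (the function name and parameters are my own, and this is only the simplest Gaussian analogue of the effect, not a claim about actual LLM training): repeatedly fitting a model to data sampled from the previous generation’s fit compounds finite-sampling noise, and the learned variance drifts toward zero.

```python
import random
import statistics

def collapse_demo(n_samples=20, generations=1000, seed=0):
    """Toy model collapse: each generation fits a Gaussian to samples
    drawn from the previous generation's fitted Gaussian. Finite-sample
    noise compounds and the estimated variance shrinks toward zero."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real data" distribution
    for _ in range(generations):
        # "synthetic corpus" produced by the current model
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # retrain the next model purely on that synthetic corpus
        mu, sigma = statistics.fmean(data), statistics.stdev(data)
    return sigma

print(collapse_demo())  # far smaller than the original sigma of 1.0
```

Each generation loses a little tail information to finite sampling, so the distribution narrows irreversibly over iterations; the cited paper demonstrates the analogous loss of quality and variance for models trained recursively on their own output.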
Vendor lock-in & enshittification: Adopting Agentic AI as the main modality of producing software might result in deep vendor lock-in—not because of the models themselves, as there are good reasons to believe they will be somewhat interchangeable, but because of all the tooling required to make them work at enterprise scale. This invites enshittification7 of the whole space once the sold solutions are deeply rooted. While service SLAs can be quantified and contractually enforced, “reasoning effort” and model routing may be variable and hard to enforce. The typical “winner takes most” outcome for tech also begs the question: how will pricing, currently subsidized to drive adoption, change as this space grows more concentrated and mature?
“Stochastic parrot” is a metaphor describing Large Language Models (LLMs) as systems that statistically parrot training data without understanding meaning, coined by Bender et al. in 2021. ↩︎
In February 2026, Anthropic claimed that 16 Claude Opus 4.6 agents built a C compiler from scratch for $20k. The stated cost did not include weeks of work by Anthropic staff; the compiler was nowhere close to production-ready, relied heavily on GCC for the missing assembler and linker, and used GCC as a feedback source for the agents. The main critique was that a partial, flawed solution to one of the best-documented problems in computer science history highlights the limitations, rather than the strengths, of imitative work. ↩︎
“Looks Good To Me” - a standard code review approval acronym. In the context of AI, it refers to the tendency of reviewers to approve plausible-looking but subtly broken AI-generated code due to review fatigue or over-trust. ↩︎
In March 2026, Amazon mandated senior engineer approval for AI-generated code following a series of outages caused by AI tools, including a massive disruption on March 5th that resulted in an estimated 6.3 million lost orders. ↩︎
Tesla’s advanced driver-assistance system, frequently criticized for its misleading name implying full autonomy while still requiring active driver supervision. ↩︎
Refers to the 2024 Nature paper “AI models collapse when trained on recursively generated data” by Shumailov et al., demonstrating that models trained on AI-generated data irreversibly degrade in quality and variance. ↩︎
Term coined by Cory Doctorow in 2022 describing the platform decay cycle: first capturing users with high value, then abusing them to capture business customers, and finally abusing both to capture all value for shareholders. ↩︎