A particularly pronounced version of this can often be seen by letting two agents review and code in a loop: one agent finds problems with the code, and the other addresses the review by adding more code.
A good human developer might see that the better way to address the review is to backtrack and pick a different approach. The AI agents seem more prone to getting stuck down bad branches of the decision tree.
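To make the "backtrack instead of piling on patches" idea concrete, here is a rough Python sketch of a review loop that abandons an approach once patching stops converging. Everything in it is hypothetical: `run_loop`, `write`, `review`, `patch`, and the two approach names are toy stand-ins for the coding and reviewing agents, not any real agent framework.

```python
from typing import Callable, List, Optional, Tuple

Code = List[str]
History = List[Tuple[str, int, int]]  # (approach, lines of code, open issues)


def run_loop(
    approaches: List[str],
    write: Callable[[str], Code],
    review: Callable[[Code], List[str]],
    patch: Callable[[Code, List[str]], Code],
    max_rounds: int = 3,
) -> Tuple[Optional[str], Optional[Code], History]:
    """Try each approach in turn; give up on one if patches do not converge."""
    history: History = []
    for approach in approaches:
        code = write(approach)
        for _ in range(max_rounds):
            issues = review(code)
            history.append((approach, len(code), len(issues)))
            if not issues:
                return approach, code, history
            # The "stuck" behaviour: address the review by adding more code.
            code = patch(code, issues)
        # Review budget exhausted: backtrack and try the next approach.
    return None, None, history


# Toy stand-ins (assumptions): the first approach has a flaw no patch removes.
def write(approach: str) -> Code:
    return [f"# approach: {approach}"]


def review(code: Code) -> List[str]:
    return ["shared mutable state"] if "global-cache" in code[0] else []


def patch(code: Code, issues: List[str]) -> Code:
    return code + [f"# workaround for: {issue}" for issue in issues]


chosen, final_code, history = run_loop(
    ["global-cache", "pure-function"], write, review, patch
)
```

In this toy run the first approach grows by one line per round while the issue count never drops, which is the signal used to backtrack. A real setup would need a better convergence check than a fixed round limit.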
Code reviews should be done by someone other than the author anyway, so the only thing that changes with AI-generated code in that respect is the amount of it.
Before: One person writes the code (and likely understands it thoroughly), another person reviews the code to spot obvious mistakes or shortcomings. Now: AI writes the code, a person reviews it to spot obvious mistakes or shortcomings.
In the before case, you have a person who has a deeper understanding of the code. In the AI case, you don’t; instead, you have even more code to review.
When a competent programmer is writing the code, the human written code tends to be higher quality too. So it’s not just about review quantity but the quality of code being reviewed. Some people claim the AI writes great code, but that just hasn’t been my experience yet (at least with the models I’ve tried, including Opus). They still make ridiculously bad decisions regularly.
> When a competent programmer is writing the code, the human written code tends to be higher quality too
That is a nice idea, but on average it is deeply untrue. Far and away most programmers today write significantly worse code than LLMs. Also, LLMs are fantastic at generating high-level summaries and comments in code.
> Far and away most programmers today write significantly worse code than LLMs
Your experience with LLMs does not match my own. That is not to say I haven’t seen terrible human-written code where I’ve wondered what the author could possibly have been thinking, but overall, I still find LLM-written code to be on the poor side.
Like, the code itself is OK, but the wider-picture reasoning and abstractions are bad. It also makes really dumb decisions far too often, or doggedly shoehorns its first idea in no matter how badly it fits.
In relation to the client (the AI agent), the MCP server is a server: it serves resources like tools. In relation to your platform that hosts the API those tools call, it is a client.
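A small sketch of that dual role, in plain Python rather than any real MCP SDK. The class `ToolServer`, the tool name `get_order`, the path `/orders/lookup`, and `fake_platform` are all made up for illustration; the only point is that one object answers calls from the agent and makes calls to the platform.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature for "call the platform API": (path, payload) -> response.
PlatformCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class ToolServer:
    """Acts as a server toward the agent and as a client toward the platform."""

    def __init__(self, call_platform: PlatformCall) -> None:
        self._call_platform = call_platform
        # Tool name -> platform API path (both invented for this example).
        self._tools: Dict[str, str] = {"get_order": "/orders/lookup"}

    # Server role: the agent asks which tools exist and invokes them.
    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self._tools:
            return {"error": f"unknown tool: {name}"}
        # Client role: forward the request to the platform's API.
        return self._call_platform(self._tools[name], arguments)


def fake_platform(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the real platform API; just echoes what it received."""
    return {"path": path, "echo": payload}


server = ToolServer(fake_platform)
result = server.call_tool("get_order", {"id": 7})
```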