Hacker News | GoatInGrey's comments

In March 2025, Anthropic was claiming that 90% of code would be written by LLMs within three to six months, and "essentially all" code within twelve months. This was one week after closing a $3.5 billion Series E round, and as they began working on a $13 billion Series F. You shouldn't need more than that to understand what's going on here.

The Claude Code leak revealed that Anthropic runs Claude-operated bots on the internet. One should be very cautious about getting swept up in the fundraising frenzy when they are not seeing the flattering claims presented by strangers on the internet bear out first-hand.


>March 2025, Anthropic was claiming that 90% of code would be written by LLMs in three to six months, and "essentially all" code within twelve months.

There's a pretty big difference between "We predict in X time frame our model will be capable of Y" and "Our model did Y."

This is like watching someone measure the size of an object and saying "I don't believe you because you guessed it was X before you pulled out your tape measure."


You're talking about marketing predictions and I'm talking about data presented in a whitepaper. They are not the same thing.

The irony here is that even if someone is extracting legitimate value from LLMs because they are that much smarter than their peers, the process of using LLMs to perform all of their skilled labor makes them less intelligent over time.

https://x.com/trq212/status/2014051501786931427

" Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".

For each frame our pipeline constructs a scene graph with React then -> layouts elements -> rasterizes them to a 2d screen -> diffs that against the previous screen -> finally uses the diff to generate ANSI sequences to draw

We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written. "
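
(The ~16ms budget is one frame at 60fps: 1000ms / 60 ≈ 16.7ms.) For the curious, here is a minimal TypeScript sketch of that final diff-to-ANSI step. The row-granularity diff and every name in it are my own guesses at the idea, not Claude Code's actual internals:

    // Diff two rasterized screens (arrays of text rows) and emit only the
    // ANSI escape sequences needed to redraw the rows that changed.
    // Illustrative only: a real renderer would also diff at cell granularity
    // and handle colors, wide characters, and scroll regions.
    function diffToAnsi(prev: string[], next: string[]): string {
      let out = "";
      for (let row = 0; row < next.length; row++) {
        if (prev[row] === next[row]) continue; // unchanged row: emit nothing
        // CSI <row>;1H moves the cursor (1-indexed); CSI 2K clears the line.
        out += `\x1b[${row + 1};1H\x1b[2K${next[row]}`;
      }
      return out;
    }

    // Keep the previous frame around and write one batched string, so an
    // unchanged frame costs nothing and a one-line change redraws one line.
    let prevScreen: string[] = [];
    function renderFrame(nextScreen: string[]): void {
      process.stdout.write(diffToAnsi(prevScreen, nextScreen));
      prevScreen = nextScreen;
    }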


You can argue that any UI is like a game engine in that sense. Some make sensible choices and don't need to pretend they have to render at 60fps.

60fps is pathetic for a TUI when most terminals worth their salt are GPU accelerated and displays can run at 240fps or even more. But let's be real: if I can play Quake at >500fps, they have no excuse.

Do they reconstruct the scene graph for each frame?! Maybe I'm overinterpreting the phrasing. Someone take a peek at the source?

Not reliably.

To be fair, Claude Code is vibe-coded. It's a terrible piece of software from an engineering (and often usability) standpoint, and the problems run deeper than just the choice of JavaScript. But it is good enough for people to get what they want out of it.


But also, based on what I have heard of their headcount, they are not necessarily saving any money by vibecoding it - it seems like their productivity per programmer is still well within the historical range.

That isn’t necessarily a hit against them - they make an LLM coding tool and they should absolutely be dogfooding it as hard as they can. They need to be the ones to figure out how to achieve this sought-after productivity boost. But so far it seems to me like AI coding is more similar to past trends in industry practice (OOP, Scrum, TDD, whatever) than it is different in the only way that’s ever been particularly noteworthy to me: it massively changes where people spend their time, without necessarily living up to the hype about how much gets done in that time.


> But it is good enough for people to get what they want out of it.

This is the ONLY point of software unless you’re doing it for fun.


Me:

I want to get my car's transmission fixed, the mechanic shop is 50 meters away. Would it be better to walk or to drive to a location 50 meters away from my front door?

-

Opus 4.6:

<thinking> The user is asking whether to walk or drive 50 meters to a mechanic shop, but there's a key nuance here - their transmission needs fixing, so driving might not be wise.

Done </thinking>

Walk, absolutely. Driving a car with a compromised transmission even 50 meters risks making the problem worse and turning a repair into a more expensive one. Plus, 50 meters is barely a minute on foot.


No, what the other commenter described is narrowly scoped delegation to LLMs paired with manual review (which sounds dreadfully soul-sucking to me), not wholesale "write feature X, write the unit tests, and review the implementation for me". The latter is vibe-coding.


Reviewing a quick translation of a test into a benchmark (or other menial coding tasks) is way less soul-sucking than doing the menial coding yourself. Boring, soul-sucking tasks are an important, thankless part of OSS maintenance.

I concur it is different from what you call vibecoding.


Sidenote: I do that frequently. I also do varying levels of review, i.e. more/less vibe [1]. It is soul-sucking to me.

Despite that, I do it because (a) it lets me achieve goals when I lack the energy/time for projects that don't require the level of commitment or care I provide professionally, and (b) it reduces how much RSI I experience. Typing is a serious concern for me these days.

To mitigate the soul-sucking, I've been side-projecting better review tools, which frankly I could use for work anyway, as reviewing PRs from humans could be better too. Also in line with review tools, I think a lot of the soul-sucking is having to provide specificity, so I hope to integrate LLMs into the review tool and speak to it more naturally. E.g. I believe some IDEs (VS Code? no idea) can let Claude etc. see the cursor, so you can say "this code looks incorrect" without needing to be extremely specific. A suite of tooling that improves this code sharing with Claude etc. would also reduce the inane specificity that seems to be required to make LLMs even remotely reliable for me.

[1]: Though we don't seem to have a term for varying amounts of vibe. Some people consider vibe coding to be 100% complete ignorance of the architecture/code being built, in which case IMO nothing I do is vibe coding, which is absurd to me, but I digress.


Even on codebases less than half a year old, these LLMs often produce nasty (read: ungodly verbose) implementations that become a maintainability nightmare, even for the LLMs that wrote it all in the first place. I know this because we've had a steady trickle of clients and prospects expressing "challenges around maintainability and scalability" as they move toward "production readiness", asking, of course, if we can implement "better performing coding agents". As if improved harnessing or similar guardrails could solve what is, in my view, a deeper problem.

The practical and opportunistic response is to tell them "tough cookies" and watch the problems steadily compound into more lucrative revenue opportunities for us. I really have no remorse for these people, because half of them were explicitly warned against this approach upfront but were psychologically incapable of adjusting expectations or delaying LLM deployment until the technology proved itself. If you've ever had your professional opinion dismissed by the same people who regard you as the SME, you understand my pain.

I suppose I'm just venting now. While we are now extracting money from the dumbassery, the client entitlement and the management of their emotions that come with putting out these fires never make for a good time.


This is exactly why enforcement needs to be architectural. The "challenges around maintainability and scalability" your clients hit exist because their AI workflows had zero structural constraints. The output quality problem isn't the model, it's the lack of workflow infrastructure around it.


Is this not just “build a better prompt” in more words?

At what point do we realize that the best way to prompt is with formal language? I.e. a programming language?


No, the linters, test suite, and documentation in your codebase cannot be equated to "a better prompt", except in the sense that all feedback of any kind is part of what the model uses to decide how to act.


A properly set up and maintained codebase is the core duty of a software engineer. Sounds like the great-grandparent comment’s client needed a software engineer.


What if LLMs, at the end of the day, are machines, for now generally dumber than humans, and the best they can provide are at most statistically median implementations (and if 80% of the code out there is crap, the median will be low)?

Now that's a scary thought that basically goes against "1 trillion dollars can't be wrong".

Now, LLMs are probably great range extenders, but they're not wonder weapons.


Also who is to say what is actually crap? Writing great code is completely dependent on context. An AI could exclusively be trained on the most beautiful and clean code in the world, yet if it chooses the wrong paradigm in the wrong context, it doesn't matter how beautiful that code is - it's still gonna be totally broken code.


Functioning antitrust is really all it takes. The last significant US antitrust action was against AT&T in 1982.


For vibe-writing, the vibes aren't even that good!


It's the evergreen tradeoff between the short and long terms. Do I get the nugget of information I need right now but lose it in a month, or do I spend the time and energy that leads to deeper understanding and years-long retention of the knowledge?

There is something about our biology that makes us learn better when we struggle. Many named concepts describe this dynamic: the generation effect, the testing effect, the spacing effect, desirable difficulties, productive failure... they all converge on the same phenomenon: the easier it is to learn something, the worse we learn it.

Take K-12, for instance. As computing technology is integrated further into education, cognitive performance decreases in a near-linear relationship. Gen Z is famously the first generation to perform worse than its predecessors on every cognitive measure for as long as such measures have been recorded, going back to the 19th century. An uncomfortable truth emerging from studies of electronics usage in schools is that it isn't just the phones driving this. It's more the Duolingo effect of software in general emulating the sensation of learning without actually changing the brain state, because the software that actually challenges you is not as engaging or enjoyable.

How you learn, and your ability to parse, infer, and derive meaning from large bodies of information, is increasingly a differentiator in both the personal and professional worlds. It's even more so the case when many of your peers are now learning through LLM-generated summaries averaging just 300 words, perhaps skimming outputs around 1,000 words in length for "important information". The immediate benefits are obvious, but the cost of outsourcing that cognitive work gets lost in the convenience.

Because remember, this isn't just about your ability to recall a specific regex, follow a syntax convention, or ship more code in an hour. Your brain needs exercise, and deep learning is one of the most reliable ways to get it. Doubly true if you're not even writing your own class names.

What I am speaking to is not far away or hypothetical, either: as of 2023, one in four young adults in the United States is functionally illiterate.

https://www.the74million.org/article/many-young-adults-barel...


Effective learning and memorization actually happen at a narrow edge of struggle: neither "too easy" nor "too hard and painful". SRS systems do a very good job of tuning this: by the time a question comes back to you, it will feel difficult, but you'll be able to recall the information and answer it with some effort. It's a matter of recognizing this feeling and acknowledging it as "the right kind of effort", as opposed to a hopeless task.
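
For concreteness, the classic SM-2 algorithm behind many SRS tools implements that tuning with a tiny recurrence. A stripped-down TypeScript sketch (the constants are SM-2's; everything else is simplified):

    // Stripped-down SM-2-style scheduler: intervals grow geometrically,
    // and the ease factor adapts to how hard each recall felt (grade 0-5).
    interface Card {
      interval: number; // days until the next review
      ease: number;     // growth factor, starts at 2.5
      reps: number;     // successful reviews in a row
    }

    function review(card: Card, grade: number): Card {
      if (grade < 3) {
        // Failed recall: start the card over, but keep its ease.
        return { interval: 1, ease: card.ease, reps: 0 };
      }
      const q = 5 - grade; // distance from a perfect recall
      const ease = Math.max(1.3, card.ease + 0.1 - q * (0.08 + q * 0.02));
      const interval =
        card.reps === 0 ? 1 :
        card.reps === 1 ? 6 :
        Math.round(card.interval * ease);
      return { interval, ease, reps: card.reps + 1 };
    }

A hard-but-successful recall (grade 3) shrinks the ease factor so the card comes back sooner; an effortless one (grade 5) lets the interval stretch. That feedback loop is what keeps reviews at the "difficult but answerable" edge.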

If you ask the AI "please quiz me about the proper understanding of issues x y z and tell me if I got it all right. iterate for anything I get seriously wrong, then provide a summary at the end and generate SRS cards for me to train on" it will generally do a remarkably good job at that.


I agree with all of this. The brain needs exercise, just like the body.

