Hacker News | zozbot234's comments

> Demonstrably working, as in you can prove the code actually works by then putting it to use.

That's not how you prove that code works properly and isn't going to fail due to some obscure or unforeseen corner case. You need an actual argument driven by the code's overall structure. Humans do this at least informally when they code; AIs can't do it with any reliability, especially not for non-trivial projects (for reasons that are quite structural and hard to change), so most coding agents simply iterate until their test results pass. That's not a robust methodology.


> That's not how you prove that code works properly and isn't going to fail due to some obscure or unforeseen corner case.

So? We don't prove that human code "isn't going to fail due to some obscure or unforeseen corner case" either (aside from the tiny niche of formal verification).

So in that respect it's quite similar.

> so most coding agents simply work their way iteratively to get their test results to pass. That's not a robust methodology.

You seem to imply they do some sort of random iteration until the tests pass, which is not the case. Usually they can see the test failing, describe the issue exactly the way a human programmer would, and then fix it.
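The loop being described can be sketched in a few lines. This is a hypothetical, simplified interface (the function names and the `propose_fix` callback are illustrative, not any real agent framework's API); the point is that the model reads the concrete failure output rather than editing at random:

```python
import subprocess

def agent_fix_loop(run_tests_cmd, propose_fix, max_iters=5):
    """Sketch of an agent's test-fix loop (hypothetical interface).

    Runs the test suite; on failure, hands the failure output to the
    model via propose_fix, which applies a targeted change.  Returns
    True if the tests eventually pass within max_iters attempts.
    """
    for _ in range(max_iters):
        result = subprocess.run(run_tests_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass
        # The model sees the actual failure message, just as a human would.
        propose_fix(result.stdout + result.stderr)
    return False
```

Whether the fix each round is a genuine diagnosis or a local patch-over is exactly what the thread above is arguing about; the loop structure itself is the same either way.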


> describe the issue exactly in the way a human programmer would

Human programmers don't usually hallucinate things out of thin air; AIs do that a whole lot. So no, they aren't working in exactly the same way.


> be very picky about AI generated PRs: add tons of comments, slow down the merge, etc.

But that's the opposite of sabotage, you're actually helping your boss use AI effectively!

> spend tons of tokens on useless stuff at work (so your boss knows it’s not worth it)

Yes, but the "useless" stuff should be things like "carefully document how this codebase works" or "ruthlessly critique this 10k-line AI-slop pull request, and propose ways to improve it". That way you at least get something nice out of it long-term, even if it's "useless" to a clueless AI-pilled PHB.


LLMs don't really output the same code quality as a human, even at the smallest scale. It's not even close. Maybe you can guide them to refactor their slop up to human-written quality, but then you're still coding. You're just doing it by asking the computer to write something, instead of physically typing the whole thing out on a keyboard.

Yeah, I also keep thinking this. I don't see LLMs reliably producing code that is up to my standards. Granted, I have high standards, because I take pride in producing high-quality code (by all manner of metrics). A lot of the time the code works, unfortunately only under the most naive, mechanical definition of "works".

This just isn’t true at all: with guidance and guardrails they produce much better code than the average developer does. And they are only going to get better.

What do you mean by "LLM coding"? That's not a very meaningful term, it covers everything from 100% vibe coded projects, to using the LLM to gradually flesh out a careful initial design and then verifying that the implementation is done correctly at every step with meticulous human review and checking.

If it's just about skipping some buffer sync that's something that could also be adopted by llama.cpp's own Metal backend, at least on Apple Silicon platforms.

You can demonstrate "running" the latest open Kimi or GLM model on a top-of-the-line laptop today at very low throughput (Kimi at 2 tok/s, which is slow once you account for thinking time), courtesy of Flash-MoE with SSD weight offload. That's not Opus-like, it's not an "average" laptop, and it's not really usable for non-niche purposes due to the low throughput. But it's impressive in a way, and it gives a nice idea of what might be feasible down the line.
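Why SSD offload is even viable for MoE models can be seen with a back-of-envelope bound (all numbers below are illustrative assumptions, not measurements): each decoded token only touches the active experts, so the worst-case SSD traffic per token is roughly the active-parameter bytes, and throughput is capped by SSD bandwidth divided by that.

```python
def moe_ssd_tokens_per_sec(active_params_b, bytes_per_param, ssd_gb_per_s):
    """Crude upper bound on decode speed when active expert weights are
    streamed from SSD on every token.  Ignores expert caching in RAM,
    resident attention weights, and compute time."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return ssd_gb_per_s * 1e9 / bytes_per_token

# Hypothetical figures: ~32B active parameters at 4-bit (0.5 bytes/param),
# NVMe sustaining ~7 GB/s sequential reads.
print(round(moe_ssd_tokens_per_sec(32, 0.5, 7.0), 2))  # → 0.44
```

That real setups report speeds above this naive bound suggests heavy reuse: hot experts stay cached in RAM, so only a fraction of each token's weights actually comes off the SSD.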

If they buy all the jet engines too (to turn them into gas turbines) they can be carbon neutral. More gas burned, but less jet fuel. It's a win-win.

So when my coworker tells me they got a "raise", they're not talking about money that will end up in their bank account?

It's a different definition of the word, for one thing. And unless their compensation is prepaid, this would only suggest that "raise" doesn't mean liquid money in an account, because an employee's raise is a promise to pay an amount over the remainder of the year, with the stipulation that the employee continues at the job.

If your laptop overheats when you push your GPU, you can buy purpose-built "gaming" laptops that are at least nominally intended to sustain those workloads with much better cooling. Of course, running your inference on a homelab platform deployed for that purpose, without the thermal constraints of a laptop, is also possible.

I didn't say it overheats. It gets hot and the fans blow, neither of which are enjoyable.

MacBook Pro laptops are preferred over "gaming" laptops for LLM use because they have large unified memory with high bandwidth. No gaming laptop can give you as much high-bandwidth LLM memory as a MacBook Pro or an AMD Strix Halo integrated system. Discrete gaming GPUs are optimized for gaming and come with comparatively little VRAM.
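The sizing argument is simple arithmetic (the model size and quantization below are assumed examples, not a claim about any specific product): weight memory is roughly parameter count times bytes per parameter, and a mid-size model already overflows a typical discrete GPU while fitting comfortably in a large unified-memory pool.

```python
def weight_gb(params_billion, bits_per_param):
    """Approximate weight memory in GB.  Ignores KV cache and
    activations, which add further overhead on top of this."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# A hypothetical 70B dense model at 4-bit quantization:
need = weight_gb(70, 4)   # 35.0 GB of weights
print(need <= 16)         # → False: overflows a 16 GB gaming GPU
print(need <= 128)        # → True: fits in 128 GB of unified memory
```

KV cache grows with context length on top of the weights, which pushes the advantage of a large unified pool even further.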


The big AI firms are all heavily compute-constrained, so that shouldn't be much of a surprise.
