Hacker News | anuramat's comments

"almost as good as opus at writing python/js/... when given a spec" might be enough for a lot of people, especially if its 10x cheaper

been thinking the same, but I imagine you could explicitly separate notes and slop, e.g. something as simple as a cron job that goes through all your notes and creates a PR if there's some easy win: typos, inconsistencies, tags, etc
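A minimal sketch of one such "easy win" pass (my illustration, not from the thread): scan a notes directory for doubled words, the kind of mechanical fix a cron job could bundle into a PR. The `notes_dir` argument and `.md` extension are assumptions.

```python
import re
from pathlib import Path

# Matches a word immediately repeated, e.g. "the the" (illustrative heuristic)
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_easy_wins(notes_dir: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, matched text) for each doubled word."""
    hits = []
    for path in Path(notes_dir).rglob("*.md"):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            for m in DOUBLED.finditer(line):
                hits.append((str(path), lineno, m.group(0)))
    return hits
```

A cron wrapper would run this, apply the fixes on a branch, and open the PR for human review; the LLM only enters the loop for the fuzzier wins (inconsistencies, tags).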

I've been coding like this lately: if I'm too lazy to review a new non-critical section/unit tests, I'll mark it as `// SLOP`; later, if I have to, I'll go through the entire thing and unmark it

shitty tests are better than no tests, as long as your expectations are low enough


> LLMs don't read

so it's ok to say "SSD read/write speed", but now that we have something closer to the original meaning of the word, someone always has to point out that "LLMs don't have a soul" (or whatever you think is required for it to count as akchyually reading)

do storage devices have souls?


If I can just stand up for the nitpicker - arguably in the uncanny valley it's more natural to point out it's not reading (by their definition) than outside it (SSDs).

makes sense in a philosophical debate or when you're talking to your confused grandparents, but does anyone on hn not know how LLMs work, at least on the level of "tokens, matrices, data, sgd"?

otherwise, that reminder must imply that people do know how it works, and yet they still ascribe to these models some property like qualia, i.e. something other than "being able to turn english into code and compute into shareholder value";

but then if you disagree, why even mention it in the first place? do atheists randomly proclaim "btw god isn't real!" in unrelated conversations with strangers of unknown religious beliefs?


they (are supposed to) produce average on average, and the output distribution is (supposed to be) conditioned on the context

Yeah but ultimately it's all just function approximation, which produces some kind of conditional average. There's no getting away from that, which is why it surprises me that we expect them to be good at science.

They'll probably get really good at model approximation, as there's a clear reward signal, but in places where that feedback loop is not possible/very difficult then we shouldn't expect them to do well.
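A toy illustration (mine, not from the thread) of "function approximation produces some kind of conditional average": fit noisy data with least squares and the fit recovers E[y|x], not any individual sample. The sine target and polynomial degree are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 50_000)
# E[y|x] = sin(2*pi*x); each sample is that mean plus heavy noise
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.5, x.size)

# Least-squares polynomial fit = function approximation under MSE
coeffs = np.polyfit(x, y, deg=7)
pred = float(np.polyval(coeffs, 0.25))
print(pred)  # close to sin(pi/2) = 1.0, the conditional mean at x = 0.25
```

The fitted curve averages away the noise: good when the conditional mean is the answer you want, and exactly the worry when it isn't (e.g. novel science, where the interesting output is off-distribution).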


true, but it's the same with humans, we suck at problems with sparse/delayed feedback, which includes science (math would be the exception I guess)

sure, humans are obviously better at dealing with it, but the one thing nobody is claiming is "scientists replaced by 202X"


weird, for me it was too un-human at first, taking everything literally even if it doesn't make sense; I started being more precise with prompting, to the point where it felt like "metaprogramming in english"

claude on the other hand was exactly as described in the article


wdym by "prompt and vector is small"? small as in "less tokens"? that should be a positive thing for any kind of estimation

in any case, how is this specific to transformers?


what's your setup?

it's not gonna get much more autonomous without self-play and a major change in architecture


as much as I hate cc, 95% of the issues there are either AI psychosis or user error


So it should be insanely easy for this world altering model to comb through them and close irrelevant ones.


torturing a model with human stupidity probably doesn't align with their position on model welfare; wondering if they tried bullying it into hacking its way out of the slop gulag


Yes, perhaps it finds it stressful operating on itself.

Maybe that's why they haven't released it - to give it a vacation?


@anthropic, send me an email if you need access to a jupyter notebook that'd motivate haiku to hack itself into and then back out of the pentagon


So "only" 250 real bugs?


imho it was more reasonable to claim "agi soon" back then -- back when nobody really knew how it scales


They weren't claiming it was dangerous because "AGI soon", that didn't come until later.

OpenAI were claiming GPT-2 was too dangerous because it could be used to flood the internet with fake content (mostly SEO spam).

And they were somewhat right. GPT-2 was very hard to prompt, but with a bit of effort it could spit out endless pages that were good enough to fool a search engine, and even a human at first glance (you were often several paragraphs in before you realised it was complete nonsense).


we essentially have AGI right now brother

we got the A and G parts, just missing the I part but it’s coming :)
