winwang · 2026-03-26T19:17:45 1774552665

How much of this is expectation-setting by the heights models reach? i.e. if we could assess a consistent floor of model performance in a vacuum, would we say it's better at "AGI" than the bottom 0.1% of humans?

fc417fc802 · 2026-03-26T23:23:59 1774567439

Not sure how to answer because we were off on a tangent there about mental models.

I think AGI is two things. Intelligence at a given task, which can be scored versus humans or otherwise. And generalization which is entirely separate. We already have superhuman non-general models in a few domains.

So I don't think that "better than AGI at % of humans" is a sensible statement, at least not initially.

Right now humans generalize to all integers while AI companies keep manually adding additional integers to a finite list and bystanders make claims of generality. If you've still got a finite list you aren't general regardless of how long the list is.

If at some point a model shows up that works on all even integers but not odd ones then I guess you could reasonably claim you had AGI that was 50% of what humans achieve. If a model that generalizes to all the reals shows up then it will have exceeded human generality by an infinite degree. We'll cross those bridges when we come to them - I don't think we're there yet.

winwang · 2026-03-26T23:34:19 1774568059

Interestingly, I find that the models generalize decently well as long as the "training" (more analogous to that for humans) fits in (small enough) context. That's to say, "in-context learning" seems good enough for real use.

But of course, that's not quite "long term"

fc417fc802 · 2026-03-27T02:02:31 1774576951

Given that models don't currently learn as they go, isn't that exactly what this benchmark is testing? If the model needs either to have been explicitly trained in a similar environment or to have a human manually input a carefully crafted prompt, then it isn't general. The latter case is a human tuning a powerful tool.

If it can add the necessary bits to its own prompt while working on the benchmark then it's generalizing.

winwang · 2026-03-23T17:09:11 1774285751

It would be much worse if it had said "You are absolutely wrong to be confused", haha.

winwang · 2026-03-23T12:45:57 1774269957

Hey HN, I figured I'd just share this for feedback despite its dry-ness and small-idea-ness.

winwang · 2026-03-23T12:33:42 1774269222

(no idea but) I feel like changing the first number has a psychological issue, but the 2nd number feels more important than just "minor" sometimes. So may as well let the schema set the mind free?

winwang · 2026-03-20T20:04:13 1774037053

...I almost thought it was a parody site!

winwang · 2026-03-16T12:43:16 1773664996

Interesting. I've felt like it's never been easier to learn things, but I suppose that's not quite the same as "acquiring new skills". I don't know if it applies, but it's always been easy to take the easy way out?

I feel like AI has made it a bit easier to do harder things too.

winwang · 2026-03-08T21:47:26 1773006446

I don't think lived experience matters too much to me. In some sense, AI has very unique "lived" experience, which is what creates the voice it uses ("doesn't have a voice" seems like an impossibility to me by definition).

I find AI very "human-esque", and its "self-reported" phenomenology is very entertaining to me, at least.

I also think AI writing might feel trashy because most human writing is trashy.

lokar · 2026-03-08T23:20:28 1773012028

The LLM “voice” is the average of Reddit, and is therefore irredeemable.

winwang · 2026-03-08T23:40:15 1773013215

Yeah that's somewhat close to what I meant, though there's an irony here in that your comment (and this one) are pretty reddit-esque.

winwang · 2026-03-05T02:54:06 1772679246

Really interesting. I was thinking about something similar regarding the shape of code. I have no qualms recommending my agents take static analysis to the extreme, though it would be cumbersome for most people.

winwang · 2026-03-04T14:13:39 1772633619

What about someone inexperienced but skeptical, using AI to learn + fix their own code before opening the PR?

HarHarVeryFunny · 2026-03-04T15:34:32 1772638472

That's an interesting question ... how should a less experienced developer use AI productively, and learn while developing? Certainly using it as a magic genie and vibe coding something you are in no position to evaluate is not the way to go, nor is that a good way for anyone to use AI if you care about the quality or specifics of the end result!

There's always going to be some overlap, wanting to use a new skill/library in a production system, but maybe in general it's best to think of learning and writing/generating production code as two separate things. AI is great for learning and exploration, but you don't want to be submitting your experiments as PRs!

A good rule of thumb might be: can you explain any AI-generated design and code as well as if you had written it yourself? If you don't fully understand it, then you are not in a good position to own it and take responsibility for it (bugs, performance, edge case behavior, ease of debugging, flexibility for future enhancement, etc).

winwang · 2026-03-04T09:09:55 1772615395

Linear walkthrough: I ask my agents to give me a numbered tree. Controlling tree size specifies granularity. Numbering means it's simple to refer to points for discussion.
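
To make the "numbered tree" idea concrete, here is a rough sketch of the output shape I mean. Everything in it (`Node`, `numbered_tree`, `max_depth`, the example steps) is made up for illustration and is not any agent's real format; `max_depth` stands in for "controlling tree size".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in a walkthrough (hypothetical structure)."""
    title: str
    children: list["Node"] = field(default_factory=list)


def numbered_tree(nodes, max_depth, prefix=""):
    """Return lines such as '1.2 Build AST', dropping levels deeper than max_depth."""
    lines = []
    if max_depth <= 0:
        return lines
    for i, node in enumerate(nodes, start=1):
        label = f"{prefix}{i}"
        indent = "  " * label.count(".")  # one indent step per nesting level
        lines.append(f"{indent}{label} {node.title}")
        lines.extend(numbered_tree(node.children, max_depth - 1, prefix=label + "."))
    return lines


# Illustrative walkthrough, not taken from a real codebase.
tree = [
    Node("Parse input", [Node("Tokenize"), Node("Build AST")]),
    Node("Emit output"),
]

if __name__ == "__main__":
    print("\n".join(numbered_tree(tree, max_depth=2)))
```

With labels like these, a follow-up can just say "expand 1.2" or "I disagree with 2".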

Other things that I feel are useful:

- Very strict typing/static analysis

- Denying tool usage with a hook telling the agent why+what they should do (instead of simple denial, or dangerously accepting everything)

- Using different models for code review
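
For the hook bullet, a minimal sketch of what "deny with why + what" could look like. This is not the API of any particular agent tool; `Decision`, `DENY_RULES`, and `pre_tool_hook` are hypothetical names, and the rules are just examples. The point is that the denial message carries both the reason and the alternative.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Result handed back to the agent (hypothetical shape)."""
    allowed: bool
    message: str = ""


# Hypothetical rules: denied substring -> (why it is denied, what to do instead)
DENY_RULES = {
    "rm -rf": (
        "Recursive deletes are not allowed in this repo.",
        "Move files to ./trash/ instead.",
    ),
    "git push --force": (
        "Force pushes can destroy other people's history.",
        "Push a new branch and open a PR.",
    ),
}


def pre_tool_hook(tool_name: str, command: str) -> Decision:
    """Allow or deny a tool call; denials explain why and suggest an alternative."""
    if tool_name != "shell":
        return Decision(allowed=True)  # this sketch only inspects shell commands
    for pattern, (why, instead) in DENY_RULES.items():
        if pattern in command:
            return Decision(False, f"Denied: {why} Do this instead: {instead}")
    return Decision(allowed=True)


if __name__ == "__main__":
    print(pre_tool_hook("shell", "rm -rf build").message)
```

Substring matching is crude; a real setup would likely parse the command, but the message format is the part that matters here.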