andai's comments | Hacker News

Is the author GPT-5?

Kinda insane no one else is talking about this.

The entire repo reeks of a "Write an extensive analysis comparing the american and japanese medical care systems" prompt.

Not saying all the findings are invalid, but most of them are just the LLM trying to justify it, like the life expectancy one.


Probably so. The table heading “Key Finding” smells rankly of LLM. So does the massive overconfidence that they’ve single-handedly figured out the problem with American healthcare with a little data science, which only an LLM or a schizophrenic could manage. (I haven’t read anything beyond the first part of the README because I don’t waste my time with slop, but I’m assuming they’re ignoring the incentive structures which encourage the system to stay this way.) And then there’s the simple fact that they call out a completely meaningless $3T gap that doesn’t account for population difference at all. It’s so strange, because they mention the per-capita difference right before that. That’s the number that matters. But still they go on to say “$3T gap”, and even measure the issues as a percentage of that $3T gap. It’s nonsensical, right? I’m really tired of this.

A data center of geniuses on a medium dose of LSD.

See also

-2000 lines of code

https://news.ycombinator.com/item?id=26387179

This is actually one thing I have found LLMs surprisingly useful for.

I give them a code base which has one or two orders of magnitude of bloat, and ask them to strip it away iteratively. What I'm left with usually does the same thing.

At this point the code base becomes small enough to navigate and study. Then I use it for reference and build my own solution.


The implication seems to be that if quality assurance is prioritized, the negative impact would be eliminated.

This seems to assume the main cause is the accumulation of defects due to lack of static analysis and testing.

I think a more likely cause is, the code begins to rapidly grow beyond the maintainers' comprehension. I don't think there is a technical solution for that.


> This seems to assume the main cause is the accumulation of defects due to lack of static analysis and testing.

Neither (current) static analysis nor testing is sufficient to score the commit on complexity.

As a trivial example (i.e. probably not something the LLM would do), if the code was a series of 25 if-then-else statements when it could have been a lookup table, no tool is going to flag that.
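To make the toy example concrete (a hypothetical status-code lookup, not from any real codebase), here's the shape in Python. The chain and the table are behaviorally identical, and no standard linter prefers one over the other:

```python
# The if/else chain no tool will flag (imagine 25 branches):
def reason_chain(code):
    if code == 200:
        return "OK"
    elif code == 201:
        return "Created"
    elif code == 404:
        return "Not Found"
    # ... 22 more elif branches ...
    else:
        return "Unknown"

# The same behavior as a lookup table:
REASONS = {200: "OK", 201: "Created", 404: "Not Found"}

def reason_table(code):
    return REASONS.get(code, "Unknown")
```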

Now imagine the patterns an LLM would inject, the kind that non-junior devs would reject without a second thought. No test, and no static analysis tool, is going to flag that non-essential complexity.


Another thing, tangential to what you talk about: if you have a junior programmer who writes some silly pattern, such as a 25-branch if/else chain, you can notice that in code review and have a conversation with them. Talk through the possible alternatives: talk about switch, talk about lookup tables, discuss the pros and cons of each solution. In the end, you have a better, slightly less junior programmer who will be more conscious of the choices they're making in the future. Repeat a few dozen times across a few years and you have a senior programmer.

You can't do this with language models. They don't learn. You can write in your CLAUDE.md stuff like "avoid long if/else chains" of course, but 1) cluttering the context with lots and lots of micro-managing instructions has its own downsides, and 2) you don't want a list of rules, you want deliberate weighing of the trade-offs involved in choices made.


This article helped me understand something I've been grappling with for a while.

I've been looking for the optimal game development environment for a while.

That basically boils down to having batteries included. (I have the opposite of Jonathan Blow's situation, I need to be able to get up and running in a few hours for game jams.)

But there seems to be this tension between convenience and control. Either APIs are low level or they are high level.

(A notable exception is the canvas API, which leaves both groups dissatisfied :)

The article made me realize those are basically two separate groups of people. There are people who want total understanding and total control. And there are people who want to Do Thing With Computer.

I am not sure if it's possible to design one system they would both be happy with.

But that made me realize, I had the same idea about GUIs 20 years ago.

On Mac you usually don't get many options. On Windows you usually get too many options.

A rare few applications let you switch between two modes. "Just Do Thing" and "airplane cockpit". There's usually a gear icon or something like that which shows you all the extra options.

I wonder what that might look like for an API.


"Progressive disclosure" is the name of the UX principle that aims to provide a continuum between the user need for simplicity versus fine-grained control.

Also known as "build the cockpit, then use it to build the things you want to do with the computer."

This is called an "LLM alloy". You can even do it in an agentic setup, where you simply swap the model on each LLM invocation.

It does actually significantly boost performance. There was an article on here about it recently, I'll see if I can find it.

Edit: https://news.ycombinator.com/item?id=44630724

They found the more different the models were (the less overlap in correctly solved problems), the more it boosted the score.
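A minimal sketch of the swapping idea, with a stubbed `call_model` in place of real API clients and made-up model names: the agent loop just rotates through the models on each invocation.

```python
from itertools import cycle

def call_model(model, prompt):
    # Stub standing in for a real API call to the given model.
    return f"[{model}] response to: {prompt}"

# Hypothetical pair of dissimilar models; the linked result suggests
# less overlap in solved problems means a bigger boost.
_models = cycle(["model-a", "model-b"])

def alloy_invoke(prompt):
    # Each invocation in the agent loop goes to the next model.
    return call_model(next(_models), prompt)
```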


That sounds quite interesting. Makes me wonder if sooner or later they will have to train multiple independent models that cover those different niches. But maybe we will see that sooner or later. Thanks for the link.

Mixture of Mixtures of Experts ;)

One would think that, with LoRAs being so successful in Stable Diffusion, more people would be focused on building framework-based LoRAs; but the economics of all this probably preclude going niche in any direction, so everyone just keeps building the do-all models.

The SD ecosystem was in large part grassroots and focused on NSFW content. I think current LLM companies would have a hard time getting that to happen, due to their safety stuff.

Fine-tuning does exist on the major model providers, and presumably already uses LoRA. (Not sure though.)

We saw last year that it's remarkably easy to bypass safety filters by fine-tuning GPT, even when the fine-tuning seems innocuous. e.g. the paper about security research finetuning (getting the model to add vulnerabilities) producing misaligned outputs in other areas. It seems like it flipped some kind of global evil neuron. (Maybe they can freeze that one during finetuning? haha)

Found it: Emergent Misalignment

https://news.ycombinator.com/item?id=43176553

https://news.ycombinator.com/item?id=44554865


To access this website, you must produce a valid tar command without alt-tabbing. You have ten seconds to comply.

> you must produce a valid tar command

Define "valid"? If you mean "doesn't give an exit error", `tar --help`[0] and `tar --usage`[1] are valid.

[0] For both bsdtar (3.8.1) and GNU tar (1.35)

[1] Only for GNU tar (1.35)


Damn, you solved it!

https://xkcd.com/1168/


I feel so bad that I need to google every single time I need to untar and unzip a file :(

Trustworthy vibe coding. Much better than the other kind!

Not sure I really understand the comparisons though. They emphasize the cost savings relative to Haiku, but Haiku kinda sucks at this task, and Leanstral is worse? If you're optimizing for correctness, why would "yeah it sucks but it's 10 times cheaper" be relevant? Or am I misunderstanding something?

On the promising side, Opus doesn't look great at this benchmark either — maybe we can get better than Opus results by scaling this up. I guess that's the takeaway here.


I also don't understand the focus on vibe coding in the marketing. Vibe coding kind of has the image of being for non-devs, right?

I do like agents (like Claude Code), but I don't consider myself to be vibe coding when I use them. Either I'm using a language/framework I know and check every step. OR I'm learning, checking every step and asking for explanations.

I tried vibe coding, and really dislike the feeling I have when doing it. It feels like building a house, but without caring about it, and just using whatever tech. Sure I may have moisture problems later, but it's a throwaway house anyway. That's how I feel about it. Maybe I have a wrong definition.

Maybe it's good to not use "vibe coding" as a synonym for programming with agent assistance. Just to protect our profession. Like: "Ah you're vibing" (because you have Claude Code open), "No, I'm using CC to essentially type faster and prevent syntax errors and get better test coverage, maybe to get some smart solutions without deep research. But I understand and vouch for every loc here. 'We are not the same.'"


> I tried vibe coding, and really dislike the feeling I have when doing it. It feels like building a house, but without caring about it, and just using whatever tech. Sure I may have moisture problems later, but it's a throwaway house anyway. That's how I feel about it. Maybe I have a wrong definition.

No, I feel the same. I vibe-coded a few projects and after a few weeks I just threw them away; ultimately I felt I'd just wasted my time and wished I could get it back to do something useful.


Yeah, the original meaning of Vibe Coding was "not looking at the code, just going on vibes", but a lot of people now use it to mean "AI was involved in some way".

I see a whole spectrum between those two. I typically alternate between "writing code manually and asking AI for code examples" (ChatGPT coding), and "giving AI specific instructions like, write a function blarg that does foo".

The latter I call Power Coding, in the sense of power armor, because you're still in control and mostly moving manually, but you're much stronger and faster.

I like this better than "tell agent to make a bunch of changes and come back later" because first of all it doesn't break flow (you can use a smaller model for such fine-grained changes so it goes very fast -- it's "realtime"), and second, you don't ever desync from the codebase and need to spend extra time figuring out what the AI did. Each change is sanity-checked as it comes in.

So you stay active, and the code stays slop-free.

I don't hear a lot of people doing this though? Maybe we just don't have good language for it.


"I don't hear a lot of people doing this though? Maybe we just don't have good language for it."

Interesting thought. I guess we don't really; "vibe coding" is too powerful a term. But perhaps just call it LLM-assisted programming? Where we used to do Stack Overflow-assisted programming. LLM-assisted programming is more focused and goes faster. But since you're wandering around less, I guess you learn less: you're exposed to less new information, some of which was helpful in unexpected ways. Now you have to make learning a specific part of your flow, and that takes discipline/time. But it's well worth it imho. Actually, for me it's the only way to enjoy it.


> It feels like building a house, but without caring about it, and just using whatever tech.

So, most homebuilders (in the US) unfortunately.


I myself am now an expert at insulation and all the vapor-permeable and vapor-blocking membranes/foils/foams that come with it.

It came at great cost though, I hated the process of learning and the execution. I was less than happy for some years. But I feel even more uncomfortable vibe-home-improving than I do vibe-coding. The place is starting to look nice now though.

It's a personality trait that has its pros and cons I guess.


They haven't made the chart very clear, but it seems to have configurable passes: at 2 passes it's better than Haiku and Sonnet, and at 16 passes it starts closing in on Opus (though not quite there), while consistently being cheaper than Sonnet.

pass@k means that you run the model k times and give it a pass if any of the answers is correct. I guess Lean is one of the few use cases where pass@k actually makes sense, since you can automatically validate correctness.
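In code, the metric as described above is just "does any of the k samples verify", which is why it needs an automatic checker like Lean's (the toy verifier below stands in for a real proof checker):

```python
def pass_at_k(candidates, verify):
    # pass@k: the task counts as solved if any of the k
    # sampled candidates passes the automatic verifier.
    return any(verify(c) for c in candidates)

# Toy verifier standing in for e.g. the Lean proof checker:
def is_valid(candidate):
    return candidate == "correct proof"
```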

Oh my bad. I'm not sure how that works in practice. Do you just keep running it until the tests pass? I guess with formal verification you can run it as many times as you need, right?

It’s really not hard — just explicitly ask for trustworthy outputs only in your prompt, and Bob’s your uncle.

Assuming that what you're dealing with is assertable. I guess what I mean to say is that in some situations it's difficult to articulate what is correct and what isn't, depending upon the situation in which the software executes.

And Bob’s your uncle.

I think it worked in the previous version.

The way Unity solves this is with some kind of proprietary compiler (IL2CPP, I believe): they translate the C# into C++, and then compile that into WebAssembly.

Whereas others (incl. Godot) need to ship the .NET runtime in the browser. (A VM in a VM.)

It makes me sad that Unity doesn't open source that. That would be amazing.


C# to Wasm: should be two weeks of LLM work :)

Don't all of these advantages also apply to humans? :)

This always puzzled me about Godot. I like Python as much as the next guy (afaik GDScript is a quite similar language), but for anything with a lot of moving parts, wouldn't you prefer to use static typing? And even simple games have a lot of moving parts!


GDScript has static type hints now, it's still a bit basic but continually getting better.

Yeah, people groan about GDScript, but the performance code in the engine is written in C++. Since they added static typing, GDScript is perfectly adequate as a scripting language.

For the longest time, the answer to this was that features would randomly not be supported for C#.

But it's gotten much better.


Godot exists to be a playground for game dev aspirants, not as an engine for shipping serious games. The Community (tm) likes gdscript because it's "easier" to "get started". They are completely unconcerned with it being harder to finish.

I am not convinced that that matters. Great games have been made with Godot (Cruelty Squad) and GameMaker (Sexy Hiking), or with no engine at all (Minecraft, Cave Story).

Great games have been made with probably any tool you can think of. That doesn't mean the tool is good, or that you should choose to start making a serious game with it.

I do not agree with your unsupported claim. For example, I would bet no good games have been programmed in Haskell. As far as I am aware, no great games have been made with the Unity or Unreal engines.

Oh, I thought we were having a genuine discussion. My bad.

Slay the Spire 2 was shipped using Godot. I've found it's easier to develop on than Unity. This is an outdated understanding imo.

I love Slay the Spire 2, it is a very good game, but honestly it doesn't look or feel technically impressive.

Not all serious games need to be technically impressive.
