
For me vibecoding has a similar feeling to a big bag of Doritos. It's really fun at first to slap down 10k lines of code in an afternoon, knowing this is just an indulgence. I think AI is actually really useful for getting a quick view of some library or feature. Also, you can learn a lot if you approach it the right way. However, every time I do any amount of vibecoding, it eventually transitions into pure lethargy mode (apparently "lethargia" is not a word, by the way). Once you eat half a bag of Doritos, are you really not going to eat the second half... do you really want to eat the second half? I don't feel like I'm benefiting as a human just being a QA tester for the AI, constantly shouting that X thing didn't work and Y thing needs to be changed slightly. I think pure vibecode AI use has a difficult-to-understand efficiency curve: it's obviously very efficient in the beginning, but over time hard things compound, such that if you didn't actually form a good understanding of the project, you won't be able to make progress after a while. At that point you've eaten the whole bag of Doritos, you feel like shit, and you can't get off the couch.


This. First I try it just a little, to do a boring part. It feels great. The boring part that was holding me up is gone, and all it took was a little instruction. The dopamine hit is real. So of course I will try it again. But not so fast: it needs to be corrected to keep everything aligned with the architecture. And as my requests get bigger, it needs more and more corrections. Eventually correcting everything becomes too tedious, and accepting is just too easy, so I lower my standards and soon enough lose track of all the decisions. The branch is now useless, as I don't want to debug or own this code I no longer understand, so I start over. I want work to feel like a training session where you get fairly rewarded for your efforts with better understanding, not like a slot machine where you passively hope it gets it right next time.


"My requests get bigger" is the issue here. You're not talking to a real human with common sense or near-infinite working memory.

It's a language model with finite context and the ability to use tools. Whatever it can fit into its context, it can usually do pretty well. But it does require guidance and documentation.

Just like working with actual humans that aren't you and don't share your brain:

1) spec your big feature, maybe use an LLM in "plan" mode. Write the plan into a markdown file.

2) split the plan into smaller independent parts, in GitHub issues or beads or whatever

3) have the LLM implement each part in isolation, add automatic tests, commit, reset context

Repeat step 3 until feature is done.

If you just use one long-ass chat and argue with the LLM about architecture decisions in between code changes, it WILL get confused and produce the worst crap you've ever seen.


Great analogy. Instead of eating the whole bag of Doritos in one sitting, do it in phases. That way, instead of being just a QA tester, you get to pause, reflect and try to make sure you and the AI are on the same page.


> try to make sure you and the AI are on the same page.

What good is AI as a tool if it can end up not on the same page as you?

Imagine negotiating with a hammer to get it to drive nails properly

These things suck as tools


This is exactly what happened in my experience of vibe coding: you don't understand the code after a while, and pushing the project from the 80 percent mark to the 100 percent mark is exponentially more difficult. That's where the AI fails and you have to take over. Only, you don't know anything about the code, and you give up.

I had to rewrite several vibe coded projects from scratch due to this effect. It's useful as a prototyping tool but not a complete productionizing tool.


Heh, that reminds me of the BASIC games I used to write as a kid. I didn't know that variables could be more than one letter, so S would be score, but then if I needed speed and S was taken, I'd just use T. After a while I'd hit a bug that was difficult to reason about, or leave the codebase for a while and not have any clue what the variables meant. So I'd abandon it and start with the fresh, new idea.


I have had similar experiences, and wonder how the subjective experience is impacting my estimations of progress and productivity.

Specifically: what if I just started downloading repos and aggressively copying and pasting to fit my needs… I'd get a whole bunch of code kinda quick, and it'd mostly work.

It feels less interactive, but shares a high level similarity in output and understanding.


I've had it stuck in my head for months now that "LLMs are Legacy Code as a Service". A lot of what they ~~plagiarize~~ produce is based on other people's legacy code. A lot of vibe coding is producing "Day 0 Legacy Code" that is hard to debug/maintain in a lot of the exact same ways Legacy Code always is. (It was written by a developer who is not currently around. It's probably poorly commented/documented in the hows/whys rather than the whats. If it was fast tracked into production somewhere it is probably already in a "not broke, don't fix it" state where the bugs are as much a part of the expected behavior as the features.)

As a developer that has spent far too much of my career maintaining or upgrading companies' legacy code, my biggest fear with the LLM mania is not that my skills go away, but that they come into much higher demand in an uncomfortable way, because the turnaround time between launch and legacy code becomes much shorter and the number of managers who understand why it is "legacy code"/"tech debt" shrinks, because it is neither old nor in obviously dead technologies. "Can you fix this legacy application? It was launched two days ago and nobody knows what it does. Management says they need it fixed yesterday, but there's no budget for this. Good luck."


Lines of code used to be a moat for a company, it no longer is.

Being effective with the code to get the same things done is. That requires a new kind of driving for a new kind of vehicle.


I have yet to find the niche where it is "good at the beginning". So far I've mostly tried asking it to build C tools that use advanced Linux APIs.

Me: hey make this, detailed-spec.txt

AI: okidoki (barfs 9k lines in 15 minutes) all done and tested!

Me: (looks at the code, which has feature-sounding names, but all the features are stubs, all the tests are stubs, and it does not compile)

Me: it does not compile.

AI: Yes, but the code is correct. Now that the project is done, which of these features do you want me to add? (some crazy list)

Me: Please get it to compile.

AI: You are absolutely right! This is an excellent idea! (proceeds to stub and delete most of what it barfed). I feel really satisfied with the progress! It was a real challenge! The code you gave me was very poorly written!

... and so on.


I'm not sure what you're using. I've used Claude in agent mode to port a very complex, spaghetti-coded C application to nicely structured C++. The original code was so intertwined that I didn't want to figure it out, so I had shelved the project until AI came along.

It wasn't super bad at converting the code, but even it struggled with some of the logic. Luckily, I had it design a test suite to compare the outputs of the old application and the new one. When it couldn't figure out why it was getting different results, it would start generating hex dump comparisons, writing small Python programs, and analyzing the results to figure out where it had gone wrong. It slowly iterated on each difference until it had resolved them: building the code, running the test suite, comparing the results, changing the code, repeat. Some of the issues are likely bugs in the original code (that it fixed), but since I was going for byte-for-byte perfection it had to re-introduce them.
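The byte-for-byte comparison step can be sketched as a small Python helper (the file paths are hypothetical; the real harness compared WAV outputs of the old and new binaries):

```python
import itertools
from pathlib import Path

def first_difference(old_path, new_path):
    """Return the byte offset of the first mismatch between two output
    files, or None if they are identical byte-for-byte."""
    old = Path(old_path).read_bytes()
    new = Path(new_path).read_bytes()
    for offset, (a, b) in enumerate(itertools.zip_longest(old, new)):
        if a != b:
            return offset  # differing byte, or one file ended early
    return None
```

Pointing the agent at the first mismatching offset each round, and iterating until this returns None, is essentially the loop described above.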

The issues you describe I have seen but not with the right technology and not in a while.


At the high level, you asked LLM to translate N lines of code to maybe 2N lines of code, while GP asked LLM to translate N lines of English to possibly 10N lines of code. Very different scenarios.


The OP said the LLM didn't build anything, said it was great, and didn't even compile it. My experience has been quite the opposite: not only compiling it and fixing compile-time errors, but also running it and fixing runtime issues as well. It even went so far as to write waveform analysis tools in Python (the output of this project was WAV files) to determine the issues.

It doesn't really matter what we told it to do; a task is a task. But clearly each LLM performed that task very differently for me than for the OP.


LLMs are non-deterministic for everyone. Give it time.


I'll be the first to say I've abandoned a chat and started a new one to get the result I want. I don't see that as a net negative though -- that's just how you use it.


Are you sure Claude didn't do exactly the same thing, but the harness, Claude Code, just hid it from you?

I have seen AI agents fall into the exact loop that GP discussed and need manual intervention to get out of it.

Also blindly having the AI migrate code from "spaghetti C" to "structured C++" sounds more like a recipe for "spaghetti C" to "fettuccine C++".

Sometimes it's hidden data structures and algorithms you want to formalize when doing a large-scale refactor. I have found that AIs are definitely able to identify that, but it's definitely not their default behaviour, and they fall out of that behaviour pretty quickly if not constantly reminded to do so.


> Are you sure Claude didn't do exactly the same thing, but the harness, Claude Code, just hid it from you?

What do you mean? Are you under the impression I'm not even reading the code? The code is actually the most important part because I already have working software but what I want is working software that I can understand and work with better (and so far, the results have been good).


Reading the code and actually understanding the code is not the same thing.

"This looks good", vs "Oh that is what this complex algorithm was" is a big difference.

Effectively, to review that the code is not just being rewritten into the same code but with C++ syntax and conventions, you need to understand the original C code. That means the hard part was not the code generation (via LLM or fingers) but the understanding, and I'm unsure the AI can do the high-level understanding, since I have never gotten it to produce said understanding without explicitly telling it.

Effectively, "x.c, y.c, z.c implement a DSL but are convoluted and not well structured; generate the same DSL in C++" works great. "Rewrite x.c, y.c, z.c into C++, building abstractions to make it more ergonomic" generally won't recognise the DSL and formalise it in a way that is very easy to do in C++; it will just make it "C++", but the same convoluted structure remains.


> Reading the code and actually understanding the code is not the same thing.

Ok. Let me be more specific then. I'm "understanding" the code since that's the point.

> I'm unsure the AI can do the high level understanding since I have never gotten it to produce said understanding without explicitly telling it.

My experience has been the opposite: it often starts by producing a usable high-level description of what the code is doing (sometimes imperfectly) and then proposes refactors that match common patterns -- especially if you give it enough context and let it iterate.

> "Rewrite x.c, y.c, z.c into C++, building abstractions to make it more ergonomic" generally won't recognise the DSL and formalise it in a way that is very easy to do in C++; it will just make it "C++", but the same convoluted structure remains.

That can happen if you ask for a mechanical translation or if the prompt doesn't encourage redesign. My prompt was literally to make it well-designed, idiomatic C++, and it did that. Inside the LLM's training data is a whole bunch of C++ code, and it seems to be leaning on that.

I did direct some goals (e.g., separating device-specific code and configuration into separate classes so adding a device means adding a class instead of sprinkling if statements everywhere). But it also made independent structural improvements: it split out data generation vs file generation into pipeline/stream-like components and did strict separation of dependencies. It's actually well designed for unit testing and mocking even though I didn't tell it I wanted that.

I'm not claiming it has human-level understanding or that it never makes mistakes -- but "it can't do high-level understanding" doesn't match what I'm seeing in practice. At minimum, it can infer the shape of the application well enough to propose and implement a much more ergonomic architecture, especially with iterative guidance.

I had to have it introduce some "bugs" for byte-for-byte matching because it had generalized some of the file generation and the original C code generated slightly different file structures for different devices. There's no reason for this difference; it's just different code trying to do the same thing. I'll probably remove these differences when the whole thing is done.


That clarifies a lot.

So effectively it was at least partly guided refactoring. Not blind vibe coding.


Sounds like the debug mode that Cursor just announced.


> I've used Claude in agent mode to port a very complex and spaghetti coded C application to nicely structured C++

You migrated code from one of the simplest programming languages to unarguably the most complex programming language in existence. I feel for you; I really do.

How did you ensure that it didn't introduce any of the myriad of footguns that C++ has that aren't present in C?

I mean, we're talking about a language here that has an entire book just for variable initialisation - choose the wrong one for your use-case and you're boned! Just on variable initialisation, how do you know it used the correct form in all of the places?


I do a lot of C++ programming, and that's really overselling the issues. You don't have to read an entire book on variable initialization to do it correctly. And using STL types is a lot safer than passing pointers around.

It's actually far easier for me to tell that it's not leaking memory or accessing unallocated data in the C++ version than in the C version.

A simple language just pushes complexity from the language into the code. Being able to represent things in a more high-level way is entirely the point of this exercise because the C version didn't have the tools to express it more cleanly.


In my case that was Claude Code with Opus.


I don't ever look at LLM-generated code that either doesn't compile or doesn't pass existing tests. IMHO any proper setup should involve these checks, with the LLM either fixing itself or giving up.

If you have a CLI, you can even script this yourself, if you don't trust your tool to actually try to compile and run tests on its own.

It's a bit like a PR on github from someone I do not know: I'm not going to actually look at it until it passes the CI.
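A minimal version of that gate can be sketched in Python; the build command and the "fix" step (e.g. re-prompting your agent CLI) are placeholders for whatever tooling you use:

```python
import subprocess

def gate(build_cmd, fix, max_attempts=3):
    """Run the build/test command; on failure, call fix() (e.g. re-prompt
    the agent) and retry, up to max_attempts. Returns True only once the
    command is green -- the only state worth a human review."""
    for _ in range(max_attempts):
        if subprocess.run(build_cmd).returncode == 0:
            return True
        fix()  # hand the failure back to the agent here
    return False
```

With a non-interactive agent CLI, fix could simply shell out to it with a "the build failed, fix it" prompt, giving you the fix-itself-or-give-up loop described above.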


> I have yet to find the niche where it is "good at the beginning".

The niche is "the same boring CRUD web app someone made in 2003 but with Tailwind CSS".


Good work, if you can get it.


Holy shit, I feel the same. I was arguing with an LLM one day about how to do Kerberos auth on incoming HTTP requests. It kept giving me bogus advice that I could disprove with a tiny snip of code. I would explain. It would react just like yours. After a few rounds, it would give the first answer again. Awful. So infuriating.

I had a similar issue with gnuplot. The LLM-suggested scripts frequently had syntax errors. I say: LLMs are awesome when they work; otherwise they are a time suck / net negative.


Sometimes they just get into "argument simulator mode". There's a lot of training data of people online having stupid arguments.


You can write any program you want, as long as it is flappy bird in reactjs.


heh


Willing to name “an LLM”?

Was this a local model?


Good question. It was not my intent to be evasive about the LLM. I should have included it in my original post. I tried the free versions of both OpenAI ChatGPT and Google Gemini. To be clear, when I say "free", I mean just go to the website and start chatting with the bot.


Include in the prompt a verifiable, testable exit criterion (compiling) and use an agentic AI like Cursor or Codex with this; you'd be surprised what happens :)


Is Claude Code with both Sonnet and Opus agentic enough? Because it is constantly finding creative ways to ignore direct, repeated instructions ("user asked X but it is hard, let's do Y instead"), implement fake tests ("feature X is complex; we need to test it completely; let's write a script that will create the files that feature X would have created, then test that the files exist"), and sabotage and delete working code ("we need to track the FD of the open file (runs strace). The FD is 5 (hardcodes 5 in the code instead of implementing anything useful). Tests pass now!")

I have not experienced this level of malice and sweet-talking work avoidance from anyone. It apologizes like an alcoholic, then proceeds to double down.

Can you force it to produce actually useful code? Yes, by repeatedly yelling at it to please follow the instructions. In the process, it will break, delete, or introduce hard-to-find bugs in the rest of the codebase.

I'm really curious whether anyone actually has this thing working, or whether they simply haven't bothered to read the generated code.


You need to use the features that Claude Code gives you in order to be successful with it. Your build and tests should be in a Stop hook that prevents Claude from stopping if the build or tests fail. Combining this with a Stop hook that bails out if the first hook has already failed n times prevents infinite loops.
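As a sketch, such a Stop hook can be a small script registered in Claude Code's settings. The details assumed below (hook input arrives as JSON on stdin, a `stop_hook_active` flag marks a continuation already forced by a stop hook, and exit code 2 blocks the stop while feeding stderr back to the agent) reflect my understanding of the hooks interface; check the current Claude Code hooks docs before relying on them, and treat the `make` commands as placeholders for your own build and test steps:

```python
import json
import subprocess
import sys

def check(hook_input, commands):
    """Decide whether the agent may stop: returns (exit_code, message).
    0 allows the stop; 2 blocks it and says why."""
    # Bail out if we're already continuing because of this hook, so a
    # permanently broken build can't cause an infinite loop.
    if hook_input.get("stop_hook_active"):
        return 0, ""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return 2, f"{' '.join(cmd)} failed:\n{result.stderr}"
    return 0, ""

# As a hook script you would wire it up roughly like this (reads the hook
# JSON from stdin; on exit code 2, stderr is fed back to the agent):
#
#   code, message = check(json.load(sys.stdin), [["make"], ["make", "test"]])
#   if message:
#       print(message, file=sys.stderr)
#   sys.exit(code)
```

The script then gets referenced from the Stop hook entry in your Claude Code settings file, so the agent cannot declare itself done while the build is red.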

With anything above a toy project, you need to be really good with context window management. Usually this means using subagents and scoping prompts correctly by placing the CLAUDE.md files next to the relevant code. Your main conversation's context window usage should pretty much never be above 50%. Use the /clear command between unrelated tasks. Consider if recurring sequences of tool calls could be unified into a single skill.

Instead of sending instructions to the agent straight away, try planning with it and prompting it to ask you questions about your plan. The planning phase is a good place to give Claude more space to think with "think > think hard > ultrathink". If you are still struggling with the agent not complying, try adding emphasis with "YOU MUST" or "IMPORTANT".


As I'm getting better and better results with it, I'm having it do more and more things. I went through a complete agentic refactor of a project from Angular 17 to Angular 20 (RxJS to Signals), and I'd say it did it perfectly. A few times I'd have it summarize and start a new chat, because it can get less effective when the history gets too long. I also had to iterate on what I wanted and do things a step at a time. Although it was very clear that it also wanted to do things in pieces and test each major change before continuing on.

I think, like any tool, it has its pros and cons, and the more you use it the more you figure out how to make the best use of it and when to give up.


It's terrible at the niches I actually have expertise in, which are in mathematics. I'd guess an expert is going to find the flaws in anything it's doing in their field. That being said, if you're just trying to e.g. see what some GUI library can do then it's pretty useful to get something going. I personally would prefer not using it in anything that's not very much a throwaway test project though, but that is my luxury as a jobless bum.


But doesn't your argument actually mean it is terrible at absolutely everything in a very subtle, convincing way, so that it takes an actual expert in the field to tell that the generated text is not a profound revelation but a bag of nonsense?

Meaning, is the answer in the field I'm not an expert of good, or am I simply being fooled by emoji and nice grammar?


I don't think it's an expert; I just don't think being an expert is necessary to get some value out of it if you aren't an expert yourself. The trap is letting the charade go on longer than it should, though. I personally only see the main value in using it to create test projects or to get the gist of what a library can do. I do think that's pretty valuable, and I also think real expertise is more valuable.

Or you can do like some of the others suggest and eliminate pure vibecoding. Just use it as a back and forth where you understand along the way and make well-reasoned changes. That looks a lot more like real engineering, so it's not surprising the other commenters report better results.


Gell-Mann amnesia, but for LLMs.


It's an interesting concept, but inapplicable here because I don't trust the media reporting on LLMs and I personally believe expert programmers are never going to be replaced. My concept of the value of LLMs is that they are good for generating throwaway test code to assess the use of a library or to prototype a feature.


What do you mean? I love Doritos!



