I still use 4.5. I occasionally try 4.6 but always switch back. The “bias towards action” is what I hate. 4.5 would make sure it understands what I want. 4.6 will just make shit up. Maybe the Anthropic people always write crystal clear instructions so it works for them. For me, I just can’t get 4.6 to do what I want.
I was surprised to see a post by Petzold on this subject. I know who he is. But I don’t think you owe an apology here. I think you made a thoughtful comment. A post like his should be critiqued for what it says, not for the author’s previous work. And, fortunately, other people could give context on the significant work he has done.
I am trying a similar spec-driven development idea in a project I am working on. One big difference is that my specifications are not formalized that much. They are in plain language and are read directly by the LLM to convert to code. That seems like the kind of thing the LLM is good at. One other feature of this is that it allows me to nudge the implementation a little with text in the spec outside of the formal requirements. I view it two ways, as spec-to-code but also as a saved prompt. I haven't spent enough time with it to say how successful it is, yet.
Do you save these "prompts" so you can improve them, and in turn improve the code? To me, Spec Driven Development is more than a spec to generate code, structured or not.
The spec contains formal, numbered items which are requirements and also serve to make tests (these are spec tests, additional implementation tests are also allowed by the implementer). When I said "they are not formalized as much", I mean I am not as strict on the spec format as CodeSpeak is, where their spec can be parsed with a tool. For me it is up to the LLM to use the spec itself. I have additional text beyond the requirement items which also influences how the LLM implements the code. I did this because it is too tough, for me at least, to prompt the LLM just based on strict requirements. This is perhaps cheating according to what you might call SDD. I'm just trying to be practical. The idea in the end is that this spec implies the code and maintaining the spec is the same as maintaining the code. Strictly speaking this won't be true, but I am hoping it still works anyway.
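A minimal sketch of how numbered requirement items could double as a checklist for spec tests. Everything here is hypothetical: the spec text, the `R1`-style IDs, the test names, and both helper functions are made up for illustration, and none of it reflects CodeSpeak's format or any real tool. The free text around the numbered items is simply ignored by the check, since it is only there to guide the LLM.

```python
import re

# Hypothetical plain-language spec: numbered items are the requirements,
# the surrounding free text is extra guidance for the LLM.
SPEC = """\
Notes should be quick to save and easy to find again.

R1. Saving a note stores its text.
R2. Searching returns every note that contains the query.

Prefer simple data structures over clever ones.
"""

# Hypothetical mapping from spec test names to the requirement each covers.
SPEC_TESTS = {
    "test_save_stores_text": "R1",
}


def requirement_ids(spec: str) -> list[str]:
    """Return IDs of numbered items such as 'R1.' at the start of a line."""
    return re.findall(r"^(R\d+)\.", spec, flags=re.MULTILINE)


def uncovered(spec: str, tests: dict[str, str]) -> list[str]:
    """Return requirement IDs that no spec test claims to cover."""
    covered = set(tests.values())
    return [rid for rid in requirement_ids(spec) if rid not in covered]


print(uncovered(SPEC, SPEC_TESTS))  # ['R2']
```

A check like this only confirms that each requirement has a test attached to it. Whether the test really exercises the requirement is still a judgment call for the implementer.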
Is wealth the right term here? I thought it was supposed to measure production, with the actual measurement usually spending (with qualifiers). And, when comparing countries, you have to account for the different currencies. Currencies are typically trade-balanced, which gives a rough equivalence for buying power, but that is not true with the dollar because, as the effective reserve currency, it has international demand outside of trade.
I suspect that the US having better investment opportunities than other countries (tech companies for example) might be more important than reserve currency status.
People tend to pay more attention to trade than investment, but investment flows are just as important. A trade deficit often means that foreign investors are buying and a trade surplus goes along with people investing in foreign countries.
I’m working on a solo project, a location-based game platform that includes games like Pac-Man you play by walking paths in a park. If I cut my coding time to zero, that might make me go two or three times faster. There is a lot of stuff that is not coding. Designing, experimenting, testing, redesigning, completely changing how I do something, etc. There is a lot more to doing a project than just coding. I am seeing a big speedup, but that doesn’t mean I can complete the project in a week. (These projects are never really completed anyway, until you give up on them.)
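The "two or three times faster" figure is the usual Amdahl's law arithmetic: if coding is a fraction p of the total work and only coding gets faster, the whole project speeds up by at most 1 / (1 - p). A rough sketch, where the function name and the example fractions are assumptions chosen to match the 2x to 3x range rather than anything measured:

```python
def overall_speedup(coding_fraction: float,
                    coding_speedup: float = float("inf")) -> float:
    """Amdahl's law: project speedup when only the coding part gets faster.

    coding_fraction is the share of total time spent coding (0.0 to 1.0).
    coding_speedup is how much faster coding becomes; infinity means
    coding time drops to zero.
    """
    remaining = (1.0 - coding_fraction) + coding_fraction / coding_speedup
    if remaining == 0.0:
        # Everything was coding and coding became free.
        return float("inf")
    return 1.0 / remaining


# Coding at zero cost, assuming it was half or two thirds of the work.
print(f"{overall_speedup(0.5):.2f}")    # 2.00
print(f"{overall_speedup(2 / 3):.2f}")  # 3.00
# A more modest case: coding is half the work and only gets 2x faster.
print(f"{overall_speedup(0.5, 2.0):.2f}")  # 1.33
```

So a 2x to 3x ceiling corresponds to coding being roughly half to two thirds of the total effort, with the design, experimenting, and testing time setting the limit.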
I thought Opus 4.5 was an incredible quantum leap forward. I have used Opus 4.6 for a few hours and I hate it. Opus 4.5 would work interactively with me and ask questions. I loved that it would not do things you didn't ask it to do. If it found a bug, it would tell me and ask me if I wanted to fix it. One time there was an obvious one and I didn't want it to fix it. It left the bug. A lot of models could not have done that. The problem here is that sometimes, when models think something is a bug, they break the code by fixing it. In my limited usage of Opus 4.6, it is not asking me clarifying questions, and anything it comes across that it doesn't like, it changes. It is not working with me. The magic is gone. It feels just like those other models I had used.
My go-to models have been Claude and Gemini for a long time. I have been using Gemini for discussions and Claude for coding and now as an agent. Claude has been the best at doing what I want to do and not doing what I don’t want to do. And then my confidence in it took a quantum leap with Opus 4.5. Gemini seems like it has gotten even worse at doing what I want with new releases.
This does make the food much _cheaper_. You can buy food with high quality standards in the US but it is much more expensive. Most people in the US choose the cheaper option.
I do exactly what you are describing and it seems to work for me, from a vitamin D perspective. I started this because I read a paper stating that supplements did not show the same health benefits seen in people who got their vitamin D from sunlight. I believe that is true, but of course cannot be certain.