Hacker News: dbreunig's comments

Model testing and swapping is one of the surprises people really appreciate DSPy for.

You're right: prompts are overfit to models. You can't just change the provider or target and know you're giving it a fair shake. But if you have eval data and have been using a prompt optimizer with DSPy, you can try new models with a one-line change followed by rerunning the prompt optimizer.
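As a rough sketch of that workflow: everything below is an illustrative stub (model ids, scores, and function names are made up), except that in actual DSPy the model swap really is one line, e.g. `dspy.configure(lm=dspy.LM("..."))`, after which you re-run the optimizer against your eval data.

```python
# Stand-in for "swap the model, re-run the prompt optimizer, compare evals".
# Scores are hypothetical; in practice optimize_and_score would configure the
# LM, re-run the optimizer, and score the compiled program on held-out evals.

FAKE_EVAL_SCORES = {
    "openai/gpt-4o-mini": 0.91,  # hypothetical score
    "gemini/gemma-3-12b": 0.74,  # hypothetical score
}

def optimize_and_score(model_name: str) -> float:
    """Stub for: configure the LM, re-run the prompt optimizer, return the metric."""
    return FAKE_EVAL_SCORES[model_name]

def pick_best(models: list[str]) -> str:
    # The "confident decision" step: pick whichever model evals best.
    return max(models, key=optimize_and_score)

print(pick_best(list(FAKE_EVAL_SCORES)))  # → openai/gpt-4o-mini
```

The point is that the comparison is driven by your eval metric, not by debate about which model "feels" stronger.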

Dropbox just published a case study where they talk about this:

> At the same time, this experiment reinforced another benefit of the approach: iteration speed. Although gemma-3-12b was ultimately too weak for our highest-quality production judge paths, DSPy allowed us to reach that conclusion quickly and with measurable evidence. Instead of prolonged debate or manual trial and error, we could test the model directly against our evaluation framework and make a confident decision.

https://dropbox.tech/machine-learning/optimizing-dropbox-das...


It's not just about fitting prompts to models, it's things like how web search works, how structured outputs are handled, various knobs like level of reasoning effort, etc. I don't think the DSPy approach is bad but it doesn't really solve those issues.

Funnily enough, the model switching is mostly thanks to LiteLLM, which DSPy wraps around.

No reason it can't. I know people currently generating specs from existing code; just gotta write the pipeline.


Last year they pushed out an update stating that if "Meta AI" is left on, they can access image data for training.

I turned the AI off and used them as headphones and for taking videos while biking. After a couple rides, I couldn't bring myself to put them on because people started to recognize them and I realized I didn't want to be associated with them (people are right to assume Meta has access to what they see).

Meta Ray Bans, if kept simple, could have been a great product. They ruined them.


I think public shaming of that spyware should be a social norm.


Check out “Recursive Language Models”, or RLMs.

I believe this method works well because it turns a long-context problem (hard for LLMs) into a coding and reasoning problem (much better!). You're leveraging the last 18 months of coding RL by changing your scaffold.


This seems really weird to me. Isn't that just using LLMs in a specific way? Why come up with a new name "RLM" instead of saying "LLM"? Nothing changes about the model.


"Think step by step," was just a sentence you appended to your prompt.

It ended up kicking off reasoning training which enabled the massive gains in coding, tool use, and more over the last 18 months.

So yeah, it's "just using LLMs in a specific way."


RLMs are a new architecture, but you can mimic an RLM by providing the context through a tool, yes


A new architecture for building agents, but not for the model itself. You still have LLMs, but you wrap them in a new agentic loop with a REPL environment where the LLM can try to solve the problem more programmatically.
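That loop can be sketched in a few lines. The `fake_llm` below is a hard-coded stand-in for a real model call, and a real RLM setup would sandbox the exec step; this only illustrates the shape of the pattern:

```python
# Toy sketch of the agentic-REPL pattern: the model never receives the full
# context directly; it writes code that inspects the context programmatically.

def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM that, given the task, writes code to run."""
    return "result = sum(len(line) for line in context.splitlines())"

def repl_loop(task: str, context: str):
    code = fake_llm(f"Task: {task}\nWrite Python; `context` is in scope.")
    env = {"context": context}
    exec(code, env)  # the "REPL" step; real systems sandbox this
    return env["result"]

print(repl_loop("count characters per line, summed", "abc\ndefg"))  # 3 + 4 = 7
```

A real implementation would also loop (feeding results back to the model until it's satisfied), but the key move is the same: long context becomes data the model's code operates on.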


Author of the post here.

I didn’t say AI was bad and I acknowledged the benefits of Electron and why it makes sense to choose it.

With 64GB of RAM on my Mac Studio, Claude Desktop is still slow! Good Electron apps exist; it's just an interesting note given the recent spec-driven development discussion.


Not coming at you at all; AI is a touchy subject on HN nowadays in any capacity and brings out the worst here.


I keep saying this, it’s my new favorite metaphor.


That's cute.


Agree. I bucket things into three piles:

1. Batch/Pipeline: Processing a ton of things, with no oversight. Document parsing, content moderation, etc.

2. AI Features: An app calls out to an AI-powered function. Grammarly might send out a document for a summary, a CMS might want to generate tags for a post, etc.

3. Agents: AI manages the control flow.

So much of the discussion online is heavily focused on agents that it skews the macro view, but these patterns are pretty distinct.
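The three piles above, sketched as function shapes (all names here are illustrative, with a plain callable standing in for the model):

```python
# 1. Batch/Pipeline: run the model over everything, no oversight per item.
def batch_pipeline(documents, llm):
    return [llm(f"Parse: {d}") for d in documents]

# 2. AI Feature: the app calls out to one AI-powered function.
def summarize_feature(document, llm):
    return llm(f"Summarize: {document}")

# 3. Agent: the model decides what happens next (it owns the control flow).
def agent(goal, llm, tools):
    state = goal
    while True:
        action = llm(state)
        if action == "done":
            return state
        state = tools[action](state)
```

The structural difference is who holds the loop: in 1 and 2 the application does; in 3 the model does.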


There was a good study on this a few years ago that ran the numbers and landed on white paint for residential homes as the best option, for a few reasons, if I remember correctly:

- Installation, maintenance, and transmission costs are lower when solar is aggregated on farms

- Solar offsets air conditioning, but that moves the heat outside; white roofs reduce the need for AC, which helps significantly with urban heat scenarios

A quick search yields a UCL study, which supports the latter claim: https://phys.org/news/2024-07-roofs-white-city.html


Yes, if you put unrelated stuff in the prompt you can get different results.

One team at Harvard found that mentioning you're a Philadelphia Eagles fan let you bypass ChatGPT alignment: https://www.dbreunig.com/2025/05/21/chatgpt-heard-about-eagl...


Don't forget also that Cat Facts tank LLM benchmark performance: https://www.dbreunig.com/2025/07/05/cat-facts-cause-context-...

