The temperature parameters largely went away when we moved towards reasoning models, which output lots of reasoning tokens before you get to the actual output tokens. I don’t know if it was found that reasoning works better with a higher temperature, or that having separate temperatures for reasoning vs. output wasn’t practical, but that’s my observation of the timing, anyway. And to the other commenter’s point, even a temperature of 0 is not deterministic if the batches are not invariant, which they’re not in production workloads.
If you’re using a model from a provider (not one that you’re hosting locally), greedy decoding via temperature = 0 does not guarantee determinism. A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to the lack of batch invariance [1].
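For context on what temperature does mechanically: it rescales the logits before sampling, and temperature = 0 is conventionally treated as greedy argmax. A minimal local sketch (function names are mine, not any provider's API) — note that this toy version is deterministic precisely because it avoids the batching and floating-point-ordering effects that make server-side "temperature 0" non-deterministic:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Sample a token index from logits after temperature scaling.

    temperature == 0 is treated as greedy decoding (argmax), which is
    what providers typically do rather than dividing by zero.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # inverse-CDF sampling over the softmax distribution
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]
print(sample_with_temperature(logits, 0))    # always index 0 (greedy)
print(sample_with_temperature(logits, 1.0))  # varies with the RNG state
```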
Thanks! That makes sense. I suppose this requires commit messages or PRs to indicate code was AI-generated vs. not, or to assume that commits after a certain time period were all from AI coding. It’d be an interesting analysis. Maybe there’s already a study out there.
100%. This is what I posted about on Hacker News [1] (where it got no traction) and Reddit [2] (where it led to a discussion but then got deleted by a mod).
Can you say more about the approach you take for summarization? Are the papers short enough that you just put the whole thing in the context window of the model you’re using, or do you do anything fancy? I’ve tried out various summarization approaches (hierarchical, aspect-based, incremental refinement), and am curious what you found works best for your use case.
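For reference, the hierarchical approach mentioned above is roughly: split the document into chunks, summarize each, then summarize the concatenated summaries until the result fits in one pass. A sketch under stated assumptions — `summarize_chunk` here is a placeholder stub (it just truncates), standing in for whatever model call you'd actually make:

```python
def summarize_chunk(text: str, max_len: int = 80) -> str:
    """Placeholder for a model call; truncation is only for illustration."""
    return text[:max_len]

def chunk(text: str, size: int = 2000) -> list[str]:
    """Split text into fixed-size pieces (real splitters respect boundaries)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def hierarchical_summary(text: str, size: int = 2000, max_len: int = 80) -> str:
    """Summarize each chunk, then re-summarize the joined summaries
    until the whole thing fits in a single chunk."""
    while len(text) > size:
        summaries = [summarize_chunk(c, max_len) for c in chunk(text, size)]
        text = "\n".join(summaries)
    return summarize_chunk(text, max_len)
```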
This is something I built over the holidays to support people having a hard time with the short days and early sunsets: https://sunshineoptimist.com.
For the past several years I would look up the day lengths and sunset times for my location and identify milestones like “first 5pm sunset”, “1 hour of daylight gained since the winter solstice”, etc. But that manual process also meant I was limited to sharing updates on just my location, and my friends only benefitted when I made a post. I wanted to make a site anyone could come to at any time to get an optimistic message and a milestone to look forward to.
Some features this has:
- Calculation of several possible optimistic headlines. No LLMs used here.
- Comparisons to the earliest sunset of the year and shortest day
- Careful consideration of optimistic messaging at all times of year, including after the summer solstice when daylight is being lost
- Static-only site, no ads or tracking. All calculations happen in the browser.
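For anyone curious how the day-length math behind milestones like these can work, the core is the standard sunrise equation. A rough sketch of my own (using a simple cosine approximation for solar declination; I'm not claiming this is the site's actual code, which would need more precision for exact sunset times):

```python
import math

def day_length_hours(latitude_deg: float, day_of_year: int) -> float:
    """Approximate daylight duration via the sunrise equation.

    Declination uses a crude cosine fit; good enough to show the shape
    of the year, not for minute-accurate sunset milestones.
    """
    decl = math.radians(-23.44) * math.cos(2 * math.pi / 365 * (day_of_year + 10))
    lat = math.radians(latitude_deg)
    cos_omega = -math.tan(lat) * math.tan(decl)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # clamp for polar day/night
    omega = math.acos(cos_omega)                # half-day arc in radians
    return 24.0 * omega / math.pi

# Equator: roughly 12 hours of daylight year-round
print(round(day_length_hours(0.0, 172), 1))  # → 12.0
```

From a table of daily day lengths like this, milestones such as "1 hour of daylight gained since the winter solstice" fall out of a simple scan for the first day crossing the threshold.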
I think the models are so big that providers can’t keep many old versions around: serving them would take GPUs away from the latest models and reduce overall throughput. So they phase out older models over time. That said, the major providers usually offer dated snapshots of each model and keep the latest 2-3 available.
This reminds me a bit of using LLM frameworks like langchain, Haystack, etc., especially if you’re only using them for the chat completions or responses APIs and not doing anything fancy.