The idea of "saving prompts for reproducibility" is dead on arrival. LLMs are non-deterministic by nature: in a year the model's API will be deprecated, and its replacement will spit out completely different code with entirely new bugs for the exact same prompt. A prompt isn't source code; it's a temporary crutch for stochastic generation. And if I have to read 50 pages of rambling dialogue with an LLM just to understand why a specific function exists, that PR gets an instant reject. The artifact is, and always will be, readable code plus a sane commit message. Dumping a log of hallucinations will only make debugging a nightmare when this Frankenstein inevitably falls apart in prod tbh



This should be possible in principle: the machines underneath are deterministic, so the non-determinism is a limitation of the implementation, not of the hardware.

"In principle" - sure, but in practice, even if you pin the seed, your float32 calculations are going to drift due to non-deterministic CUDA kernels during parallel execution. You'll never get bit-for-bit identical tensors across different GPUs or even different driver versions, it's a fundamental property of parallel computing


