I fine-tuned an LLM to do Verification IP wiring at an LLM hardware startup. We b...

nxobject · 2025-08-20T21:45:58 1755726358

I'm curious: did you have to tailor your dataset around instruction-following/reasoning capabilities as well? No conflict of interest myself – I'm interested in hobby programming for vintage computers – but my understanding comes from Unsloth's fine-tuning instructions. [1]

[1] https://docs.unsloth.ai/basics/datasets-guide

rybosome · 2025-08-20T21:55:41 1755726941

No problem - although I'm out of that particular role, it's appropriate to discuss since the company already shared these details in an OpenAI press release a few months back.

I fine-tuned reasoning models (o1-mini and o3-mini), which already had strong instruction-following and reasoning behavior. The dataset I prepared took this into account, but it was just simple prompt/response pairs. Defining the task tightly, ensuring the dataset was of high quality, picking the right hyperparameters, and preparing the proper reward function (and modeling it against the provided API) were the keys to success.
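
To make "prompt/response pairs plus a reward function" a bit more concrete, here is a rough sketch in Python. It is not the actual pipeline or the fine-tuning API's schema: the `prompt`/`response` field names, the `to_jsonl` helper, and the line-overlap `reward` are all hypothetical, chosen only to illustrate the shape of the idea.

```python
import json


def to_jsonl(pairs):
    """Serialize (prompt, response) pairs as one JSON object per line.

    The field names here are illustrative, not any particular API's schema.
    """
    return "\n".join(
        json.dumps({"prompt": prompt, "response": response})
        for prompt, response in pairs
    )


def reward(expected, predicted):
    """Hypothetical reward in [0, 1]: Jaccard overlap of non-empty lines.

    Treats each non-empty line (e.g. one wiring statement) as an item, so
    both missing lines and spurious extra lines lower the score.
    """
    want = {line.strip() for line in expected.splitlines() if line.strip()}
    got = {line.strip() for line in predicted.splitlines() if line.strip()}
    if not want:
        return 0.0
    return len(want & got) / len(want | got)
```

A set-overlap score like this gives partial credit, which tends to be a friendlier training signal than all-or-nothing exact match, but a real grader would likely need to understand the task's syntax rather than compare raw lines.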

rbanffy · 2025-08-21T18:00:39 1755799239

That’s really cool. I’d love to see that process from up close.