I fine-tuned an LLM to do Verification IP wiring at an LLM hardware startup. We built the dataset in-house. It was quite effective, actually; with enough investment in expanding the dataset, this is a totally viable application.
I'm curious: did you have to tailor your dataset around instruction-following/reasoning capabilities as well? No conflict of interest myself – I'm interested in hobby programming for vintage computers – but my understanding comes from Unsloth's fine-tuning instructions. [1]
No problem. Although I'm out of that particular role, it's appropriate to discuss since the company already shared these details in an OpenAI press release a few months back.
I fine-tuned reasoning models (o1-mini and o3-mini), which already had strong instruction-following and reasoning behavior. The dataset I prepared took this into account, but it was just simple prompt/response pairs. Defining the task tightly, ensuring the dataset was of high quality, picking the right hyperparameters, and preparing the proper reward function (and modeling it against the API provided) were the keys to success.
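To make the "prompt/response pairs plus reward function" idea concrete, here is a rough sketch of what that could look like. Everything in it is an assumption for illustration: the `vip.x -> dut.y` connection syntax, the example pair, and the `to_jsonl`, `parse_connections`, and `reward` helpers are hypothetical, not the actual dataset format or the grader used with the fine-tuning API.

```python
import json

# Hypothetical prompt/response pair; the real dataset was not shared.
pairs = [
    {
        "prompt": "Connect the VIP master to DUT ports clk, rst_n, data.",
        "response": "vip.clk -> dut.clk\nvip.rst_n -> dut.rst_n\nvip.data -> dut.data",
    },
]


def to_jsonl(records):
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def parse_connections(text):
    """Extract (source, destination) pairs from lines like 'a -> b'."""
    conns = set()
    for line in text.splitlines():
        if "->" not in line:
            continue  # ignore anything that is not a connection line
        src, dst = line.split("->", 1)
        conns.add((src.strip(), dst.strip()))
    return conns


def reward(output, reference):
    """Score in [0, 1]: F1 between predicted and reference connections."""
    pred = parse_connections(output)
    ref = parse_connections(reference)
    hits = len(pred & ref)
    if hits == 0:
        return 0.0  # also covers empty output or empty reference
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A graded score like this gives partial credit for mostly correct wiring rather than a pass/fail signal, which is one plausible way to shape a reward; an exact-match or simulation-based check would be a stricter alternative.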