> you can't meaningfully modify them given there is almost no information available about the training data, how they were trained, or how the training data was processed.

I was under the impression that you could still fine-tune the models or apply your own RLHF on top of them. My understanding is that the training data would mostly be useful for training the model yourself from scratch (possibly after modifying the training data), which would be extremely expensive and out of reach for most people.

From what I understand, the training data and its careful curation are the hard part. Everyone wants existing training data sets to train their own models instead of producing their own.


Indeed, fine-tuning is still possible, but you can only go so far with fine-tuning before you need to completely retrain the model.

This is why Silo AI, for example, had to start from scratch to get better support for small European languages.