> you can't meaningfully modify them given there is almost no information available about the training data, how they were trained, or how the training data was processed.
I was under the impression that you could still fine-tune the models or apply your own RLHF on top of them. My understanding is that the training data would mostly be useful for training the model yourself from scratch (possibly after modifying the training data), which would be extremely expensive and out of reach for most people.
From what I understand, the training data and its careful curation are the hard part. Everyone wants existing training datasets to train their own models instead of producing their own.