
Yeah, though for a big model like 405B I do wonder whether the original training recipe really matters for where models are practically heading, which is smaller and more specific.

I imagine its main use would be to train other models, either by distilling it down with LoRA/quantization, etc. (assuming we have the tokenizer), or by using it to generate training data for smaller models directly; a rough sketch of the latter follows below.
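
As a hedged sketch of that data-generation route: the model name, prompts, and sampling settings here are placeholders (and realistically you'd need a serving cluster, not one box, for 405B), but the shape of it is just "teacher generates, student trains on the output":

    # Sketch: use a big "teacher" model to generate SFT data for a smaller
    # "student". Model name and prompts are illustrative placeholders.
    from transformers import pipeline

    teacher = pipeline("text-generation",
                       model="meta-llama/Llama-3.1-405B-Instruct")  # placeholder

    prompts = ["Explain list comprehensions in Python.",
               "What does a mutex do?"]

    synthetic = [
        {"prompt": p,
         "completion": teacher(p, max_new_tokens=256,
                               return_full_text=False)[0]["generated_text"]}
        for p in prompts
    ]
    # `synthetic` is then plain (prompt, completion) SFT data for the student.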

But I do think there is always a way to share without disclosing too many specifics, like in this lecture [1] from this year's spring course at Stanford. You can always say, for example:

- The most common technique for filtering was using voting LLMs, without disclosing said LLMs or the quantity of data (see the first sketch after this list).

- We built on top of a filtering technique for removing poor code, using ____ by ____ (without disclosing exactly how you filtered, beyond a handwave, but making clear that you had to filter).

- We mixed a certain proportion of this data with that data to make it better, without saying what the proportion was (a toy version is sketched below).
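
For the first point, a minimal sketch of what "voting LLMs" could look like; the judge model names, the judging prompt, and the majority threshold are all my assumptions, not something from the lecture:

    # Sketch of filtering with voting LLMs: several judge models each answer
    # yes/no on a sample; keep the sample on a simple majority.
    from transformers import pipeline

    judges = [pipeline("text-generation", model=name)
              for name in ("judge-a", "judge-b", "judge-c")]  # placeholders

    def keep(sample: str) -> bool:
        prompt = ("Is the following training sample high quality? "
                  "Answer yes or no.\n\n" + sample + "\n\nAnswer:")
        votes = sum(
            "yes" in judge(prompt, max_new_tokens=3,
                           return_full_text=False)[0]["generated_text"].lower()
            for judge in judges
        )
        return votes >= 2  # simple majority of three judges

    dataset = ["some training sample ...", "another sample ..."]
    filtered = [s for s in dataset if keep(s)]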
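
And for the last point, mixing at an undisclosed proportion is just weighted sampling; the 70/30 ratio and source names below are made up:

    # Sketch of mixing two data sources at a fixed (undisclosed) ratio.
    import random

    random.seed(0)
    source_a = [f"code sample {i}" for i in range(1000)]  # e.g. filtered code
    source_b = [f"web sample {i}" for i in range(1000)]   # e.g. general web text

    def mix(a, b, ratio_a=0.7, n=500):
        # Each draw comes from `a` with probability ratio_a, else from `b`.
        return [random.choice(a) if random.random() < ratio_a
                else random.choice(b) for _ in range(n)]

    mixture = mix(source_a, source_b)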

[1] https://www.youtube.com/watch?v=jm2hyJLFfN8&list=PLoROMvodv4...


