
The term is too overloaded.

I'll add one more: an LLM small enough that it can be trained from scratch on one A100 in 24 hours. Is it really small if it takes $10,000 to train? Or should we reserve that term for $200 models?

Back to your definitions, there are sub-1B models people are using. I think I saw one in the 400-600M range for audio. Another person posted here a 100M-200M model for extracting data from web pages. We told them to just use a rules-based approach where possible but they believed the SLM worked better.

Then, there are projects like BabyLM that can be useful at 10M:

https://babylm.github.io/



But you only have to train the foundation model once, so with open weights it's not really a problem.

Maybe resources needed for fine-tuning would be nice to see.


Most have been trained on illegally-distributed, copyrighted works. They might output them, too. People might want untainted models. Additionally, some have weaknesses due to tokenizers, pre-training data, or moral alignment (political bias).

For those reasons, users might want to train a new model from scratch.

Researchers of training methods have a different problem. They need to see whether a new technique, like an optimization algorithm, gets better results. They can try techniques more quickly and with less money if they have small training runs that are representative of what larger models do. If BabyLM-10M was representative, they could test each technique at the FLOPS/$ of a 10M model instead of a 1B model.
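To put a rough number on that gap, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that training compute is about 6 * parameters * tokens, and, for the second figure, an assumed ratio of 20 training tokens per parameter. Neither number comes from this thread, and the function names are made up for illustration.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute using the 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * params * tokens


def scaled_cost_ratio(small_params: float, large_params: float,
                      tokens_per_param: float = 20.0) -> float:
    """Compute ratio when each model's token budget grows with its size.

    tokens_per_param is an assumed heuristic, not a figure from the thread.
    """
    small = training_flops(small_params, small_params * tokens_per_param)
    large = training_flops(large_params, large_params * tokens_per_param)
    return large / small


# Same token budget for both models: cost scales linearly with parameter count.
fixed_ratio = training_flops(1e9, 1e9) / training_flops(10e6, 1e9)

# Token budget proportional to model size: cost scales with the square of it.
scaled_ratio = scaled_cost_ratio(10e6, 1e9)

print(f"1B vs 10M, same tokens:   ~{fixed_ratio:.0f}x")
print(f"1B vs 10M, scaled tokens: ~{scaled_ratio:.0f}x")
```

Under those assumptions, each experiment on the 10M model costs somewhere between a hundredth and a ten-thousandth of the 1B run, which is why a representative small model matters so much for methods research.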

So, both researchers and users might want new models trained from scratch. The cheaper to train, the better.


> Another person posted here a 100M-200M model for extracting data from web pages

Could you post a link to this comment or thread? I can't seem to find this model by searching but would love to try it out.


I think I found it. I could be getting the numbers mixed up with another SLM. That example's smaller model was 500M:

https://news.ycombinator.com/item?id=41515730



