Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> People still don’t know how LLMs work and think they can be trained by interacting with them at the API level.

Unless they are logging the interactions via the API, and then training off those logs. They might assume doing so is relatively safe since all the users are trustworthy and unlikely to be deliberately injecting incorrect data. In which case, a leaked API key could be used to inject incorrect data into the logs, and if nobody notices that, there’s a chance that data gets sampled and used in training.



Nobody really trains directly from logs without curation and filtering.


Sure, but there is a non-zero risk that some malicious data could slip through the curation and filtering processes undetected

I agree that’s unlikely, but not astronomically unlikely


Considering the costs involved in fine-tuning, nobody does it unless they are a very rich corporation. And certainly not for public-facing models…




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: