Who decides what is "societally-harmful content"? Isn't literally rewriting history "societally-harmful"? The black T.J. was a fun meme, but the alignment's "unintended effects" weren't limited to that. I'd also say that if your LLM condemns right-wing mass murderers but says "it's complicated" about left-wing mass murderers (I'm not going to list a dozen other examples here; these things are documented and easy to find online if you care), there's something wrong with your LLM. Genocide is genocide.
This isn't the indeterminable question you've framed it as. Society defines what is and isn't acceptable all the time.
> Who decides what is "societally-harmful theft"?
> Who decides what is "societally-harmful medical malpractice"?
> Who decides what is "societally-harmful libel"?
The people who care to make the world a better place and push back against those who cause harm. Generally it's a mix of de facto industry-standard practices, shaped by societal values and pressures, and de jure laws established through democratic voting, legislative enactment, and court decisions.
"What is 'societally-harmful driving behavior'?" was once a broad and undetermined question, but it nevertheless received an extensive and highly defined answer.