Plenty of shortsighted people have done things that are stupid in the long term for short term gains. It's the modus operandi of the US economy since 1981.
I don't believe that the people who train models have a secret way of identifying and filtering out bot-generated content that no one else (email spam filters, search engines, etc.) have identified. I do believe that they feel their models need to have up-to-date information on a variety of topics that require regularly ingesting new data. So no, I don't think they have a good way to avoid their inputs rotting from their outputs.
And what do you mean about short term gains? If you are training a model, and you see model collapse, where's the short term gain? I don't get it.
The incentive for the person who trains a model, even in the short run, is for them to avoid model collapse.
> I don't believe that the people who train models have a secret way of identifying and filtering out bot-generated content that no one else (email spam filters, search engines, etc.) have identified.
Huh, why would they need a secret filter? A filter is only one thing you can try. You can also look into using different models, different training, making your approach more resistant to model collapse; training multi-modal models, using approaches to economise on training data; and thousands of other ideas I can't think of in thirty seconds.
> So no, I don't think they have a good way to avoid their inputs rotting from their outputs.
You lack imagination. People can be remarkably clever if it's in their (short term!) interest to find solutions.
After years of claiming the government couldn't help people, Ronald Reagan was elected and Republicans have been working hard to make that statement more true ever since. A big part of that was deregulation of the financial markets.
That same year, Jack Welch became Chairman and CEO of General Electric. He juiced the stock price by selling off the company's prize jewels, real estate, and future, and for a while (before the utter collapse of the company) artificially raised the stock price so high that executives around the country copied him, and an entire industry of vultures like Mitt Romney started private equity firms to cannibalize healthy companies for their personal profit.
It's not that simple. LLMs were trained on lots of writing, and the "LLM voice" resembles in many ways good English prose, or at least effective public communications voice.
For years, even before LLMs, there have been trends of varied popularity to, for lack of a better word, regress - intentionally omitting capitalization, punctuation, or other important details which convey meaning. I rejected those, and likewise I reject the call to omit the emdash or otherwise alter my own manner of speaking - a manner cultivated through 30+ years of reading and writing English text.
If content is intellectually lacking, call that out, but I am absolutely sick of people calling out writing because they "think it's LLM-written". I'm sick of review tools giving false positives and calling students' work "AI written" because they used eloquent words instead of Up Goer Five[0] vocabulary.
I am just as afraid of a society where we all dumb ourselves down to not appear as machines as I am of one where machine-generated spam overtakes all human messaging.
i think it depends on what is meant by "good" or "bad". llmism may not be substantive writing, but it's approachable writing. a McDonald's lunch of familiar prose with likewise nationwide popularity and nutritional value.
One of the most common criticisms is the use of the emdash. This is a classic bit of English prose that is not problematic except as a stereotype used to dismiss writing for form rather than for content.
Let's grab a few books off the shelf (literally).
Douglas Adams' The Hitchhiker's Guide to the Galaxy has four emdashes on the very first page:
> It is also the story of a book, a book called THGTTG - not an Earth book, never...
Isaac Asimov's classic The Last Question: three emdashes on the first page (as printed in The Complete Stories, Volume I)
> ...they knew what lay behind the cold, clicking, flashing face -- miles and miles of face -- of that giant computer.
Mark Z. Danielewski, House of Leaves: Three emdashes on page 1
> Much like its subject, The Navidson Record itself is also uneasily contained -- whether by category or lection.
Robert Caro, Master of the Senate: Five emdashes on page one
> Its drab tan damask walls...were unrelieved by even a single touch of color -- no painting, no mural -- or, seemingly, by any other ornament
Other page 1s:
* Murakami - 1Q84: 1
* Murray/Cox - Apollo: 1
* Meadows - Thinking in Systems: 1
* Dostoyevsky - The Brothers Karamazov (Pevear/Volokhonsky translation): 4
* Caro - The Power Broker: 5
* Hofstadter - Godel, Escher, Bach: 3
Honestly, when I started this post I expected to have to dig deeper than page 1. The emdash is an important part of English-language literature and I reject the claim that we should ignore all writing that contains it.
No one is asking that we reject all prose with emdash. Not all emdash-users are LLMs, but many LLMs are profligate emdash-users, so adjust your priors accordingly.
Secondarily, I think there's a part of the discourse missing: the presence of a syntactic emdash in a sentence on the internet is not itself a strong signal of LLM-writing - but the presence of an actual emdash glyph (—) should raise some eyebrows, esp. in fora that aren't commonly authored in rich text editors (here, twitter, ...)
Before LLMs, the em-dash glyph was a decent tell simply that... the author was using a Mac, because it's a simple and easy-to-remember (or even guess!) key-combo on there. Not that you can't type it on other keyboards, but the Mac one for whatever reason had a combo of users-who-wanted-to-type-it and layout-that-makes-it-easy that resulted in a high proportion of correct em-dash employers being Mac users.
(option-underscore, or option-shift-dash if you prefer to think of it that way)
On iOS, you can type it by simply holding down on the "dash" button then selecting the em-dash from the list of options it presents. It may also correct double-dash to em-dash a lot of the time, not sure.
I have used the correct em-dash everywhere I can for over a decade, which amounts to nearly everywhere.
Well that isn't what I am suggesting. I'm suggesting people ditch x. Reddit. Probably also ditch hn in the next couple months. If you can run a headless agent to post somewhere, just don't bother visiting that site, honestly a great rule of thumb right there.
That should leave you with media sources like nyt and your local library, which seems healthier to me. And maybe it might encourage a new type of forum to emerge where there is some decentralized vetting that you are a human, like verifying by inputting the random hash posted outside the local maker space.
I'm not sure how you can accept the first point but believe the second point. If you believe there is nuance in reddit there is nuance in the news as well. You think a local sports column is manipulating me? The answer is of course no, unless you want to win a gold medal in mental gymnastics trying to equate a sports column to nation state sponsored agit prop.
I hope editorial departments everywhere are taking careful notes on the ars technica fiasco. Agree there's room for some kind of quick "verified human" checkmark. It would at least give readers the ability to quickly filter, and eliminate all the spurious "this sounds like vibeslop" accusations.
There are six emdashes on that page. NONE of them are "it's not X, it's Y".
> Emails, messages, essays, code reviews, love letters — all suspect.
> We believe this can be solved — not by detecting AI, but by proving humanity.
> KeyWitness captures cryptographic proof at the point of input — the keyboard.
> When you seal a message, the keyboard builds a W3C Verifiable Credential — a self-contained proof that can be verified by anyone, anywhere, without trusting us or any central authority.
> That's an alphabet of 774 symbols — each carrying log2(774) ≈ 9.6 bits. 27 emoji for 256 bits.
> They're a declaration: this message was written by a person — one of the diverse, imperfect, irreplaceable humans who still choose to type their own words.
Clarifications: 4
Continuation from a list: 1
Could just be a comma: 1
"It's not X -- it's Y": 0.
If you're going to make lazy commentary about good writing being AI, please at least be sure that you're reading the content and saying accurate things.
It is largely written by iteration with an LLM! No need to speculate or analyze em dashes :-)
The emoji idea was mine. I like it :-) unfortunately it doesn't work in places like HN that strip out emoji. So I had to make a base64 encoding option.
The goal was to create an effective encryption key for the url hash (so it doesn't get sent to the server). And encoding skin tone with human emojis allows a super dense bit/visual character encoding that ALSO is a cute reference to the humans I'm trying to center with this project!
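The density arithmetic quoted from the page (774 symbols, 256-bit key) can be checked with a rough sketch like the one below. It is not the project's actual encoder: the function names `to_symbols` and `from_symbols` are hypothetical, and the symbols are represented as plain integer indices rather than real emoji.

```python
import math
import secrets

ALPHABET_SIZE = 774   # symbol count quoted from the project page
KEY_BITS = 256        # key size quoted from the project page

# Information carried by one symbol, and how many symbols cover the key.
bits_per_symbol = math.log2(ALPHABET_SIZE)
symbols_needed = math.ceil(KEY_BITS / bits_per_symbol)


def to_symbols(value: int, base: int, length: int) -> list[int]:
    """Write value as exactly `length` base-`base` digits, most significant first.

    Hypothetical helper: each digit would index into the emoji alphabet.
    """
    digits = []
    for _ in range(length):
        value, digit = divmod(value, base)
        digits.append(digit)
    if value:
        raise ValueError("value does not fit in the requested number of symbols")
    return digits[::-1]


def from_symbols(digits: list[int], base: int) -> int:
    """Inverse of to_symbols: fold the digits back into one integer."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


# Round-trip a random 256-bit key through the symbol indices.
key = secrets.randbits(KEY_BITS)
indices = to_symbols(key, ALPHABET_SIZE, symbols_needed)
assert from_symbols(indices, ALPHABET_SIZE) == key
print(f"{bits_per_symbol:.2f} bits per symbol, {symbols_needed} symbols for {KEY_BITS} bits")
```

For comparison, base64 carries 6 bits per character, so the same 256-bit key takes 43 characters before padding.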
Oh you think it's stupid? It was an attempt to encode an encryption key that isn't sent to the server in a way that is minimally invasive. The skin tone emojis allow pretty high bit density, and also are cute!
Sorry it doesn't meet your needs.
There is irony in having an ai generated humanifesto. Could it be intentional? hmm?
Is there no irony in deriding a project for being potentially LLM generated, when its goal is to aid people in differentiating?
:shrug:
> RAV4 non-hybrid is around 35 mpg highway. CR-V 34 mpg highway.
....35mpg at 60mph and little traffic, maybe. I can't speak for that specific model, but most vehicles I've driven do significantly worse than advertised.
My Subaru Legacy advertised 27 City, 35 Highway, 30 Combined. In practice I average 25-26 while commuting and on extended highway drives more like 29, still on stock tires.
I'm sick of hearing about AI, but I'm significantly more sick of anyone who knows how to write English prose at a level higher than "typical rural American" being accused of using AI to write.
It doesn't have to be. Comments such as yours add nothing to the conversation. It's an ad hominem attack. In the absence of explaining why you believe it "looks like AI", it's a baseless accusation.
Implicit in "this post looks like AI"—at least the vast majority of the time—is that it is a wordy ramble with no real value because it says nothing novel or substantive—so I would not call it an ad hominem attack, but rather an honest criticism of the actual (lack of) content.
It has the typical patterns: em dashes, "it's not A, it's B" constructions. Also relatively new, low karma account, and its other comments are similarly LLM-ish.
Guilty as charged. As a non-native speaker, I used an LLM to compile my original thoughts into fluent English.
But notice the irony: I used AI exactly as I advocate. It handled the horizontal spread (syntax), while I rigorously enforced the vertical depth (the architectural logic). The 'taste' is entirely mine.
Thank you to Arainach and cableshaft for engaging with the actual substance. Dismissing a core argument because you pattern-matched an 'em-dash' is exactly the shallow thinking this post warns about.
"An official United States government app is injecting CSS and JavaScript into third-party websites to strip away their cookie consent dialogs, GDPR banners, login gates, and paywalls."
In their defense, this is the first thing the Trump admin has done that's unambiguously positive for ordinary people.
Yeah it's great, we can actually let go of these silly open source projects like uBlock Origin, and just rely on the government for protecting us against the dangerous web!
I too love it when US imperialism invades digital spaces, just ignore how the US treats people critical of its own government (not just referring to the Trump admin here) then yeah sure great.
Let me know when this can ignore malware/adware from US companies then I'll give accolades.
Anyone who's ever tried to get support online with a question about Linux will quickly meet *actual* user hostility as they're asked why they didn't know to check for the config file in the filing cabinet in the basement behind a locked door saying beware of leopard, how dumb they are, etc.
> This has been my experience with the Linux community for 26 years.
I read through that post that elicited those comments that you have a problem with. At the end of a long list of complaints, it says: ".....yep, just as user friendly as I remember."
Nowhere does that post request help, and with that last comment, is clearly intended as a disparagement of Linux, not a request for help.
Then, you are turning around, and cherry picking responses to highlight the negative responses to a negative post, and disparaging the Linux community while ignoring the helpful responses.
Half those aren't even remotely harsh. Saying the raspberry pi wasn't designed to be mained is totally reasonable, what possible objection do you have to somebody saying that?
I understand pointing out that an upgrade failure should be expected when Ubuntu tells you that upgrades won't work, but I don't agree with calling the Pi a "device for experimentation". Not only is it used for serious applications in industrial settings, but some products are sold as... personal computers:
> Raspberry Pi 500
> The refined personal computer.
> A fast, powerful computer built into a high-quality keyboard, for the ultimate compact PC experience.
That my complaints trying to install software have absolutely nothing to do with it being a Raspberry Pi and the experience is identical on any Linux machine.
> Half those aren't even remotely harsh.
....and the fact that people consider this to be the case is more evidence of the Linux community's hostility.
Linux is like Rick and Morty: I don't mind it, but I never want to be associated with its fans.
If you can't take the mildest of implied criticisms without feeling offended, this isn't a Linux problem, it's a you wandered out of your safe space hugbox problem.
what if the creator is a company? They are allowed to hold copyrights.
Though one answer is: 95/120 years.
"If the work is a joint work, the term lasts for seventy years after the last surviving author’s death. For works made for hire and anonymous or pseudonymous works, copyright protection is 95 years from publication or 120 years from creation, whichever is shorter"
* Preprovisioning - devices have the right certificates and know about your corporate networks. They have the necessary apps and just work.
* Tracking - if a device is lost or stolen, monitor where it is and remotely lock or wipe it
* Monitoring - have a log to audit if someone does something malicious
* Security - reduce the chance of your employees installing malware, spyware, etc. whether by accident or intention
* Locking things down - put gates in the way of bad actions like copying sensitive data into public apps or clouds. Even if you're unable to block everything, attempts to block remind honest employees and provide strong evidence that anyone who proceeds was intentionally violating policy and should be fired.
* Predictability - reducing the number of unknown factors that could cause a person to have issues using their computer. Reminds me of how a secretary whose machine I serviced was somehow able to install Google Desktop back in the day, and how that caused a massive argument between my boss and theirs when their computer needed to be re-imaged. Most IT-approved programs store user data in known locations on a computer, which makes backups and restorations very easy. Stuff like Google Desktop did not do that, which means likely breaking someone's workflow in the re-image process.
That’s certainly not a universal take in leadership.
Disagree and commit is the manager’s take. Ok let’s hear it, but once the decision is made (by me) it’s time to STFU and just do it.
The disagree part often is just a way to manage your team’s emotions. You didn’t get your way but you can’t say you weren’t heard. The leads always get their way.