> Can anybody understand what happens and maybe explain it a little?
I spent a lot of time squashing bugs like this.
Windows has one window manager. Linux has dozens. Windows apps are written with baked-in assumptions about how the Windows window manager behaves: the order of windowing event messages, side effects on values returned by other APIs, the exact sequence of side effects when toggling fullscreen (window size, mouse cursor capture, presence of window chrome). That's a valid approach on Windows, because those things always work the same way there. But Linux window managers each handle all of this differently, and getting dozens of window managers to behave exactly the same way as Windows's does is near impossible.
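A minimal sketch of the kind of assumption that breaks (the classes here are hypothetical stand-ins, not real APIs): on Windows the size side effect of a fullscreen toggle is typically visible as soon as the call returns, while on many Linux window managers the resize only arrives later as a configure event.

```python
class SynchronousWM:
    """Windows-style: the size side effect is visible immediately."""
    def __init__(self):
        self.size = (1280, 720)

    def set_fullscreen(self):
        self.size = (1920, 1080)  # applied before the call returns


class AsyncWM:
    """Many Linux WMs: the resize arrives later, as a configure event."""
    def __init__(self):
        self.size = (1280, 720)
        self._pending = None

    def set_fullscreen(self):
        self._pending = (1920, 1080)  # only *requested* here

    def pump_events(self):
        # The size changes only once the event loop processes the request.
        if self._pending:
            self.size, self._pending = self._pending, None


def naive_game_code(wm):
    # The buggy assumption: size reflects fullscreen immediately.
    wm.set_fullscreen()
    return wm.size


print(naive_game_code(SynchronousWM()))  # (1920, 1080) - works on Windows
print(naive_game_code(AsyncWM()))        # (1280, 720)  - stale on async WMs
```

The game code is "correct" against the synchronous model it was tested on, and only misbehaves once the same calls run against a window manager with different timing.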
Another possibility is it's just how the game works, even on Windows. It was pretty common to get windowing bugs reported, test them on Windows, and see the exact same behavior as we had on Linux.
> I get a feeling from overall anti-AI sentiment online that a lot of people feel they're entitled to 100% of value created by anything even tangentially related to their person
Rather, I don't like that the terms I released my work under aren't being respected. I believe LLMs are derivative works of the pieces they are trained on. I spent more than ten years working on open source code, and now the models that were trained on my GPL'd code are being used to make proprietary code against the terms of the license. I find this reprehensible.
While it wasn't an explicit term of release, generally I did not expect anyone to get any kind of financial value from the blog posts I wrote. I just wrote them for fun & maybe others would find them interesting. Now, LLMs have been trained on my blog posts and are generating financial value for some of the worst human beings on the planet who are using their money to murder, demean, and maim other humans.
I now know that blog posts I wrote for fun are putting money in some sociopath's bank account, and the GPL'd code I wrote is being used to create software to exploit me & other users. If I continue to create things publicly, it will be used against me and other people, and there's nothing I can do to stop it except to stop creating things. It's all very disrespectful & demoralizing.
> I believe LLMs are derivative works of the pieces they are trained on
That's your opinion with 0 legal backing. IMO, calling them derivative is untenable logically for anyone with some understanding of LLM/transformer architecture.
You desire a sharing community, but the takers/defectors are destroying that community.
Copyleft attempts to create a pool of code that forces sharing. But it broadly fails because you simply can't force antisocial people to be good sharers (plus source code usually isn't as valuable as we hope).
With any gifting/sharing, you have to accept that some of it will be abused. It is hard to filter for only community minded people who don't greedily abuse, and ideally who give freely.
I don't believe my circle of friends are becoming more selfish. I'm unsure what I would say about the rest of the world.
I am in exactly the same boat, down to the ~10 years. Only difference is I ended up picking AGPL for my later works. Like it made a difference...
The whole situation disgusts me.
- They expect me to pay for access to my own stolen code.
- Companies arguing stealing should be legal because China does it, and if US companies don't, they'll be left behind.
- People like the poster you're replying to who argue you're not entitled to 100% of the value you create - completely ignoring that the value will go to someone, and that someone is already much richer than any of us and getting richer faster while providing less value, if any. Honestly, this makes me wanna track these people down just to find out if they're also in the owner class and secretly laughing at us while pretending "we're all equal", or if they're workers who genuinely don't understand how much they're being exploited and how much worse it's gonna get.
- People don't give a fuck. Colleagues happily use "AI" because it "saves time", not realizing that if this continues, we'll all be without jobs - and that the only way this was possible was by stealing from each other, with most of us being OK with it.
Honestly, I am hoping for a revolution. A proper one, with guns if need be, but most importantly, where people get what they deserve in full.
Last time this happened was during the second industrial revolution; so many people got fucked so hard that entire countries turned to communism. That was a bad idea, but we can do better. It's not (just) about who owns the means of production but who owns the product. Even if "AI" turns into actual AI, as long as it's built on top of our work, we should own it - that means both controlling it and getting paid proportionally to our contribution.
The currently rich people can negotiate what fraction they get paid if they show us they're providing value. Of course, only after we get back what they stole and unless they end up executed. The value of a human life is apparently $7.5M so anybody who steals more than that should logically get a death sentence.
But none of this will happen, people are too stupid and will get manipulated by a charismatic liar like every single time before.
Oh man, tangent into one of my favorite library book experiences. I checked out a sci-fi book at the library. It was good; I was enjoying it. Then a few chapters in, I found a previous library patron had written nit-picky notes in the margin, poking holes in the author's fictional science tech explanations. And these weren't little one-word exclamations, they were whole sentences written in perfectly legible, almost impossibly tiny pencil handwriting. Some of them even had little drawn diagrams! It went through the whole book: every hundred pages or so, some little margin notes about how such-and-such sci-fi babble didn't reflect how space-time actually works or whatever. It was a hoot, a little bonus on top of the book itself.
I had a similar experience with a second-hand copy of House of Leaves [0].
This was a special treat because the book itself already uses copious footnotes and cross-references from fictional characters to create a maze. And now a real person added to the effect by trying to make sense of it themselves.
Seems to me like coordinating with an entity outside of the spooks' control, such as the BBC, would give more opportunities for leaks. It would also reveal some information about who is controlling the signal--someone with some kind of relationship with the broadcaster.
During WWII, the BBC would daily have a section after the news dedicated to "personal messages" - which everyone knew were instructions to the resistance in France, or similar. "William waits for Mary" was one of the more famous ones related to D-Day, I think.
I do wonder how much of the apparent demand is driven by companies automatically running these things when users didn't actually ask for it. For example every web search I make now has an AI response that I scroll right past. I'm sure that counts for someone's token usage data, but I got zero value from it. This is happening in almost every software product now.
Tokens as a metric is the analogue of users as a metric.
In the end, value per user is what matters for being a healthy going concern, and for valuation relative to a company like Meta. Value per token is what should matter too - after all, that's what people are paying for.
I agree theft isn't a good analogy, but there is something similar going on. I put my words out into the world as a form of sharing. I enjoy reading things others write and share freely, so I write so others might enjoy the things I write. But now the things I write and share freely are being used to put money in the bank accounts of the worst people on the planet. They are using my work in a way I don't want it to be used. It makes me not want to share anymore.
No, what you're basically describing is "I shared something but then I didn't like how it ended up being used". If you put stuff out in public for anyone to use, then find out it's used in a way you don't like, it's your right to stop sharing - but it's not "similar" to stealing beyond "I hate stealing".
This will slightly overlap with the other replies, but to be concise:
> If you put stuff out in public for anyone to use, then find out it's used in a way you don't like, it's your right to stop sharing
Yes. The entire point of Copyright and the reason it was invented is to ensure people will keep sharing things. Because otherwise people will just stop publishing things, which is a detriment to all. (Including AI companies, who now don't get new training data)
We have collectively decided that we will give authors some power to say "I don't like how my work is being used" to ensure they don't just "stop sharing".
Fair Use is an exception to that, where the public good does outweigh an individual author's objections. But critically, not such that authors stop publishing. Hence the 4th "factor" in US copyright law (which is one of the most expansive on fair use), where the "effect of the use upon the potential market for or value of the copyrighted work" is evaluated. Fair use isn't supposed to obliterate the value of the original work, or people will stop publishing again.
This is what makes AI training's status so contentious. As a direct copyright claim it's very weak: it's incredibly hard to prove a 1:1 copy from the training data into the model and into the output, and you have to argue about the architecture of LLMs and its inability to separate copyrightable expression from uncopyrightable facts.
Yet in spirit, AI training clearly violates copyright. The explicitly stated purpose is to copy the works as training data, often without any compensation or even permission, in order to create a machine that will annihilate the market for all the works used.
People already are pulling back on the amount of works they share.
> If you put stuff out in public for anyone to use, then find out it's used in a way you don't like
Nope. Copyright is a thing, licenses are a thing. Both are completely ignored by LLM companies, which was already proven in court, and for which they already had to pay billions in fines.
Just because something is publicly accessible, that does not mean everybody is entitled to abuse it for everything they see fit.
>Nope. Copyright is a thing, licenses are a thing. Both are completely ignored by LLM companies, which was already proven in court,
...the same courts that ruled that AI training is probably fair use? Fair use trumps whatever restrictions the author puts in their "licenses". If you're an author and it turned out that your book was pirated by AI companies then fair enough, but "I put my words out into the world as a form of sharing" strongly implied that's not what was happening, e.g. it was a blog on the open internet or something.
I never understand why anyone wants authors to not be able to enforce copyright and licensing laws for AI training. Unless you are Anthropic or OAI it seems like a wild stance to have. It’s good when people are rewarded for works that other people value. If trainers don’t value the work, they shouldn’t train on it. If they do, they should pay for it.
My own view is, I thought we were all agreed that the idea that Microsoft can restrict Wine from even using ideas from Windows, such that people who have read the leaked Windows source cannot contribute to Wine, was a horrible abuse of the legal system that we only went along with under duress? Now when it's our data being used, or more cynically when there's money to be made, suddenly everyone is a copyright maximalist.
No. Reading something, learning from it, then writing something similar, is legal; and more importantly, it is moral. There is no violation here. Copyright holders already have plenty of power; they must not be given the power to restrict the output of your brain forever more for merely having read and learnt. Reading and learning is sacred. Just as importantly, it's the entire damn basis of our profession!
If you do not want people to read and learn from your content, do not put it on the web.
If you want people to read and learn from each other, you should incentivize people to make content worth reading and learning from. Making LLM training a viable loophole for copyright law means there won’t be incentives to produce such work.
People getting better at writing is only going to increase the quality of the output.
Increasing both competition and tooling (by providing every writer with the world's greatest encyclopedia/thesaurus/line-editor/brainstormer/planner/etc.) is only going to make writers better.
Will there be lots of people who misuse the system? Are there lots of people who use thesaurus words without knowing what they're talking about? Can't you tell the difference?
I see in LLMs a lowering of the ground floor making it easier for people to get in. This will increase the total availability of content.
I also see in LLMs a raising of the top bar making it harder to be the best. If more people are writing and more people are trying to be the best, the best is going to get better.
Consider chess. Have we suddenly stopped playing chess now that a phone can beat 95+% of people? No. The market is stronger than ever and still growing. The greatest players in the world use chess engines to refine their play, and the play keeps expanding in new and interesting ways.
In both writing and chess, yes, there is an explosion of low and middling play. But since when have we not always had people producing content and playing chess that when compared to the masters of the field is generally viewed as substandard?
But here's the kicker. Some people's favorite genre is badly edited fanfic. Some people genuinely derive actual pleasure from things that you or I might call garbage. And what's wrong with that? Who am I to say that you can't love klutzy firecop loves suburban housewife paperbacks? Or Zelda/Harry Potter crossfics or whatever.
Re-reading your comment, I think we're both generally anti-corporate-fuckery. I view the current batch of copyright pearl-clutching as an argument about whether VCs are allowed to steal books to make their chatbots worth talking to, and the Wine/MSoft debate as one about whether it should be legal to engage in anticompetitive behavior through restrictive use of copyright. In both cases the root of the issue isn't really copyright in the abstract - it's the bludgeoning of the person with less money, by use of overwhelming legal costs, before they can even have their day in court.
I agree that's bad at any rate. However, I genuinely think that reading and learning without literal reproduction is not (should not be) a violation of copyright and does not (should not) require an additional grant for content that has been made publicly available. I think that regardless of whether a company is the subject or the actor.
Yep, and Anthropic lost that case correctly. I just don't think that "you have to buy one copy" will fix anything related to AI to the satisfaction of anybody but the law.
I wish I lived in the alternative timeline where open source folks didn't look a gift horse in the mouth and actually used these tools to copy left the shit out of software to the point where proprietary closed source software has no advantage.
But instead we've got people posting "honey pots" that an LLM will immediately detect and route around.
Or the open source ecosystem will go through a renaissance as people rapidly build amazing open source software that takes weeks instead of years to develop.
If you want a good analogy, try the enclosure of the commons in the British countryside. Communally managed grasslands were destroyed by noblemen with massive herds of cattle overgrazing the land, kickstarting a land grab that effectively forced people to enclose or be left behind themselves. Property is a virus that destroys all other forms of allocation.
> But now the things I write and share freely are being used to put money in the bank accounts of the worst people on the planet.
I don't think that's the case. I'm not even arguing they aren't the worst people on the planet - they might as well be. But all I see them doing is burning money all over the place.
This was the title used when I came across the video. Apparently YouTube uses many different titles for A/B testing but this is the one I got. Can't edit it now, unfortunately.
EDIT: seems like dang or team took care of it, thanks!
It makes more sense when seen on YouTube, where the thumbnail shows one of M. C. Escher's famous drawings.
It's a drawing of a guy looking at a picture of a town with himself standing in that town, but it's all twirled and twisted so its self-repetition isn't obvious.
I clicked on the link and the video title is "Decoding Escher's most mind-bending piece", which is a lot better. I also had no idea what "3B1B video" meant, apparently it's a channel called "3Blue1Brown".
Depends how you define excellent. If the goal is to get more views then it's not all that great, and views are kind of the point of YouTube for many, especially if they are trying to make a living from it.
Probably he didn't use these techniques explicitly: the video mentions, but doesn't emphasise, that he likely sketched out the map by feel instead of analytically, which is probably one reason why he didn't fill in the center.