Hacker News | mr_toad's comments

It’s practically impossible to take pictures of a famous monument without having other people in the frame (usually they’re posing for photos themselves). AI can remove them, with varying degrees of success.

Ironically, it would probably be easier for the AI to generate the photo of the monument without the people. I mean, for famous monuments, whatever photo you're about to take, you could find 10 better ones already on-line, taken from the same point and perspective, and uploaded to Flickr or Instagram or whatnot.

Weren’t Samsung phones doing something like this? If you tried to take a picture with the moon in it, it would just generate an image of the moon?

> Broadly speaking, GPL is a license that has specific provisions about creating derivative software from the licensed work, and just saying "fair use" doesn't exempt you from those provisions.

Broadly speaking, yes it does. The whole point of fair use is that you don’t need a license.


Claiming LLMs are fair use is ridiculous bordering on ignorant or disingenuous.

Here’s the four-part test from 17 U.S.C. § 107:

1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

Fail. The use is to make trillions of dollars and be maximally disruptive.

2. the nature of the copyrighted work;

Fail. In many cases at least, the copy written code is commercial or otherwise supports livelihoods; and is the result much high skill labor with the express stipulation for reciprocity.

3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

Fail. They use all of it.

4. the effect of the use upon the potential market for or value of the copyrighted work.

Fail to the extreme. There is already measurable decline in these markets. The leaders explicitly state that they want to put knowledge workers out of business.

- - -

Hell, LLMs don’t even pass the sniff test.

The only reason this stuff is being entertained is some combination of the prisoner’s dilemma and more classic greed.


This comment highlights a basic dilemma about how and where to spend your time.

Here's a basic rule of thumb I recommend people apply when it comes to these sorts of long, contentious threads where you know that not every person showing up to the conversation is limiting themselves to commenting about things they understand and that involve some of the most tortured motivated reasoning about legal topics:

If the topic is copyright and someone who is speaking authoritatively has just used the words "copy written", then ignore them. Consider whether you need to be anywhere in the conversation at all, even as a purely passive observer. Think about all the things you can do instead of wasting your time here, where the stakes for participation are so low because nothing that is said here really matters. Go do something productive.


Yet you still wasted your own time and everyone else’s time with a reply that has even less substance.

I was making an argument based on quotes from the actual legal code and you’re saying peons who don’t use the exact correct terminology shouldn’t even consider what should or shouldn’t be legal? What a load of junk. This is a democracy. We’re supposed to be engaging with it.


You’re mixing up “using” with “copying”. You are allowed to “use” all of a book or movie or code by listening to or watching or reviewing the whole thing. Copyright restricts copying. The legal claim here is that training an LLM is sufficiently transformative that it cannot be construed as a copy.

I replied to someone saying that it’s fair use, which presupposes that it’s a derivative work.

> Fail. The use is to make trillions of dollars and be maximally disruptive.

Fair use has repeatedly been found even in cases where the copies were used for commercial purposes. See Sony v. Connectix for example, where the cloning and disassembly of the PlayStation BIOS for the purposes of making a commercially sold (at retail, in a box) emulator of a then currently sold game console was determined to be fair use.

> Fail. In many cases at least, the copy written code is commercial or otherwise supports livelihoods; and is the result much high skill labor with the express stipulation for reciprocity.

Again, see Sony v. Connectix, where the sales of PlayStation consoles support the livelihoods and skilled labor of Sony engineers.

> Fail. They use all of it.

And again, see Sony v. Connectix, where the entire BIOS was copied again and again until a clone could be written that sought to reproduce all the functionality of the real BIOS. Or see Google v. Oracle, where cloning the entire Java API for a competing commercial product was also deemed fair use. Or the Google Books lawsuits, where cloning entire books for the purpose of making them searchable online was deemed fair use. Or see any of the various time/format-shifting cases over the years (cassette tapes, VCRs, DVRs, MP3 encoders, DVD ripping, etc.) where making whole and complete copies of works was deemed fair use.

> Fail to the extreme. There is already measurable decline in these markets. The leaders explicitly state that they want to put knowledge workers out of business.

Again, see Sony v. Connectix where the commercial product deemed to be fair use was directly competing with an actively sold video game console. Copyright protects the rights of creators to exploit their own works, it does not protect them against any and all forms of competition.

Or perhaps instead of referring you to the history of legislation around copyright in the digital age, I should instead simply point you at Judge Alsup's ruling in the Bartz case where he details exactly why the facts of the case and prior case law find that training an AI on copyrighted material is fair use [1]. Of particular interest to you might be the fact that each of the 4 factors is not a simple "pass/fail" metric, but a weighing of relative merits. For example, when examining factor 1, Judge Alsup writes:

> That the accused is a commercial entity is indicative, not dispositive. That the accused stands to benefit is likewise indicative. But what matters most is whether the format change exploits anything the Copyright Act reserves to the copyright owner.

[1]: https://admin.bakerlaw.com/wp-content/uploads/2025/07/ECF-23...


I appreciate the detailed reply and that there’s subtlety here.

I read the linked Bartz case. It’s disappointing that it seems limited to only the copying of books into a data set and not the result of training an LLM on protected works. This is not the “use” I was discussing, and not very interesting.

The plaintiffs didn’t even challenge that the outputs of the LLMs infringe. The judge seems to agree (at least by omission) that fair use wouldn’t apply to infringing outputs, but that the outputs were transformative; and in cases where they aren’t:

> [anthropic] placed additional software between the user and the underlying LLM to ensure that no infringing output ever reached the users.

So this is not true:

> he [the judge] details exactly why the facts of the case and prior case law find that training an AI on copyrighted material is fair use

The plaintiffs also make really awful arguments about “memorizing” and “learning” that falsely anthropomorphize LLMs. Which the judge shoots down.

If we’re going to give LLMs the same rights as humans, there’s unlikely to be much of an argument.

I think there’s potential for an argument about how LLMs use “compressed” versions of protected works to _mechanically_ traverse language space. It would be subtle and technical so maybe not likely to work in our current context.


These are factors to be considered, not pass/fail questions.

The official name is the AWS Management Console. Or just the console.

The ‘dashboard’, the ‘interface’? Reminds me of coworkers who used to refer to desktop PC cases as the hard drive, or people who refer to the web as ‘Google’.


Wow this makes me nostalgic for 2000s era pointless web rage. It’s better material; keep up the good fight.

Reminds me of a story about Giotto di Bondone, an artist who, when called upon to prove his talent, drew, freehand, a perfect circle. Something which seems simple, but which is actually very difficult.

Maybe. There’s another meaning for sellout - an event that is all sold out.

That makes me wonder if the meaning of a sellout artist was an analogy to an event which became commercially popular, and was (literally) no longer accessible to long-term fans.


> If there was anything like a high power transistor back then he would have used that.

Mercury arc rectifiers were used long before his death.


Yes, but a rectifier only rectifies. That's not going to give you DC-DC conversion - let alone converting it to a higher voltage for long-distance transmission.

DC-DC before the transistor was difficult to do at scale. Vibrators and relays existed but were not reliable long term.

Routers have to follow the same standards as other electrical appliances.

Those standards aren’t related to the functionality or security of the router.


Still need massive amounts of compute for training. Nobody is going to be training 400B models on a phone any time soon.

Likely not.

We’re seeing a massive slowing in the value of all that additional training. Folks don’t like to talk about that, but absent a completely new breakthrough, the current math of LLMs has largely run its course.

We simply don’t need massive training forever and ever. We’re getting to the point that “good enough” models will solve most use cases. The demonstrated business value is also still broadly missing for AI on the level required to keep funding all this training for much longer.


I dunno, I thought that for a while too, but there are a lot of new ideas in terms of architecture that may warrant massive training runs. Mamba and state space models are pretty interesting, but haven’t had their transformer moment yet, because I haven’t really seen anyone go for broke on training them with a huge data set and model size. Even some of the more fundamental changes, like Kolmogorov–Arnold Networks or some of the ideas behind continuous backpropagation, haven’t really had the opportunity to be pushed to the limit. I think it’s still early days on what these models can do. And I say this as someone who bought a Mac M3 Max with 128GB of RAM based on the hope that on-device training and inference would eventually move locally. It’s encouraging to see the progress, and I hope it does.

> but there are a lot of new ideas in terms of architecture that may warrant massive training runs

I don't think the argument is that that isn't true; it's that the gains from those massive training runs are diminishing. Eventually, it won't be worth it to do a run for each new idea; you'll have to bundle a bunch together to get any noticeable change.
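The diminishing-returns point above can be made concrete with a toy scaling-law calculation. The sketch below uses a Chinchilla-style loss curve L(N, D) = E + A/N^α + B/D^β; the constants are the published Chinchilla estimates (Hoffmann et al., 2022), used here purely as an illustration, not as a prediction for any particular model.

```python
# Toy illustration of diminishing returns from scaling training compute,
# using a Chinchilla-style loss curve L(N, D) = E + A/N^alpha + B/D^beta.
# Constants are the published Chinchilla fit; treat them as illustrative.

E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Repeatedly double both model size and data, and watch the per-doubling
# improvement shrink: each doubling of a power-law term only removes a
# fixed fraction of a term that is itself getting smaller.
n, d = 1e9, 20e9  # start at 1B params, 20B tokens
prev = loss(n, d)
for step in range(1, 6):
    n, d = n * 2, d * 2
    cur = loss(n, d)
    print(f"doubling {step}: loss {cur:.4f} (gain {prev - cur:.4f})")
    prev = cur
```

Each run of the loop prints a strictly smaller gain than the last, which is the "bundle a bunch of ideas per run" pressure in a nutshell: a fixed-cost training run buys less and less loss reduction as scale grows.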


Same here. Then you see SOTA in a browser from Ex0byt, online 10x training (JIT-Lora), TurboQuant (Google), etc. Just saw KV prediction mentioned in this thread, so looking into that too.

I'm adapting all of this to Rust+WGPU with compute shaders if you want to follow along.

See this repo: https://github.com/tmzt/shady-thinker

Goal is Qwen3.5 27b on a Pixel 10 Pro running GrapheneOS.
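The on-device fine-tuning ideas mentioned above (JIT-LoRA and friends are specific projects; I'm not describing their internals) mostly build on the generic LoRA trick: keep the pretrained weight matrix W frozen and learn a low-rank update B @ A, so only r·(m+n) parameters are trained instead of m·n. A minimal NumPy sketch of just that idea, with made-up dimensions:

```python
# Minimal sketch of the generic LoRA low-rank update (not any specific
# project's implementation). W is frozen; only A and B would be trained.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 64, 4                   # full dims and low rank (illustrative)
W = rng.standard_normal((m, n))       # frozen pretrained weight
A = rng.standard_normal((r, n)) * 0.01
B = np.zeros((m, r))                  # zero-init B: the update starts as a no-op
scale = 1.0 / r                       # the usual alpha/r scaling, alpha = 1 here

def forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W + scale * (B @ A); W itself never changes.
    return x @ (W + scale * (B @ A)).T

x = rng.standard_normal(n)
# With B = 0 the adapted model matches the base model exactly.
assert np.allclose(forward(x), x @ W.T)
```

The appeal for phones is the parameter count: here the adapter is r·(m+n) = 512 trainable values versus 4096 for the full matrix, and the same ratio holds per layer in a real model, which is what makes on-device adaptation plausible at all.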


I could see Apple doing just that, because they can, and then making it another selling point for their own hardware. Their software is customized to run on their own hardware and vice versa (at least on paper), so they could totally get some LLM going that works perfectly well on their chips specifically, as a good-enough local model, in the next few years, and promote it as a you-don't-need-a-subscription-when-you-have-an-iPhone kind of thing. Given the advances in the LLM space in recent years, it sounds kinda realistic to arrive somewhere that just works locally in the mid-term.

> They're very easily produced, by two people getting it on.

Something like 360-380 thousand births every day. We’re insignificant even just compared to the delta.


I’ve used travel SIMs that only give you about 5GB. You avoid using the web at all, unless you are on WiFi. You can use maps, train and bus apps, banking apps, messaging, AirBnB etc, but not the web. If you go to some place and they want to use a QR code to buy a ticket or use a menu you may as well forget about it.

With a pay-as-you-go Google Fi plan... the trick is to use Firefox + uBO. If a site opens in the default Android WebView, you're fucked.
