There were Internet search engines before Google, but Google did it way better.
I remember when Gmail was new. It was way, way better and more amazing than Hotmail. The idea was a practically infinite searchable inbox. Nothing else was like it at the time.
I think it would be unfair to not give credit to Google for YouTube. YouTube was indeed a visionary idea with legs, but it is so much further developed now than in 2005. And a lot of it has to do with the way Google has nurtured it over the years.
You could also say there were digital music players before the iPod, Apple copied the Mac from Xerox, and there were smart phones before the iPhone.
Cars also hit bikes when cars are backing up, making turns, driving above the speed limit, driving below the speed limit, when the driver opens the door. Really, many varied circumstances that have nothing to do with making a U-turn.
Perhaps the phrase “perfectly safe” is wrong. Maybe “otherwise intrinsically safe” would be more accurate. Having said all that, I do wonder if we have U-turn accident statistics.
As merely two examples, both gRPC and Kubernetes are important to Google, and yet Google open sourced them. "No longer used" is not the criterion Google uses to make its software OSS.
I don't think Google generally open sources _products_ - a product either always is open source (Android) or never is (web apps). I can't think of an example where a product was closed source, released as open source, and continually maintained.
Open source at Google generally takes the form of libraries rather than products. Often, that's something that an individual engineer is working on, and it's easier to open source than get the copyright reassigned (since Google by default owns any code you write). There are also libraries that are open sourced for business reasons - e.g. SDKs. You can tell the difference, because most individually-driven libraries contain the copy "Not an official Google product" in the README.
I'd say both of those are actively harmful products (like PFOS or cigarettes) that hurt Google's competition by being open sourced. Google wrecked their own productivity, the least they could do was wreck everybody else's.
They take a process a small team could complete quickly with high quality and low cost maintenance and turn it into a process a huge team completes slowly with poor quality and high maintenance cost. Google can afford this because of huge profits from their advertising monopoly that they don’t know how to spend.
Go look at the manuals for IBM's Parallel Sysplex for mainframes and compare the simplicity of that to K8S for instance.
Or for that matter look at DCOM and the family of systems which Microsoft built around it which are truly atrocious but look like a model of simplicity compared to gRPC. (At least Don Box wrote a really great book about COM that showed people in the Microsoftsphere how to write good documentation.)
Or for that matter try doing something with AWS, Azure or some off-brand cloud and Google Cloud from zero (no account) and time yourself with a stopwatch. Well, it will be a stopwatch for AWS but you will probably need a calendar for Google Cloud.
Both Amazon and Google have document writing cultures. If you want to propose an idea, you write a doc about it.
The value of a doc writing culture is that writing things down encourages rigor and thoughtfulness. Docs can be widely distributed, and you can read them, think about them, and add comments. Exchanging ideas can be asynchronous rather than meeting oriented.
But also it can all get a bit carried away (because these artifacts become an important component of promotion).
I work at Google, which is a doc culture. But I worked most of my career at startups where we never wrote anything down. Overall, I prefer doc cultures. But yes, left to its own devices it can seem like you are working at a doc factory.
I worked at a startup without any documentation. The CTO knew everything, because he'd been in every meeting and made every decision. Everyone else became somewhat helpless due to lack of information.
Doc culture is better than the alternative. 95% of docs get thrown away because halfway through writing one you realize it cannot work. In the absence of the design doc, you will instead hand-wave your way through the design phase and realize the mistake after several people have wasted time actually implementing it. It is a very, very good thing to know whether your idea solves or does not solve the given problem, at the earliest opportunity.
There's a happy medium. Even a startup can do with some well-maintained READMEs. But the problem with startups is churn, which makes for docs that rapidly go out of date, and the only thing worse than no docs is out of date docs that lead you down the wrong path.
Sure, Fleishhacker Pool was underfunded for years and fell into disrepair and then closed in 1971. But why was it underfunded, and why was there not enough money to repair it?
The NYT argument is going to be that they put up a site, own the copyright for their content and make that content available either for a human to read for themselves, or for software to index for something commonly understood as a search engine. Those terms do not permit training LLMs for commercial use. Therefore, cease and desist. Oh and destroy anything that was created by violating the terms of our license.
You can make arguments like a) what is ChatGPT but a different kind of search engine, or b) what is an LLM but a primitive human, or c) but but uhh we didn’t agree to these terms.
The LinkedIn case already proves that you cannot impose conditions on works you freely serve to the public. The data is there to anyone who sends a request (you don’t even need to be logged in) and if they do something you don’t like with it then oh well.
So if that’s the argument it’s already been argued by LinkedIn and lost.
This is one of those things where copyright holders have gotten absurdly full of themselves though. Like what you’ve said is that copyright holders have the right to impose a contract of adhesion on data that they are broadcasting into the public without any idea with whom they are even forming a contract, and that’s a facially absurd and incredibly noxious idea if you follow it to the conclusions it implies.
Copyright is about securing to the public works of significance and encouraging their creation, and the way it’s become a lifetime-plus-75-year guarantee of intellectual ownership of ideas is fundamentally noxious and goes against the intent and spirit of the idea. And if that’s where the copyright regime is headed then I’d rather see ChatGPT kill off copyright entirely.
NYT will have to prove that the derivative work is still theirs. Just violating the license may not be enough. That could be bad by itself I guess. But the fact that the interactive prompt can produce a wild number of variations of 'not NYT stuff' will make it tough to say what sort of damages apply.
The answer may be 'maybe'? From what I read, they basically split the decision down to an 'I know it when I see it' style of ruling. If the copyright is still in effect then NYT owns that portion of the output but not the other parts, as the secondary effect would be owned by the generator company (in this case OpenAI) or the person who prompted for it. If that is the case, NYT would have to prove what parts (nodes? backreferences? weights?) they own?
Terms of Use are a thing, and if the Times can prove that OpenAI infringed their web terms by scraping, they may have a case... but terms of use probably won't monetize well or give them enough leverage to prevent OpenAI from using their data anyway, and may end up distracting from the main copyright suit.
Violating TOS, at least to scrape and use later, is legal.[0] I'm not sure how the ruling interacts with LLMs, but I'm sure OpenAI's lawyers would bring it up.
Where do you see that they won the case? Can you provide a source? Because the Wikipedia article directly contradicts what you are saying...
I see they went to the Supreme Court, which kicked it back to the Ninth Circuit, which then reaffirmed its position that hiQ Labs was not in violation of the CFAA.
From [0] and [1], it seems it was a mixed ruling. I am actually not sure whether it's now legal to scrape, since the Court ruled against hiQ due to a breach of terms of service, but previously the Ninth Circuit Court affirmed its ruling against LinkedIn.
> The hiQ decisions give a green light, at least in some circumstances, to scraping publicly available websites without fear of liability under the CFAA.
So at a federal level, it seems relatively clear. The only uncertainty is on the state level.
As previously said, search engines index and provide links. I’ll add that it constitutes fair use because a search engine isn’t itself a replacement for the articles that it indexes.
But ChatGPT is actually providing an alternative that obviates the original articles themselves.
Google started moving away from just providing links a long time ago. They routinely scrape data and show it, keeping people from visiting the links. I don't see how this behavior could be allowed while also crushing LLMs.
Personally, I like the flexibility of an LLM being able to describe a process at different skill levels. This is of tremendous educational value to the world.
Search engines provide links, but also titles and snippets of the page -- enough for you to decide if you want to visit, and Google will show you their cached page if you ask for it.
Even the link is a copyrightable item -- artistic effort went into creating it.
Search engines will also eventually stop serving the result if the source disappears. An LLM that has been trained and published doesn't care at all about the source anymore.