Note that what they released are delta weights relative to the original LLaMA model. To play around with it, you'll need to grab the original LLaMA 13B model and apply the delta to it.
> We release Vicuna weights as delta weights to comply with the LLaMA model
> license. You can add our delta to the original LLaMA weights to obtain
> the Vicuna weights.
That's what they say, but I just spent 10 minutes searching the git repo, reading the relevant .py files and looking at their homepage, and the vicuna-7b-delta and vicuna-13b-delta-v0 files are nowhere to be found. Am I blind, or did they announce a release without actually releasing?
If you run this command from their instructions, the delta will be automatically downloaded and applied to the base model.
https://github.com/lm-sys/FastChat#vicuna-13b:
`python3 -m fastchat.model.apply_delta --base /path/to/llama-13b --target /output/path/to/vicuna-13b --delta lmsys/vicuna-13b-delta-v0`
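For anyone wondering what "applying the delta" means conceptually, here is a minimal sketch. It is not FastChat's actual code: plain dicts of float lists stand in for real tensors, and the names `make_delta`, `apply_delta`, and the `layer0.*` parameters are made up for illustration. The only assumption is the one stated in the release note, that the delta is added to the original weights.

```python
# Toy illustration of delta weights: target = base + delta, per parameter.
# Plain dicts of float lists stand in for real tensors; this is NOT FastChat's code.


def make_delta(base, tuned):
    """What a publisher would release: fine-tuned minus base, per parameter."""
    return {name: [t - b for t, b in zip(tuned[name], base[name])] for name in tuned}


def apply_delta(base, delta):
    """What a user does locally: base plus delta recovers the fine-tuned weights."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    target = {}
    for name, delta_values in delta.items():
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Hypothetical values, chosen to be exactly representable in binary floating point.
base = {"layer0.weight": [0.5, -1.0, 2.0], "layer0.bias": [0.25]}
tuned = {"layer0.weight": [0.75, -1.5, 2.0], "layer0.bias": [0.0]}

delta = make_delta(base, tuned)
recovered = apply_delta(base, delta)
print(recovered == tuned)
```

The point is that the delta alone is useless without the base weights, which is why the release can be public while the original model is not.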
This can then be quantized to the llama.cpp/gpt4all format, right? Specifically, this only tweaks the existing weights slightly, without changing the structure?
You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b)
The delta weights are hosted on huggingface and will be automatically downloaded.
> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add 1 token like (C controltoken), and then run the conversion script.
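A quick way to see the size of that mismatch before running a conversion script is to compare the two counts. This is only a sketch: `missing_tokens` is a hypothetical helper, not part of llama-tools, and it assumes the model side is the one with 32001 embedding rows and the tokenizer the one with 32000 entries, which is how I read the quote.

```python
def missing_tokens(embedding_rows, tokenizer_vocab_size):
    """Number of tokens to add to the tokenizer so it matches the model; 0 if aligned."""
    if tokenizer_vocab_size > embedding_rows:
        # The opposite mismatch needs a different fix, so refuse to guess.
        raise ValueError("tokenizer has more entries than the model has embedding rows")
    return embedding_rows - tokenizer_vocab_size


# Counts taken from the quote above; which side is which is an assumption.
print(missing_tokens(32001, 32000))
```

With those counts the tokenizer is short by one entry, which matches the advice to add one token.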
Not a lawyer, but that still feels like dubious territory. I would still be on the hook for acquiring the original download, and Facebook has been issuing DMCA takedown requests against the llama-dl project.
I don't think you have to worry about Facebook going after you. The worst that will happen is that they issue a DMCA, in which case your project gets knocked offline. I don’t think they’ll be going the RIAA route of suing individual hackers.
The DMCAs were also launched by a third party law firm, not Meta themselves, so there’s a bit of “left hand doesn’t know what the right hand is doing” in all of this.
If they aren't copyrightable, couldn't they still be classed as a trade secret and still fall under IP law? Though I'm not sure if distributing the weights to people who sign a simple agreement to not redistribute would count as taking reasonable precautions in maintaining secrecy.
Is it though? It could be a child picking up a hobby after being old enough to appreciate the hobby. There is so much more time left in the universe before heat death, so the 90y metaphor doesn't really describe the current point in time.
Usually, you don't know if something is "definitely" anything in the legal world unless it's been tested in court. You have any case you want to reference here? Or what makes you so certain?
What legal theory or precedent makes this true?
IMHO, the weights are akin to the list of telephone numbers in a directory, which is definitely not copyrightable; only the layout and expressive portions of a phone directory are copyrightable.
So to make the weights copyrightable, it needs to be argued that the 'layout' of the weights is a creative expression, rather than a 'fact'. But the weights are matrices, which are not expressive or creative. Someone else could derive this exact same set of weights from scratch via the same algorithmic procedure, and therefore these weights cannot be a creative expression.
"Definitely" is too certain w.r.t. law, but it's pretty obvious how you'd argue these fall under copyright. The difficulty would really be the opposite, it'd be arguing the weights are not derived works of the copyrighted input data sets.
Firstly, weights are not merely a collection of facts like a telephone book is. If two companies train two LLMs they'll get different weights every time. The weights are fundamentally derived from the creative choices they make around hyperparameter selection, training data choices, algorithmic tweaks etc.
Secondly, weights can be considered software and software is copyrightable. You might consider it obvious that weights are not software, but to argue this you'd need an argument that also generalizes to other things that are commonly considered to be copyrightable like compiled binaries, application data files and so on. You'd also need to tackle the argument that weights have no value without the software that uses them (and thus are an extension of that software).
Finally, there's the practical argument. Weights should be copyrightable because they cost a lot of money to produce, society benefits from having large models exist, and this requires them to be treated as the private property of whoever creates them. This last one should in theory be more of a political matter, but copyright law is vague enough that it can come down to a social decision by judges.
> IMHO, the weights are akin to the list of telephone numbers in a directory - which is definitely not copyrightable
I would contest the analogy, but even if we accept it, it's still not clear that phone directories (or other compilations of factual data) are definitely not copyrightable. The position is clear in the US, but in the UK and presumably other jurisdictions, I wouldn't be so sure.
You could claim we're just talking about US law here, but if you release something on github/huggingface without geo-restrictions, and your company does business in Europe, you might not only have to comply with US law...
Yes, but there's less incentive for large companies to spend huge amounts of money training these systems when other companies can just take their work for free.
Removing IP protection would make it a lot easier to innovate at this level, but it would reduce the amount of money flowing into getting us to the next level.
I'm curious, do you not think this might have adverse effects? Namely, if NN weights aren't copyrightable, limited releases like Meta's might no longer be possible, so they might stop releasing altogether, ultimately making access to large models more restricted.
I honestly do not know which of the three realistic alternatives is worst:
1- to have large corporations and people with privileged access to them have these models exclusively and have them collaborate as a clique
2- to have those models openly released to everybody, or de-facto released to everybody as they leak in short order
3- to have the people who think releasing models is a bad thing simply not release them and work alone on their proprietary solutions, while the smaller companies and hobbyists collaborate
I say let them have a go at number 3 and see how that works for them - shades of "Microsoft Network" vs Internet all over again
The llama-dl project actually helped you download the weights, whereas this just assumes you already have them. That feels like a pretty massive difference to me.
Nobody at Facebook approved it? Given the attention it has received, it's hard to imagine it slipped through the cracks; it looks more like a deliberate decision not to address it.