
Note that what they released are the delta weights relative to the original LLaMA model. To play around with it, you'll need to grab the original LLaMA 13B model and apply the changes.

  > We release Vicuna weights as delta weights to comply with the LLaMA model
  > license. You can add our delta to the original LLaMA weights to obtain
  > the Vicuna weights.
Edit: took me a while to find it, here's a direct link to the delta weights: https://huggingface.co/lmsys/vicuna-13b-delta-v0


That's what they say, but I just spent 10 minutes searching the git repo, reading the relevant .py files, and looking at their homepage, and the vicuna-7b-delta and vicuna-13b-delta-v0 files are nowhere to be found. Am I blind or did they announce a release without actually releasing?


If you run this command from their instructions, the delta will be automatically downloaded and applied to the base model. https://github.com/lm-sys/FastChat#vicuna-13b: `python3 -m fastchat.model.apply_delta --base /path/to/llama-13b --target /output/path/to/vicuna-13b --delta lmsys/vicuna-13b-delta-v0`
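
For anyone wondering what "applying the delta" amounts to: conceptually it's an element-wise addition of each delta tensor to the base tensor with the same name. A minimal sketch with plain Python lists standing in for tensors. The `apply_delta` helper and the toy parameter names here are made up for illustration, and this is not FastChat's actual code:

```python
from typing import Dict, List

# Parameter name -> flattened values (a toy stand-in for real tensors).
Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Return target weights where target[name] = base[name] + delta[name]."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    target: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name!r}")
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Hypothetical toy weights, chosen so the sums are exact in floating point.
base = {"layer0.weight": [0.5, -1.0, 2.0], "layer0.bias": [0.0]}
delta = {"layer0.weight": [0.25, 0.5, -1.0], "layer0.bias": [0.125]}
vicuna_like = apply_delta(base, delta)
print(vicuna_like)
```

The sketch simply rejects mismatched shapes; a real tool would have to decide what to do when the two checkpoints don't line up exactly.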


This can then be quantized to the llama.cpp/gpt4all format, right? Specifically, this only tweaks the existing weights slightly, without changing the structure?


I may have missed the detail, but it also expects the PyTorch conversion rather than the original LLaMA model.


Yes, you need to convert the original LLaMA model to the huggingface format, according to https://github.com/lm-sys/FastChat#vicuna-weights and https://huggingface.co/docs/transformers/main/model_doc/llam...


You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b) The delta weights are hosted on huggingface and will be automatically downloaded.


Thanks! https://huggingface.co/lmsys/vicuna-13b-delta-v0

Edit, later: I found some instructive pages on how to use the vicuna weights with llama.cpp (https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_...) and pre-made ggml format compatible 4-bit quantized vicuna weights, https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/tree/ma... (8GB ready to go, no 60+GB RAM steps needed)


I did try, but got:

```
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
```


> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add 1 token like (C controltoken), and then run the conversion script.
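
To make the mismatch concrete: the patched model has 32001 embedding rows while the tokenizer only defines 32000 tokens, so one filler token has to be added before conversion. A rough sketch of that check-and-pad step. `pad_vocab` and the `<pad>` filler name are hypothetical, and this is not how llama-tools is implemented:

```python
from typing import List


def pad_vocab(tokens: List[str], embedding_rows: int, filler: str = "<pad>") -> List[str]:
    """Append filler tokens until the vocab has as many entries as the model has embedding rows."""
    missing = embedding_rows - len(tokens)
    if missing < 0:
        raise ValueError("tokenizer has more tokens than the model has embedding rows")
    # Number each filler so the token strings stay distinct.
    return tokens + [f"{filler}{i}" for i in range(missing)]


# Stand-in for the 32000 LLaMA tokens; real token strings are not needed for the check.
tokenizer_vocab = [f"tok{i}" for i in range(32000)]
patched = pad_vocab(tokenizer_vocab, embedding_rows=32001)
print(len(tokenizer_vocab), "->", len(patched))
```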


Just rename the tokenizer class in tokenizer_config.json
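
In case it helps, the rename can be scripted. This sketch assumes the file is the `tokenizer_config.json` in the converted model directory and that the class name your transformers version expects is `LlamaTokenizer`. Both are assumptions on my part, so check your own error message first:

```python
import json
import tempfile
from pathlib import Path


def rename_tokenizer_class(config_path: Path, old: str, new: str) -> bool:
    """Rewrite "tokenizer_class" in a JSON config if it equals `old`. Returns True if changed."""
    config = json.loads(config_path.read_text())
    if config.get("tokenizer_class") != old:
        return False
    config["tokenizer_class"] = new
    config_path.write_text(json.dumps(config, indent=2))
    return True


# Demo on a throwaway file rather than a real model directory.
demo_dir = Path(tempfile.mkdtemp())
demo_path = demo_dir / "tokenizer_config.json"
demo_path.write_text(json.dumps({"tokenizer_class": "LLaMATokenizer"}))
changed = rename_tokenizer_class(demo_path, "LLaMATokenizer", "LlamaTokenizer")
print(changed, json.loads(demo_path.read_text()))
```

Running it a second time is a no-op, since the old class name is no longer present.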


Thanks, that indeed worked!

That, plus using conda in WSL2 instead of bare Windows.


so an extra licensing issue to get around the original non commercial license... this is just a research curiosity is it not?


Seems that way, it would probably be a bad idea to use this for anything commercial at the very least.


Vicuna at huggingface.co? This keeps making me think of "facehuggers" from Aliens and Vecna from Stranger Things.

(I know a vicuna is a llama-like animal.)


Not a lawyer, but that still feels like dubious territory. I would still be on the hook for acquiring the original download, and Facebook has been sending DMCA takedown requests to the llama-dl project over exactly that.


(I work on llama-dl.)

We’re fighting back against the DMCA requests on the basis that NN weights aren’t copyrightable. This thread has details: https://news.ycombinator.com/item?id=35393782

I don't think you have to worry about Facebook going after you. The worst that will happen is that they issue a DMCA, in which case your project gets knocked offline. I don’t think they’ll be going the RIAA route of suing individual hackers.

The DMCAs were also launched by a third party law firm, not Meta themselves, so there’s a bit of “left hand doesn’t know what the right hand is doing” in all of this.

I’ll keep everyone updated. For now, hack freely.


If they aren't copyrightable, couldn't they still be classed as a trade secret and still fall under IP law? Though I'm not sure if distributing the weights to people who sign a simple agreement to not redistribute would count as taking reasonable precautions in maintaining secrecy.


If facebook freely distributed their trade secrets, I'm not sure they'd have any legal defense.


I'm sure they wouldn't have any legal recourse on the trade secrets front if they distributed them to anyone who asked...


keep up god's work!


> god's work

creating sentient life?


That can't be his work, since he only picked up that hobby about 0.000625% of the universe's timespan ago.


For many humans, some "hobbies" involve "projects" which may involve seemingly infinite degrees of procrastination. (This certainly applies to me!)


You're not wrong - but for perspective this is equivalent to a 90 year old picking up a hobby 5 hours ago.
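
Checking the arithmetic on that, taking a year as 365.25 days (my assumption):

```python
FRACTION = 0.000625 / 100        # 0.000625% as a plain fraction
HOURS_PER_YEAR = 365.25 * 24     # assumes an average year of 365.25 days

lifetime_hours = 90 * HOURS_PER_YEAR
hobby_hours = lifetime_hours * FRACTION
print(f"{hobby_hours:.2f} hours")  # just under 5 hours
```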


Is it though? It could be a child picking up a hobby after being old enough to appreciate the hobby. There is so much more time left in the universe before heat death, so the 90y metaphor doesn't really describe the current point in time


Gotta do something in your old age. Better than crossword puzzles, I'll bet.


Lemme save whoever is donating the legal here the time: model weights are definitely copyrightable.


Usually, you don't know if something is "definitely" anything in the legal world unless it's been tested in court. You have any case you want to reference here? Or what makes you so certain?


> model weights are definitely copyrightable.

What legal theory or precedent makes this true?

IMHO, the weights are akin to the list of telephone numbers in a directory, which is definitely not copyrightable; only the layout and expressive portions of a phone directory are copyrightable.

So to make the weights copyrightable, it needs to be argued that the 'layout' of the weights is a creative expression, rather than a 'fact'. But the weights are matrices, which are not expressive or creative. Someone else could derive this exact same set of weights from scratch via the same algorithmic procedure, and therefore, these weights cannot be a creative expression.


"Definitely" is too certain w.r.t. law, but it's pretty obvious how you'd argue these fall under copyright. The difficulty would really be the opposite, it'd be arguing the weights are not derived works of the copyrighted input data sets.

Firstly, weights are not merely a collection of facts like a telephone book is. If two companies train two LLMs they'll get different weights every time. The weights are fundamentally derived from the creative choices they make around hyperparameter selection, training data choices, algorithmic tweaks etc.

Secondly, weights can be considered software and software is copyrightable. You might consider it obvious that weights are not software, but to argue this you'd need an argument that also generalizes to other things that are commonly considered to be copyrightable like compiled binaries, application data files and so on. You'd also need to tackle the argument that weights have no value without the software that uses them (and thus are an extension of that software).

Finally, there's the practical argument. Weights should be copyrightable because they cost a lot of money to produce, society benefits from having large models exist, and this requires them to be treated as the private property of whoever creates them. This last one should in theory be more of a political matter, but copyright law is vague enough that it can come down to a social decision by judges.


I agree but I'd suggest that weights are less like the telephone numbers in a directory and much more like the proportional weights in a recipe.

Recipes, famously, are almost but not quite copyrightable | patentable.

eg:

https://copyrightalliance.org/are-recipes-cookbooks-protecte...

https://etheringtons.com.au/are-recipes-protected-by-copyrig...


> IMHO, the weights are akin to the list of telephone numbers in a directory - which is definitely not copyrightable

I would contest the analogy, but even if we accept it, it's still not clear whether phone directories (or other compilations of factual data) are definitely not copyrightable. The position is clear in the US, but in the UK and presumably other jurisdictions, I wouldn't be so sure.

You could claim we're just talking about US law here, but if you release something on github/huggingface without geo-restrictions, and your company does business in Europe, you might not only have to comply with US law...

eg. https://www.jstor.org/stable/24866738 , eg. https://books.google.com.hk/books?id=wHJBemWuPT4C&pg=PA114&l...


Ok. What if I train it for one micro step?


thanks zero comment bot account!


If NN weights aren't protected by IP law that could slow down progress quite a lot. That could be very good for people worried about alignment.


>If NN weights aren't protected by IP law that could slow down progress quite a lot.

What do you mean? IP law is overwhelmingly an impediment to progress; innovation happens faster when people are free to build on existing weights.


Yes, but there's less incentive for large companies to spend huge amounts of money training these systems when other companies can just take their work for free.

Removing IP protection would make it a lot easier to innovate at this level, but it would reduce the amount of money flowing into getting us to the next level.


Or development could shift out of the hands of these large corporations, which might be a good thing.

Somehow, though, I doubt they'll let the golden goose slip through their fingers, no matter what happens.


Not really. This model only made it to the public because Meta was offering it publicly.

This won't happen to GPT any time soon so they are safe, copyright or not.


I'm curious, do you not think this might have adverse effects? Namely, if NN weights aren't copyrightable, limited releases like Meta has done might not be possible anymore, so they might just cease releases completely, ultimately making access to large models more restricted.


I think we already live in that era, unfortunately. Meta's model release is probably going to be the largest for some years.

There's more detail about the upsides/downsides in this thread: https://twitter.com/theshawwn/status/1641804013791215619


i honestly do not know which is worst of the three realistic alternatives:

1- to have large corporations and people with privileged access to them have these models exclusively and have them collaborate as a clique

2- to have those models openly released to everybody, or de-facto released to everybody as they leak in short order

3- to have the people who think releasing models is a bad thing simply not release them and work alone in their proprietary solutions, as the smaller companies and hobbyists do collaborate

i say let them have a go at number 3 and see how that works for them - shades of "Microsoft Network" vs Internet all over again


The llama-dl project actually helped you download the weights, whereas this just assumes you already have them. That feels like a pretty massive difference to me.


It's fairly similar to a ROM patch in the video game space, which has mostly stood the test of time.


With a ROM, you could at least make a claim that it was your backup copy. I have no such claims to Facebook’s model.


Researchers unaffiliated with Facebook are allowed to possess and use the original weights though, and they can make use of these weights.


like that but requiring 60GBs of CPU RAM for some reason :-P

one has to wonder how they implemented the storage of those deltas to require that sort of RAM


For perspective, that's about $200-$250 of RAM on a desktop computer. They might just not have cared.

Though I expect somebody to write a patch to make this more accessible to people on laptops.
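
A back-of-envelope guess at where the 60GB comes from, assuming the patcher holds both the base model and the delta in memory as 16-bit floats (my assumption; nothing in the thread confirms how it's implemented):

```python
PARAMS = 13e9        # nominal parameter count for a "13B" model
BYTES_FP16 = 2       # 16-bit float
BYTES_4BIT = 0.5     # 4-bit quantized weight, ignoring scale/offset overhead


def gigabytes(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


one_copy_fp16 = gigabytes(PARAMS * BYTES_FP16)   # base or delta alone
base_plus_delta = 2 * one_copy_fp16              # both resident at once
quantized_4bit = gigabytes(PARAMS * BYTES_4BIT)

print(one_copy_fp16, base_plus_delta, quantized_4bit)
```

That works out to 26GB per copy and 52GB for two, which is in the neighbourhood of the 60GB figure once working buffers are added. The 4-bit estimate of 6.5GB is also roughly in line with the 8GB pre-quantized file linked upthread, with the gap presumably being quantization metadata.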



Nobody at Facebook approved it? Given the attention it has received, it's hard to imagine it slipped through the cracks; more likely it's a deliberate decision not to address it.


Very unlikely you'd face any legal action for usage of anything. If you share it, then it becomes less unlikely.

Edit: Also, judging by a comment from the team in the GitHub repository (https://github.com/lm-sys/FastChat/issues/86#issuecomment-14...), they seem to at least hint at having been in contact with the llama team.



