Hacker News | chaoyu's comments

ONNX is not a good option for LLM-style autoregressive generation.
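A toy sketch of why (illustrative only, not any real model's code): each decoding step feeds a longer sequence back into the model, so input shapes change every iteration, which is awkward for static-graph export formats like ONNX and makes it hard to carry a mutable KV cache between calls.

```python
# Stand-in for a model forward pass; a real transformer would also
# return updated key/value caches here.
def next_token(tokens):
    return (sum(tokens) * 31 + 7) % 100

def generate(prompt, steps):
    """Greedy autoregressive decoding: the input grows every step."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))  # sequence length changes each call
    return tokens

out = generate([1, 2, 3], 4)
print(len(out))  # 7: 3 prompt tokens + 4 generated
```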


In my opinion, Automatic1111 is more of a development tool for experimenting with different pipelines on your local GPU, or an internal tool serving a few users.

The OneDiffusion project aims to solve a very different problem: bringing an SD pipeline to serve production traffic and scale in the cloud. For example, configuring a large diffusion model to run on multiple GPUs: https://huggingface.co/blog/deploy-deepfloydif-using-bentoml


Check out BentoML, the underlying serving framework used by OpenLLM; it supports other types of models and modalities, such as images and videos.


OpenLLM itself is under the Apache 2.0 license, which does NOT restrict commercial use. However, OpenLLM as a framework can be extended to support other LLMs, which may come with additional restrictions.


OpenLLM plans to provide an OpenAI-compatible API, which lets you use OpenAI's Python client to talk to OpenLLM; users just need to change the base URL to point to their OpenLLM server. This feature is a work in progress.
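As a sketch of what that migration looks like (the port, path, and model name below are assumptions for illustration, not OpenLLM's documented defaults), the request body stays in OpenAI's chat-completion shape and only the base URL changes:

```python
import json
from urllib import request

# Hypothetical local OpenLLM server; host, port, and model name are assumptions.
BASE_URL = "http://localhost:3000/v1"

def chat_request(model, content):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }

def chat(model, content):
    # Same request shape an OpenAI client would send; only the base URL differs.
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(chat_request(model, content)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```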


Fine-tuning is coming up in the next release!

You can actually try it out on the main branch :P


Looking forward to it!

OpenLLM is adding an OpenAI-compatible API layer, which will make it even easier to migrate LLM apps built around OpenAI's API spec. Feel free to join our Discord community to discuss more!


Smaller models are generally more efficient at inference and don't necessarily need the latest GPUs. Larger language models tend to perform better across a wider range of tasks. But for a specific enterprise use case, either distilling a large model or using a large model to help train a smaller one can be quite helpful in getting things to production, where you may need cost-efficiency and lower latency.
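To make the distillation idea concrete, here is a minimal, framework-free sketch (not OpenLLM code) of the standard soft-label distillation loss: the student is trained to match the teacher's temperature-softened output distribution via KL divergence.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions.

    Minimizing this pushes the student's predictions toward the
    teacher's, including its relative confidence across classes.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

# Identical logits give zero loss; diverging logits give a positive one.
print(distill_loss([2.0, 1.0], [2.0, 1.0]))  # 0.0
```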


The OpenLLM team is actively exploring those techniques for streamlining the fine-tuning process and making it accessible!


OpenLLM, in comparison, focuses more on building LLM apps for production. For example, the integration with LangChain + BentoML makes it easy to run multiple LLMs in parallel across multiple GPUs/nodes, chain LLMs with other types of AI/ML models, and deploy the entire pipeline on Kubernetes (via Yatai or BentoCloud).

Disclaimer: I helped build BentoML and OpenLLM.

