Hacker News | chaoyu's comments

ONNX is not a good option for LLM-style autoregressive generation.
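A toy sketch of why (illustrative only, not any real model's code): each decoding step feeds a longer sequence back into the model, so input shapes change every iteration, which is awkward for static-graph export formats like ONNX and makes it hard to carry a mutable KV cache between calls.

```python
# Stand-in for a model forward pass; a real transformer would also
# return updated key/value caches here.
def next_token(tokens):
    return (sum(tokens) * 31 + 7) % 100

def generate(prompt, steps):
    """Greedy autoregressive decoding: the input grows every step."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))  # sequence length changes each call
    return tokens

out = generate([1, 2, 3], 4)
print(len(out))  # 7: 3 prompt tokens + 4 generated
```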


In my opinion, Automatic1111 is more of a development tool for experimenting with different pipelines on your local GPU, or an internal tool serving a few users.

The OneDiffusion project aims to solve a very different problem: bringing an SD pipeline to serve production traffic and scale in the cloud. For example, configuring a large diffusion model to run on multiple GPUs: https://huggingface.co/blog/deploy-deepfloydif-using-bentoml


Check out BentoML, the underlying serving framework used by OpenLLM; it supports other types of models and modalities, such as images and videos.


OpenLLM itself is under the Apache 2.0 license, which does NOT restrict commercial use. However, OpenLLM as a framework can be extended to support other LLMs, which may come with additional restrictions.


OpenLLM plans to provide an OpenAI-compatible API, which lets you use OpenAI's Python client to talk to OpenLLM; users just need to change the base URL to point to their OpenLLM server. This feature is a work in progress.
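As a sketch of what that migration looks like (the port, path, and model name below are assumptions for illustration, not OpenLLM's documented defaults), the request body stays in OpenAI's chat-completion shape and only the base URL changes:

```python
import json
from urllib import request

# Hypothetical local OpenLLM server; host, port, and model name are assumptions.
BASE_URL = "http://localhost:3000/v1"

def chat_request(model, content):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }

def chat(model, content):
    # Same request shape an OpenAI client would send; only the base URL differs.
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(chat_request(model, content)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```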


Fine-tuning is coming up in the next release!

You can actually try it out on the main branch :P


Looking forward to it!

OpenLLM is adding an OpenAI-compatible API layer, which will make it even easier to migrate LLM apps built around OpenAI's API spec. Feel free to join our Discord community to discuss more!


Smaller models are generally more efficient at inference and don't necessarily need the latest GPUs. Larger language models tend to perform better across a wider range of tasks. But for a specific enterprise use case, either distilling a large model or using a large model to help train a smaller one can be quite helpful in getting things to production, where you may need cost-efficiency and lower latency.
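To make the distillation idea concrete, here is a minimal, framework-free sketch (not OpenLLM code) of the standard soft-label distillation loss: the student is trained to match the teacher's temperature-softened output distribution via KL divergence.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions.

    Minimizing this pushes the student's predictions toward the
    teacher's, including its relative confidence across classes.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

# Identical logits give zero loss; diverging logits give a positive one.
print(distill_loss([2.0, 1.0], [2.0, 1.0]))  # 0.0
```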


The OpenLLM team is actively exploring those techniques for streamlining the fine-tuning process and making it accessible!


OpenLLM, in comparison, focuses more on building LLM apps for production. For example, the integration with LangChain + BentoML makes it easy to run multiple LLMs in parallel across multiple GPUs/nodes, chain LLMs with other types of AI/ML models, and deploy the entire pipeline on Kubernetes (via Yatai or BentoCloud).

Disclaimer: I helped build BentoML and OpenLLM.

