I switched from llama.cpp to vLLM because of prompt cache bugs with Qwen/Gemma models.

This is a good starting issue, with a bunch of linked/related ones:

https://github.com/ggml-org/llama.cpp/issues/22746
