I switched from llama.cpp to vLLM because of prompt cache bugs in qwen/gemma mod...

		verdverm 5 days ago \| on: Gemma 4 12B: A unified, encoder-free multimodal mo...
		I switched from llama.cpp to vLLM because of prompt cache bugs in Qwen/Gemma models. This is a good starting issue, with a bunch of linked/related ones: https://github.com/ggml-org/llama.cpp/issues/22746