Hacker News
Show HN: VLM Inference Engine in Rust (mixpeek.com)
1 point by Beefin 3 months ago | 1 comment


What hardware are you running this on to get 2-3s latency? A 14GB model plus KV cache seems like it would require a 24GB card (3090/4090) to avoid swapping. I've found that once you spill over to system RAM on consumer gear the performance usually falls off a cliff.
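The sizing intuition in that comment can be sketched with a quick calculation. This is a hypothetical back-of-envelope estimate, not numbers from the linked engine: the architecture parameters below (32 layers, 8 grouped-query KV heads, head dim 128, fp16) are assumptions for a generic ~7B-class model, chosen only to illustrate how weights plus KV cache approach a 16GB card's limit at long context.

```rust
// Back-of-envelope VRAM estimate: fp16 weights plus KV cache.
// All architecture numbers are assumptions for a ~7B-class model,
// NOT taken from the linked project.

fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64,
                  seq_len: u64, dtype_bytes: u64, batch: u64) -> u64 {
    // One K and one V tensor per layer -> factor of 2.
    2 * layers * kv_heads * head_dim * seq_len * dtype_bytes * batch
}

fn main() {
    let weights_gib = 14.0; // fp16 weights, as stated in the comment above
    let kv_gib = kv_cache_bytes(32, 8, 128, 4096, 2, 1) as f64
        / (1u64 << 30) as f64;
    println!("KV cache: {:.2} GiB, total: {:.2} GiB", kv_gib, weights_gib + kv_gib);
    // Roughly 0.5 GiB of KV cache at 4k context, ~14.5 GiB total --
    // tight on a 16GB card once activations and framework overhead
    // are added, comfortable on a 24GB 3090/4090.
}
```

With multi-head attention instead of GQA (32 KV heads rather than 8) the cache grows 4x, which is where longer contexts or batching push a 14GB model past 16GB and make the 24GB tier the practical floor.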



