Hacker News
Show HN: VLM Inference Engine in Rust (mixpeek.com)
1 point by Beefin 3 months ago | 1 comment


What hardware are you running this on to get 2-3s latency? A 14GB model plus KV cache seems like it would require a 24GB card (3090/4090) to avoid swapping. I've found that once you spill over to system RAM on consumer gear the performance usually falls off a cliff.
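The sizing intuition in that comment can be sketched with a quick calculation. This is a hypothetical back-of-envelope estimate, not numbers from the linked engine: the architecture parameters below (32 layers, 8 grouped-query KV heads, head dim 128, fp16) are assumptions for a generic ~7B-class model, chosen only to illustrate how weights plus KV cache approach a 16GB card's limit at long context.

```rust
// Back-of-envelope VRAM estimate: fp16 weights plus KV cache.
// All architecture numbers are assumptions for a ~7B-class model,
// NOT taken from the linked project.

fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64,
                  seq_len: u64, dtype_bytes: u64, batch: u64) -> u64 {
    // One K and one V tensor per layer -> factor of 2.
    2 * layers * kv_heads * head_dim * seq_len * dtype_bytes * batch
}

fn main() {
    let weights_gib = 14.0; // fp16 weights, as stated in the comment above
    let kv_gib = kv_cache_bytes(32, 8, 128, 4096, 2, 1) as f64
        / (1u64 << 30) as f64;
    println!("KV cache: {:.2} GiB, total: {:.2} GiB", kv_gib, weights_gib + kv_gib);
    // Roughly 0.5 GiB of KV cache at 4k context, ~14.5 GiB total --
    // tight on a 16GB card once activations and framework overhead
    // are added, comfortable on a 24GB 3090/4090.
}
```

With multi-head attention instead of GQA (32 KV heads rather than 8) the cache grows 4x, which is where longer contexts or batching push a 14GB model past 16GB and make the 24GB tier the practical floor.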



