One interesting option for these big memory scans on x86 and ARM CPUs is using the non-temporal load/store instructions. Those actually bypass caching* and may help with the cache pressure of LLM workloads that just do scans. The lookup table is still probably the wrong solution even with this sort of thing.

* Not quite all of it: there are still buffers that do write combining, and some read caching on scans.
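
For anyone who wants to try this, here is a rough sketch in C of what the x86 side can look like. It is a sketch, not a tuned implementation: the function names (`scan_sum_nta`, `stream_copy`), the 64-byte line size, and the 512-byte prefetch distance are all my own assumptions. The read path uses the `PREFETCHNTA` hint via `_mm_prefetch`, and the write path uses the `MOVNTDQ` streaming store via `_mm_stream_si128`. On non-x86 builds it just falls back to plain loads and `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _mm_prefetch */
#endif

/* Sum every byte in a buffer, hinting that the data will not be reused.
 * Assumptions: 64-byte cache lines, 512-byte prefetch distance. */
static uint64_t scan_sum_nta(const uint8_t *p, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        /* Once per assumed cache line, request a line further ahead with
         * the non-temporal hint so it pollutes the caches less. */
        if (i % 64 == 0 && i + 512 < n)
            _mm_prefetch((const char *)(p + i + 512), _MM_HINT_NTA);
#endif
        sum += p[i];
    }
    return sum;
}

/* Copy n bytes using non-temporal stores where available.
 * Requires dst to be 16-byte aligned and n to be a multiple of 16. */
static void stream_copy(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(n % 16 == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i);  /* ordinary load */
        _mm_stream_si128(d + i, v);          /* non-temporal store */
    }
    _mm_sfence();  /* order the streamed stores before later stores */
#else
    memcpy(dst, src, n);  /* portable fallback, no cache hints */
#endif
}
```

Whether either of these helps is something you have to measure. The hints are only hints, and the hardware is free to treat them differently depending on the CPU and memory type.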


