Hacker News

After primarily using AVX2, I don't think masked instructions and scatter/gather are particularly useful. Emulating masked computations with a blend is cheap. Emulating compress and some missing shuffles is expensive. Masked stores and loads don't really help with anything except for an edge case where they don't cause page faults on the part that was masked out.
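To make the "emulate with a blend" point concrete, here is a minimal scalar sketch in portable C of what that emulation amounts to: do the arithmetic on every lane, then pick per lane between the old value and the new result. The names `LANES`, `blend_lanes`, and `masked_add` are made up for illustration. On real AVX2 hardware the select step would be a single blend instruction (for example `_mm256_blendv_epi8` with all-ones/all-zeros lanes) rather than a loop.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumption: 8 x 32-bit lanes, as in one 256-bit register */

/* Per-lane select: where mask[i] is all ones take b[i], where it is zero
   take a[i]. This models a blend driven by a full-lane mask. */
static void blend_lanes(int32_t *dst, const int32_t *a, const int32_t *b,
                        const uint32_t *mask) {
    for (size_t i = 0; i < LANES; i++) {
        dst[i] = (int32_t)(((uint32_t)a[i] & ~mask[i]) |
                           ((uint32_t)b[i] & mask[i]));
    }
}

/* Emulated masked add: the sum is computed on ALL lanes, including the
   masked-off ones, and the blend afterwards discards the unwanted results. */
static void masked_add(int32_t *dst, const int32_t *src, const int32_t *x,
                       const int32_t *y, const uint32_t *mask) {
    int32_t sum[LANES];
    for (size_t i = 0; i < LANES; i++) {
        /* unsigned arithmetic gives wraparound, like a SIMD integer add */
        sum[i] = (int32_t)((uint32_t)x[i] + (uint32_t)y[i]);
    }
    blend_lanes(dst, src, sum, mask);
}
```

The extra cost over a natively masked add is one select per vector. This only works because the discarded lanes have no side effects, which is exactly where masked loads and stores differ.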


On the GPU, a masked-out load is a no-op, which is certainly better. And scatter is probably quite painful to emulate without the intrinsics.
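For reference, a rough sketch of what emulating a masked scatter without a scatter instruction comes down to: pull each lane's index and value out of the vector and issue one scalar store per active lane. This is plain C with hypothetical names (`LANES`, `masked_scatter`), and arrays stand in for vector registers. It is a model of the semantics, not any particular ISA's definition.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumption: 8 x 32-bit lanes per vector */

/* Emulated masked scatter: base[idx[i]] = val[i] for every lane whose mask
   is nonzero. Masked-off lanes never touch memory, so their indices may
   point anywhere, even out of bounds. */
static void masked_scatter(int32_t *base, const int32_t *idx,
                           const int32_t *val, const uint32_t *mask) {
    for (size_t i = 0; i < LANES; i++) {
        if (mask[i]) {
            /* lanes are stored in order, so on duplicate indices the
               highest active lane wins in this sketch */
            base[idx[i]] = val[i];
        }
    }
}
```

With real vectors, each iteration also needs a lane extract before the store, so the emulation costs roughly a couple of instructions per lane plus the mask test.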



