I've long wondered how an ECS can be cache-friendly.
If you look at Overwatch's ECS ( https://youtu.be/W3aieHjyNvw?t=326 ), there are many systems that can read from 10+ different components.
For every entity such a system processes, it performs at least 10 reads with no locality whatsoever.
That seems crazy slow to me.
The point of data-oriented patterns is to structure data according to how it is used. So if these lookups were bottlenecks, you'd merge components that are always used together into one.
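Here is a minimal sketch of what collapsing two always-together components could look like. It uses Python's `array` module, which stores C doubles contiguously. `SeparateComponents` and `MergedMotion` are hypothetical names, and the one-float Position/Velocity components are a simplifying assumption.

```python
from array import array


# Layout A (assumed for illustration): Position and Velocity are separate
# components, each in its own contiguous buffer of C doubles. A movement
# system touches two different memory regions per entity.
class SeparateComponents:
    def __init__(self, count):
        self.position = array("d", [0.0] * count)
        self.velocity = array("d", [1.0] * count)

    def move(self, dt):
        for i in range(len(self.position)):
            self.position[i] += self.velocity[i] * dt


# Layout B: the two components are always used together, so they are
# collapsed into one "Motion" component. Fields are interleaved as
# [pos0, vel0, pos1, vel1, ...], so one entity's data sits side by side.
class MergedMotion:
    STRIDE = 2  # doubles per entity: position, velocity

    def __init__(self, count):
        self.motion = array("d", [0.0, 1.0] * count)

    def move(self, dt):
        m = self.motion
        for base in range(0, len(m), self.STRIDE):
            m[base] += m[base + 1] * dt

    def position(self, i):
        return self.motion[i * self.STRIDE]


a = SeparateComponents(3)
b = MergedMotion(3)
a.move(0.5)
b.move(0.5)
print(list(a.position), [b.position(i) for i in range(3)])
```

For a plain linear sweep like this, both layouts stream through memory well. The merged layout pays off more when entities are visited individually or out of order, because one lookup typically brings both fields in on the same cache line.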
The other idea is that where there is one access pattern, there will be more of the same. By updating one system at a time, you keep as much as possible hot in cache, because all the similar lookups happen together. So whilst the first entity's lookups might be ten complete misses, the next entity's probably won't be.
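To make the "first entity misses, the next ones don't" point concrete, here is a toy Python model, not a measurement of any real CPU. It assumes 64-byte cache lines, 8-byte components, each component stored in its own dense array, and a fully associative LRU cache with no hardware prefetching. `ToyCache` and `run_system` are made-up names for illustration.

```python
from collections import OrderedDict

LINE_BYTES = 64      # assumed cache-line size
COMPONENT_BYTES = 8  # assumed size of one component instance


class ToyCache:
    """Fully associative LRU data cache that only counts misses."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, address):
        line = address // LINE_BYTES
        if line in self.lines:
            self.lines.move_to_end(line)        # mark as most recently used
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def run_system(cache, component_bases, entity_count):
    """A system reading every listed component for each entity, in entity order.
    Returns the number of misses caused by each entity."""
    per_entity = []
    for e in range(entity_count):
        before = cache.misses
        for base in component_bases:
            cache.access(base + e * COMPONENT_BYTES)
        per_entity.append(cache.misses - before)
    return per_entity


# Ten components, each a dense array starting in its own 1 MiB region.
bases = [c * (1 << 20) for c in range(10)]
misses = run_system(ToyCache(capacity_lines=512), bases, entity_count=16)
print(misses[:9], sum(misses))
```

In this model the first entity pays ten misses, the next seven hit lines that are already loaded, and the ninth crosses into new lines. On real hardware the prefetcher often picks up sequential streams like these, so even the line-boundary misses can be partly hidden.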
At the gameplay level there's a lot of chaos, and entities generally aren't doing things that are easy to organize so as to avoid cache misses anyway. On top of that, you are trading ease of change against optimization, at which point you just need to be fast enough. I'd also bet most of Overwatch's bottlenecks were not in gameplay code.
Mostly, though, people should be thinking in a data-oriented way rather than grabbing an ECS framework and expecting it to magically make things cache-efficient.
I saw the video, but they don't go into much detail on the actual implementation, so it's hard to say whether, where, or how things are optimized. Moreover, the speaker stresses that ECS is used for code organization in most cases, and they benefit a lot from that aspect.
Consider instruction cache locality as well. That system runs the same code sequentially for every entity that needs it, so that code is likely to stay in the instruction cache.
By contrast, if you tick each entity separately and run all of its logic, each new entity's tick comes after so much different code has run that it probably starts over with uncached instruction fetches.
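A similar toy model shows the instruction-cache side of this. It assumes, hypothetically, 20 systems whose code is 16 cache lines each, a 256-line fully associative LRU instruction cache, and that running a system fetches every line of its code once. Real instruction caches are set-associative and have prefetching, so treat this only as an illustration of the loop-order effect.

```python
from collections import OrderedDict


class ToyICache:
    """Fully associative LRU instruction cache that counts misses."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.misses = 0

    def fetch(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)        # mark as most recently used
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def run_code(cache, system_id, lines_per_system):
    # Executing a system fetches every cache line of its (assumed) code body.
    start = system_id * lines_per_system
    for line in range(start, start + lines_per_system):
        cache.fetch(line)


def entity_major(systems, entities, lines_per_system, capacity):
    # Tick each entity separately, running all of its logic.
    cache = ToyICache(capacity)
    for _ in range(entities):
        for s in range(systems):
            run_code(cache, s, lines_per_system)
    return cache.misses


def system_major(systems, entities, lines_per_system, capacity):
    # Run one system over every entity before moving to the next system.
    cache = ToyICache(capacity)
    for s in range(systems):
        for _ in range(entities):
            run_code(cache, s, lines_per_system)
    return cache.misses


# 20 systems x 16 lines = 320 lines of code, but only 256 lines of cache.
args = dict(systems=20, entities=100, lines_per_system=16, capacity=256)
print(entity_major(**args), system_major(**args))
```

Ticking entity by entity cycles through 320 lines of code with only 256 lines of cache, so under LRU every fetch misses (32000 misses). Running one system across all entities misses once per line of code (320 misses).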
It's ten reads, but each of them is typically just an index into a big sequential array. Also, a system that only reads components can run in parallel with any other systems that don't write to those components.
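Here is a sketch of both points, under some assumptions. Component storage is modeled as a sparse set, which is one common ECS layout and not necessarily what Overwatch uses: `sparse[entity]` gives an index into packed arrays. The second half is a hypothetical greedy scheduler that batches systems whose declared reads and writes don't conflict. `SparseSet`, `SystemAccess`, and `parallel_batches` are illustrative names, not a real library's API.

```python
from dataclasses import dataclass


class SparseSet:
    """Component storage: sparse[entity] is an index into packed dense arrays."""

    def __init__(self, max_entities):
        self.sparse = [-1] * max_entities  # entity id -> dense index, -1 if absent
        self.entities = []                 # dense: owning entity ids, no holes
        self.values = []                   # dense: component values, same order

    def add(self, entity, value):
        self.sparse[entity] = len(self.values)
        self.entities.append(entity)
        self.values.append(value)

    def get(self, entity):
        # A lookup is two array indexings: sparse, then dense.
        return self.values[self.sparse[entity]]

    def remove(self, entity):
        # Swap-remove: move the last element into the hole to stay packed.
        i = self.sparse[entity]
        self.sparse[entity] = -1
        last_entity = self.entities.pop()
        last_value = self.values.pop()
        if i < len(self.values):
            self.entities[i] = last_entity
            self.values[i] = last_value
            self.sparse[last_entity] = i


@dataclass(frozen=True)
class SystemAccess:
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a, b):
    # Shared reads are safe; any write overlapping the other's reads or writes is not.
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def parallel_batches(systems):
    """Greedy, order-preserving batching: systems in one batch may run in parallel."""
    batches = []
    for s in systems:
        earliest = 0  # must come after the last batch holding a conflicting system
        for i, batch in enumerate(batches):
            if any(conflicts(s, other) for other in batch):
                earliest = i + 1
        if earliest == len(batches):
            batches.append([s])
        else:
            batches[earliest].append(s)
    return batches


positions = SparseSet(max_entities=8)
positions.add(3, (0.0, 0.0))
positions.add(5, (1.0, 2.0))
positions.add(1, (4.0, 4.0))
positions.remove(3)
print(positions.entities, positions.get(5))

systems = [
    SystemAccess("ai", reads=frozenset({"Target"}), writes=frozenset({"Velocity"})),
    SystemAccess("physics", reads=frozenset({"Velocity"}), writes=frozenset({"Position"})),
    SystemAccess("render", reads=frozenset({"Position", "Sprite"})),
    SystemAccess("audio", reads=frozenset({"Sound"})),
]
print([[s.name for s in batch] for batch in parallel_batches(systems)])
```

Here "audio" only reads a component nobody writes, so it lands in the first batch alongside "ai". "physics" and "render" each depend on an earlier writer and get their own batches. A real scheduler would also weigh how much work each system does, which this sketch ignores.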