throwarchitect's comments

throwarchitect · on Dec 11, 2020

The greater simplicity of ARMv8 and its fixed-size instructions definitely helps, but Intel also runs its cores at nearly 2x higher frequency, which means a lot less logic can be squeezed into each clock cycle. That makes it much harder to make a wider processor.
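
To put rough numbers on the frequency point, here is a minimal sketch of how the time budget per clock cycle shrinks as frequency rises. The 2.5 GHz and 5.0 GHz figures are hypothetical, chosen only to show a 2x gap; they are not measurements of any real core, and `cycle_time_ps` is just an illustrative helper name.

```python
def cycle_time_ps(frequency_ghz: float) -> float:
    """Return the length of one clock cycle in picoseconds."""
    # 1 GHz corresponds to a 1000 ps cycle, so the period is 1000 / f.
    return 1000.0 / frequency_ghz


# Hypothetical clock speeds, picked only to illustrate a 2x frequency gap.
slow = cycle_time_ps(2.5)
fast = cycle_time_ps(5.0)

print(f"{slow:.0f} ps per cycle at 2.5 GHz")
print(f"{fast:.0f} ps per cycle at 5.0 GHz")
```

All the logic between two pipeline registers has to settle within that period, so doubling the frequency halves the time available for it.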

throwarchitect · on Dec 11, 2020

Guess where some of Intel's engineers have fled to? People move around, so it's not like one company has a stranglehold on knowledge that can't be replicated by another company, especially when one of those companies is willing to pay more for talent.

throwarchitect · on Nov 30, 2020

Considering that x86 is less dense than any RISC ISA, the "compression" argument behind CISC falls apart. No surprise a denser, trivial-to-decode ISA does better.

wk_end · on Dec 1, 2020

You have a source for that? The first Google result I found for research on that shows x86 as denser than almost every RISC ISA [1]. It’s just one study and it predates ARM64 fwiw though.

[1] https://www.researchgate.net/profile/Sally_McKee/publication...

throwarchitect · on Dec 1, 2020

That paper uses no actual benchmarks; it grabs a single system utility and hand-optimizes it. SPEC and Geekbench show x86-64 comes in at well over 4 bytes per instruction on average.

wk_end · on Dec 1, 2020

Sure, I never claimed it to be the be-all and end-all, just the only real source I could find. Adding "SPEC" or "geekbench" to my search didn't really help.

Doing a little more digging, I have also found this [1], which claims "the results show that the average instruction length is about 2 to 3 bytes". On the other hand, this [2] finds that the average instruction length is 4.25 bytes.

Bytes per instruction doesn't really say anything useful for code density when talking about RISC vs. CISC though, since (arguably) the whole idea is that individual CISC instructions are supposed to do more than individual RISC instructions. A three instruction CISC routine at five bytes each is still a win over a four instruction RISC routine at four bytes each. Overall code size is what actually matters.
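
The arithmetic in that example can be checked with a short sketch. The instruction counts and sizes are the hypothetical ones from the paragraph above, not measurements of real code, and `routine_size_bytes` is an illustrative helper name.

```python
def routine_size_bytes(instruction_count: int, bytes_per_instruction: int) -> int:
    """Total size of a routine whose instructions all have the same length."""
    return instruction_count * bytes_per_instruction


# Hypothetical routines from the example above.
cisc_total = routine_size_bytes(3, 5)  # three instructions at five bytes each
risc_total = routine_size_bytes(4, 4)  # four instructions at four bytes each

print(f"CISC routine: {cisc_total} bytes")
print(f"RISC routine: {risc_total} bytes")
```

The CISC routine has the larger average instruction length (5 bytes vs. 4) but the smaller total (15 bytes vs. 16), which is why average bytes per instruction alone can't settle a density comparison.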

[1] https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40...

[2] http://www.cs.unc.edu/~porter/pubs/instrpop-systor19.pdf

monocasa · on Dec 1, 2020

But there's more work being done per average x86_64 instruction due to read-modify-write (RMW) ops. Hence why they just look at an entire binary.

throwarchitect · on Nov 30, 2020

> the M1 is fast isn’t due to technical tricks, but due to Apple throwing a lot of hardware at the problem.

Apple threw more hardware at the problem and they lowered the frequency.

By lowering the frequency relative to AMD/Intel parts, they get two great advantages: 1) they use significantly less power, and 2) they can do more work per cycle, making use of all of that extra hardware.
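
A rough sketch of the second point, treating throughput as instructions per cycle (IPC) times clock frequency. The IPC and frequency values below are made up purely for illustration and are not figures for the M1 or for any AMD/Intel part; `instructions_per_ns` is an illustrative helper name.

```python
def instructions_per_ns(ipc: float, frequency_ghz: float) -> float:
    """Sustained throughput in instructions per nanosecond (IPC x GHz)."""
    return ipc * frequency_ghz


# Hypothetical cores: a wide core at a low clock vs. a narrower core at a high clock.
wide_slow = instructions_per_ns(ipc=6.0, frequency_ghz=3.0)
narrow_fast = instructions_per_ns(ipc=4.0, frequency_ghz=4.5)

print(f"wide, low-clocked core:    {wide_slow} instructions/ns")
print(f"narrow, high-clocked core: {narrow_fast} instructions/ns")
```

With these assumed numbers both cores retire the same number of instructions per nanosecond, so the wider design reaches equal throughput at two thirds of the clock speed.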