
> Even then, there are situations where a portable program in a high level language has hand-written, per-processor assembler code (OpenSSL comes to mind). Why do you think that is?

It makes sense at times, in specific contexts. Which is a far cry from your "compile anything with gcc -S -O3 and behold the extra code which a human would never write" (emphasis mine).

> nopw instructions, the idea being to give the processor time to prime the data cache.

I'm still only 95% sure what you mean by "priming the data cache", but if you mean prefetching, (a) nop doesn't prefetch; and (b) there are prefetch instructions for prefetching. And in any case prefetching wouldn't be cheating.

> If you do more comparisons over decades like I have

Compilers have gotten much better over the last few decades. Maybe the things you "know" are "always" true were true at some point in time and aren't true anymore.

> the far worse danger is that -ffast-math won't generate IEEE-754 compliant results

Yes, that's what I said. You took my original function and wrote a version that didn't generate IEEE-754 compliant results. If you're OK with changing the semantics, you should use -ffast-math and let the compiler vectorize.

> There are two ways to do it fast: eor (or what the idiotic intel architecture calls "xor"), or the sub instruction

For whatever it's worth, I think sub would be incorrect if the original bit pattern in the register were a NaN.

> two instructions wasted with what could amount to a single subq. This isn't more efficient.

And yet, somehow, this version of the code runs faster than your version.

> testl is a total waste of processor cycles here. Which purpose does it serve? Then more cycles are wasted on jumping to an instruction to clear %xmm0

That's the code for testing whether the for loop should ever be entered. If n is less than or equal to zero (that's what's being tested by testl/jne), the function returns zero. Which is why xmm0 (the return register) needs to be zeroed in that case.

> why xorl %eax in a loop when it could be done once?

It's not in a loop. It's the "i = 0" setup code before the loop.

You have made it very clear that you don't feel qualified to write highly optimized x86-64 code. Neither are you qualified to judge the quality of x86-64 code if you can't tell what is inside a loop and what isn't.

Two last points before I drop this thread:

> you will see how silly it was trying to argue that compilers generate faster code than coders

I didn't argue that. I argued that your assertion "compile anything with gcc -S -O3 and behold the extra code which a human would never write" (emphasis mine, again) was incorrect. That doesn't mean that I think that gcc will always, or even sometimes, beat human coders. But it can match them very very often.

> you chose a dot product because you likely knew that the compiler would generate pretty fast code [...] More waste of processor cycles. [...] Idiotic in the extreme

You're contradicting yourself. And you are calling the many GCC developers and Intel/AMD microarchitecture experts idiots, people who have very likely pored over every single instruction of this very code and decided that this is the way it should be written for maximum performance.

I hope you have a wonderful day.



"(a) nop doesn't prefetch;"

I wrote nopsw and you are writing about nop. Are you doing this on purpose? nop doesn't prefetch, but on the intel family of processors, nopsw has the side effect of prefetching.

"Yes, that's what I said. You took my original function and wrote a version that didn't generate IEEE-754 compliant results."

Turns out, so did the GCC compiler, at least the one I have, so I'd say your point is moot.

Truth of the matter is, you picked a really bad example: to solve it correctly, one would have to implement at least a portion of the algorithms in the GNU multiple precision library ("GMP"). I suspect your picking a floating-point example was no accident.

"I think sub would be incorrect if the original bit pattern in the register were a NaN."

Even NaN has to be represented by a bit pattern, and an integer subtraction of the register from itself will yield zero regardless of that pattern.

"That's the code for testing whether the for loop should ever be entered."

And here we come back to my point: if you were coding this from scratch in assembler, you wouldn't write a generic function, and you'd know that n will never be zero. And the reason why you'd never write a generic function is because they lose you speed and increase code size. But a compiler cannot know that and cannot optimize for such a situation. It's just a dumb program.

"You have made it very clear that you don't feel qualified to write highly optimized x86-64 code. Neither are you qualified to judge the quality of x86-64 code if you can't tell what is inside a loop and what isn't."

I spent 30 seconds looking at assembler code for a processor family I have never coded on. I spent less than 15 minutes writing a piece of optimized assembler code for that family, using GNU as, an assembler I had never written code in. Now you judge me on misinterpreting one clumsily generated compiler instruction. By that logic, am I not qualified, considering I was able to do all of this in under 15 minutes? I'm very pleased with myself: for the time budget, an unknown processor, and an unknown assembler, I think I did very well. We will have to disagree, vehemently if you please.

I stand by my assertion that a compiler will never be able to beat a human at generating fast, optimized code, nor will it ever be capable of generating smaller code. In addition, I don't hold the GCC developers in high regard, considering how notoriously bad their compilers are when compared to, say, the intel or Sun Studio ones. Even the Microsoft compilers beat GCC at generating code which runs faster. In fact, pretty much every compiler beats GCC in performance, which means that the people working on the GCC compilers aren't good enough. GCC's only undisputed strength is its vast support for different processors. There, they are #1, but everywhere else they're last. The GCC developers just don't have what it takes to be the best in that business.

"And yet, somehow, this version of the code runs faster than your version."

I don't know that; you ran code which I wrote blindly; I was not even able to reproduce your output with my GCC. That it runs faster is just your assertion. Based on my experience, I have no reason to believe that.

"I hope you have a wonderful day."

As a matter of fact, I am about to go create an SVR4 OS package of GCC 9.2.0, which I patched and managed to bootstrap after a week's worth of work on Solaris 10 on sparc, so yes, I will have a wonderful day enjoying the fruits of my labors. I wish you a wonderful day as well.



