The weird special cases were largely(1) introduced recently by optimizer writers hijacking the standard in order to soften the semantics so that previously illegal optimizations would now be legal, by simply declaring the vast majority of existing C code as "undefined" and up for grabs.
Which is why the Linux kernel, among others, has to set special flags in order to get back to a somewhat sane variant.
(1) Except for those special architectures where the hardware is that weird, but in that case you typically know about it.
> The weird special cases were largely(1) introduced recently by optimizer writers hijacking the standard in order to soften the semantics so that previously illegal optimizations would now be legal, by simply declaring the vast majority of existing C code as "undefined" and up for grabs.
Citation needed.
Signed overflow being undefined behavior was a direct consequence of different hardware representations of numbers. Inserting the instructions required to catch overflow and make it wrapping would be expensive, at least on some systems.
The fact that arrays decay to pointers, C has no dynamic array support, and so on are decisions that go way back and more or less prohibit array bounds checking. After preprocessing, inlining, and various forms of optimization it can be very far from obvious that some random NULL check after the pointer has already been dereferenced is intentional and shouldn't be removed.
Things like overlapping pointers have always had undefined behavior in the sense that inserting the instructions to check for the overlap all over the place would have a large performance impact, no one would use such a compiler, and thus there is no market for one. The standards committee largely documented existing practices rather than inventing new undefined behaviors.
I keep seeing people like you claim we should just "fix" undefined behavior, or that the standards committee gleefully inserted undefined behaviors because they hate programmers. If you actually sat down, disassembled the C code you write, and then went through the exercise of inserting the instructions needed to eliminate most types of undefined behavior, I suspect it would be a very illuminating experience. I also doubt you'd be so certain of your position.
Signed overflow was made undefined so that it could do whatever the CPU naturally did, not so that the compiler could delete huge chunks of supposedly dead code.
In the old days, it thus wasn't truly undefined. It was undefined by the language, but you could just look in the CPU documentation to see what would happen. There was some crazy behavior in the old days, but nothing you wouldn't expect from looking at the CPU documentation.
These days, nobody is shipping a C99-or-newer compiler for any CPU with weird integers. Everything is two's complement, without padding bits or trap values. All of that "undefined" stuff should thus be entirely compatible across all modern C compilers.
> I keep seeing people like you claim we should just "fix" undefined behavior
Nope. Just don't take undefined behavior to mean "do arbitrary optimizations that dramatically alter the behavior of programs".
Let's see what C89 says about undefined behavior:
"3.4.3 Undefined behavior --- behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."
Note the "permissible".
Newer versions of the standard:
"3.4.3
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."
Note that the wording has changed from "permissible" to "possible". To me the older version indicates that this is the range of things the compiler is allowed to do, and that range runs from "do nothing" to "behaving during translation or execution in a documented manner characteristic of the environment".
I don't think "silently doing arbitrary optimizations, removing arbitrary code, or letting demons fly out of your nose" lies in the range between "nothing" and "documented behavior characteristic of the environment". Nor between that and "terminating execution".
However, the newer version says "possible", and this to me sounds a lot less like a restriction of what the compiler is allowed to do, and much more like an illustration of things the compiler could do.
And of course that is exactly what has been happening. And it is detrimental.
See also:
"What every compiler writer should know about programmers
or
“Optimization” based on undefined behaviour hurts performance"
Oh, and I was using C compilers before there was a C standard (I did get the update when ANSI C was released). At that point all behavior was "undefined". And "unspecified". And yet compilers did not take the liberties they take today.
Your post basically boils down to "don't do any optimizations that might possibly exploit undefined behavior". That covers most optimizations. That means compilers can only use -O1. Proving you can still optimize is often equivalent to the halting problem (or you emit lots of branches and duplicate implementations for each function so the code can test the parameters and take the "safe" path in some cases or the "fast" path in others).
> silently do arbitrary optimizations
What are arbitrary optimizations to you? This isn't the magical land of unicorns here. Compilers have to be pedantic by nature so you have to sit down and come up with what is allowed and isn't allowed.
I happen to agree that some instances of undefined behavior should be defined and eliminated. For example, uninitialized values. Require the compiler to zero-initialize any storage location before use unless it can prove an unconditional write occurs to that location. This will have a small but manageable code size impact and a relatively small runtime perf impact.
> removing arbitrary code
To the compiler, your program is just a series of meaningless instructions operating on meaningless data on a dumb machine. It has absolutely no way to understand what is "arbitrary" code and what isn't. This is basically another argument for "compilers should disable 95% of all optimizations".
Firstly, "ignoring the situation completely with unpredictable results" is a pretty clear permission to do arbitrary optimizations, as long as they preserve the semantics in the case where the execution doesn't exhibit UB.
Secondly, in ISO standards Notes are by definition non-normative, hence your exegesis as it pertains to C99 and later versions is invalid anyway.
Your points are really good. It's a pity that -std=c89 doesn't act as a time machine in this respect... it would have been useful.
https://godbolt.org/g/fKfDFq
What changes are you thinking of? The only change with respect to undefined behavior I can think of off the top of my head is punning via unions, which C99 changed from undefined behavior to well-defined behavior.
New undefined behaviors were added, and for significant bits.
I think the bigger issue is that compiler writers have taken "undefined" to mean complete liberty to do whatever the hell they want, even fairly far away from the actual location of the undefined behavior.
And it turns out that the C89 standard had a range of "permissible" actions on undefined behavior. To me "permissible" is restrictive, as in you are not allowed to do anything outside that range.
Newer standards kept that section (3.4.3) the same, word for word, except for changing "permissible" to "possible". To me, "possible" is not restrictive but illustrative.
Now you might think this is reading tea-leaves on my part, and you might be correct, but on the other hand that change of interpretation is exactly what happened. Compilers used to not do these things, and nowadays compilers do do these things, and compiler writers cite the standard as giving them permission. Also consider to what extent wording of standards is finessed and fought over, and that exactly this one word was changed, in a section that otherwise remained the same word for word.
wants to. There, fixed that for you ;-)