As an example, consider this code (godbolt: https://godbolt.org/z/TrMrYTKG9):
struct foo {
    unsigned char a, b;
};

foo make(int x) {
    foo result;
    if (x) {
        result.a = 13;
    } else {
        result.b = 37;
    }
    return result;
}
At high enough optimization levels, the function compiles to “mov eax, 9485; ret”, which sets both a=13 and b=37 without testing the condition at all - as if both branches of the test were executed. (9485 is 0x250D: the low byte 0x0D is a=13, the next byte 0x25 is b=37.) This is perfectly reasonable because the lack of initialization means the values could already have been set that way (even if unlikely), so the compiler just goes ahead and sets them that way. It's faster!

"But it's right there in the name!" Undefined behavior literally places no restrictions on the code generated or the behavior of the program. And the compiler is under no obligation to help you debug your (admittedly buggy) program. It can literally delete your program and replace it with something else that it likes.
Good example of why uninitialized variables are not intuitive.
This is because the code is executed symbolically during optimization. It's not running on your real CPU; it's first "run" on a simulation of the abstract machine from the C spec, which doesn't have registers or even a real stack to hold an actual garbage value, but does have magic memory where bits can be set to 0, 1, or this-can-never-ever-happen.
Optimization passes ask questions like "is x unused? (so I can skip saving its register)" or "is x always equal to y? (so I can stop storing it separately)" or "is this condition using x always true? (so that I can remove the else branch)". When using the value is an undefined behavior, there's no requirement for these answers to be consistent or even correct, so the optimizer rolls with whatever seems cheapest/easiest.
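A minimal sketch of how those inconsistent answers can look from the outside (hypothetical program; since reading x is UB, any output is permitted):

#include <cstdio>

int main() {
    int x;  // never initialised: reading it is undefined behaviour
    if (x == 0)
        std::puts("x is zero");
    if (x != 0)
        std::puts("x is not zero");
    // The optimiser may answer "is x == 0?" and "is x != 0?" independently,
    // so a given build might print both lines, one of them, or neither.
    return 0;
}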
With optimizing settings on, the compiler should immediately treat reads of uninitialized variables as errors by default.
The compiler sees that foo can only be assigned in one place (in a function that isn't called locally, but could be called from other object files linked into the program) and that its address never escapes. Since dereferencing a null pointer is UB, it can legally assume that `*foo` is always 42 and optimize out the variable entirely.
Compilers can do whatever they want when they see UB, and accessing an unassigned and unassignable (file-local) variable is UB, therefore the compiler can just decide that *foo is in fact always 42, or never 42, or sometimes 42, and all would be just as valid options for the compiler.
(I know I'm just restating the parent comment, but I had to think it through several times before understanding it myself, even after reading that.)
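A minimal sketch of the shape being described above (hypothetical names):

static int value = 42;
static int *foo = nullptr;  // file-local, and its address never escapes

void set_foo() { foo = &value; }  // the only assignment to foo; never called
                                  // in this file, but another object file might

int get() {
    return *foo;  // a null dereference (UB) unless set_foo() ran first, so the
                  // compiler may assume foo == &value and fold this to 42
}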
That's not exactly correct. It's not that the compiler sees that there's UB and decides to do something arbitrary: it's that it sees there's exactly one way for UB not to be triggered, and it assumes that's what's happening.
The way they work things out is to assume no UB ever happens (because otherwise your program is invalid, and you wouldn't ask to compile an invalid program, would you?) and then work from there.
I’ve never understood this behaviour from clang. At least stick a trap at the end so the program aborts instead of just executing random instructions?
The x and y values are funny too, because clang doesn’t even bother loading anything into esi for operator<<(unsigned int), so you get whatever the previous call left behind in that register. This means there’s no x or y variable at all, even though they’re nominally being “printed out”.
It can just leave the result totally uninitialised. That's because both code paths have undefined behaviour: whichever of result.x or result.y is not set is still copied at "return result" which is undefined behaviour, so the overall function has undefined behaviour either way.
It could even just replace the function body with abort(), or omit the implementation entirely (even the ret instruction, allowing execution to just fall through to whatever memory happens to follow). Whether any computer does that in practice is another matter.
That is incorrect, per the resolution of DR222 (partially initialized structures) at WG14:
> This DR asks the question of whether or not struct assignment is well defined when the source of the assignment is a struct, some of whose members have not been given a value. There was consensus that this should be well defined because of common usage, including the standard-specified structure struct tm.
As long as the caller doesn't read an uninitialised member, it's completely fine.
The code says that if x is true then a=13, and if it is false then b=37.
This is the case. It's just that a=13 even if x is false - a thing the code had nothing to say about, and so the compiler is free to do it.
Practically speaking, I’d argue that a compiler assuming uninitialized stack or heap memory is always equal to some arbitrary convenient constant is obviously incorrect, actively harmful, and benefits no one.
I take issue with the compiler assuming anything about the contents of that memory; it should be a black box.
The memory being uninitialised means reading it is illegal for the writer of the program. The compiler can write to it if that suits it; the program can't see the difference without invoking UB.
In fact the compiler can also read from it, because it knows that it has in fact initialised that memory. And the compiler is not writing a C program and is thus not bound by the strictures of the C abstract machine anyway.
> The user didn’t initialize this integer. Let’s assume it’s always 4 since that helps us optimize this division over here into a shift…
This is convenient for whom, exactly? Why not just treat it as a black-box memory load and not do further “optimizations”?
Nobody’s stopping you from using non-optimising compilers, regardless of the strawmen you assert.
There’s a million more sensible things that the compiler could do here besides the hilariously bad codegen you see in the grandparent and sibling comments.
All I’ve heard amounts to “but it’s allowed by the spec.” I’m not arguing against that. I’m saying a spec that incentivizes this nonsense is poorly designed.
Same for b. If x is true, b could be 37 no matter how unlikely that is.
succeeded = true; error = true; //This makes no sense
succeeded = false; error = false; //This makes no sense
Otherwise, if I'm checking a response, I'm generally going to check just "succeeded" or just "error" and miss one of the two states above that "shouldn't happen". And if I check both, it's a lot of awkward extra code, and I'm left trying to output an error for a state that, again, makes no sense.
Then the obvious question is: why do we need _succeeded_ at all, if we can always check for _error_? Sometimes it can be useful when the server itself doesn't know whether the operation succeeded (e.g. an IO/database operation timed out), so it might have succeeded, but we should still show an error message to the user.

Another possibility is that succeeded is not a bool but, say, a "succeeded_at" timestamp. In general, I've noticed that almost any boolean value in a database can be replaced with a timestamp or an error code.
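One hedged sketch of that direction (hypothetical names), collapsing the two bools into a single field so the contradictory states can't be represented at all:

#include <optional>
#include <string>

struct Response {
    // empty optional means success; "succeeded && error" and
    // "neither succeeded nor error" simply cannot be constructed
    std::optional<std::string> error;
    bool succeeded() const { return !error.has_value(); }
};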
1 - In C++, a struct is no different than a class other than a default scope of public instead of private.

2 - The use of braces for property initialization in a constructor is malformed C++.
3 - C++ is not C, as the author eventually concedes:

> At this point, my C developer spider senses are tingling: is Response response; the culprit? It has to be, right? In C, that's clear undefined behavior to read fields from response: The C struct is not initialized.
In short, if the author employed C++ instead of trying to use C techniques, all they would have needed is a zero-cost constructor definition such as:

inline Response() : error(false), succeeded(false) {}

Compiler was changed to allocate storage for any referenced variables.
The original code defined a struct with two bools that were not initialized. Therefore, when you instantiate one, the initial values of the two bools could be anything. In particular, they could be both true.
This is a bit like defining a local int and getting surprised that its initial value is not always zero. (Even if the compiler did nothing funny with UB, its initial value could be anything.)
Then reading from that struct like in OP constitutes UB.
Perhaps what you mean is, "Nothing is to be gained by relying on the language spec to initialize things to zero, and a lot is lost"; I'd agree with that.
Read a complex enough project that's meant to be used across compiler vendors and versions, and you'll find plenty of instances where they're working around the compiler not implementing the standard.
Also, if you attended the standards committee, you would hear plenty of complaints from compiler vendors that certain things aren't implementable. Sometimes the committee listens and makes changes; other times they put their fingers in their ears and ignore reality.
There are also plenty of places where the standard lets the compiler make its own decisions (implementation-defined behavior). You need to know what your compiler vendor(s) chose to do.
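A concrete instance of the implementation-defined case (sketch):

#include <iostream>

int main() {
    // Implementation-defined: 64-bit Linux (LP64) prints 8 here, while
    // 64-bit Windows (LLP64) prints 4. The standard only sets minimum ranges.
    std::cout << sizeof(long) << '\n';
}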
tl;dr: With a standard as complex as C++'s, the compilers very much do not just "implement the standard". Sometimes you can get away with pretending that, but others very much not.
I think a sanitizer probably would have caught this, but IMHO this is the language's fault.
Hopefully future versions of C++ will mandate default initialization for all cases that are UB today and we can be free of this class of bug.
Even if the implementation specified that the data would be indeterminate depending on what existed in that memory location previously, the bug would still exist.
Even if you hand-coded this in assembly, the bug would still exist.
The essence of the bug is uninitialized data being garbage. That's always gonna be a latent bug, regardless of whether the behavior is defined in an ISO standard.
That said, we all learn this one! I spent like two weeks debugging a super rare desync bug in a multiplayer game with a P2P lockstep synchronous architecture.
Suffice to say I am now a zealot about providing default values all the time. Thankfully it’s a lot easier since C++11 came out and lets you define default values at the declaration site!
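For instance, assuming the two-bool layout discussed elsewhere in the thread, a C++11 default-member-initialiser version might look like:

struct Response {
    bool succeeded = false;  // C++11 default member initialisers: every
    bool error = false;      // "Response r;" now starts from a known state
};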
You don't want to zero out the memory? Slap a "foo = uninitialized" in there to have that exact behavior and get the "here be demons" sign for free.
Uninitialized state is totally fine as an opt-in performance optimization. But having a well defined non-garbage default value should obviously be the default.
Did C fuck that up 50 years ago? Yeah probably. They should have known better even then. But that’s ok. It’s a historical artifact. All languages are full of them. We learn and improve!
If uninitialization were opt-in, you would still be free to "assume uninitialized until proven otherwise". But uninitialized memory is such a monumental, catastrophic footgun that there really is no justifiable reason to make it the default behavior. Which, again, is why no modern language makes that (terrible) design choice.
That purpose would be better served by reclassifying uninitialized reads as erroneous behavior, which they are for C++26 onwards. What useful purpose is served by having them be UB specifically?
Plenty of things are UB just because major implementations do things wildly differently. For example:
realloc(p, 0)
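A sketch of how that one diverges (hypothetical demo function):

#include <cstdlib>

void demo(void *p) {
    void *q = std::realloc(p, 0);
    // Some C libraries free p and return a null pointer here; others return
    // a non-null pointer to a zero-size block that must itself be freed.
    // The caller can't portably tell whether p was released, which is part
    // of why C23 eventually reclassified size-zero realloc as undefined.
    std::free(q);  // no double free on either flavour: free(nullptr) is a no-op
}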
Having uninitialized reads be UB means that implementations where it's zero cost can initialize variables to zero, or implementations designed for safety-critical systems can initialize them to zero, or what have you, without the standard forcing all implementations to do so.

So these variables will be more or less what the current "defanged" Rust std::mem::uninitialized() function gets you: a bit slower than "truly" uninitialized variables, but not instant death in most cases if you made a mistake, because you're human.
Those C++ people who feel they actually need uninitialized variables can tell the compiler explicitly [for that particular variable] in C++26 that they opt out of this safeguard. They get the same behaviour you've seen described in this thread today, arbitrary Undefined Behaviour if you read the uninitialized variable. This would be similar to modern Rust's MaybeUninit::uninit().assume_init() - you are explicitly telling the compiler it's OK to set fire to everything; you should probably not do this, but we did warn you.
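If I'm reading P2795 ("erroneous behaviour for uninitialized reads") right, the C++26 spelling of that opt-out is an attribute on the variable definition; a minimal sketch:

void f() {
    int x;                    // C++26: reading this is "erroneous behaviour":
                              // it holds some fixed, implementation-chosen value
    int y [[indeterminate]];  // explicit opt-out: reading y is full UB again
}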
And by convention, all classes derived from CBase would start their name with C, so something like CHash or CRectangle.
Also, how does CBase know the size of its allocated memory?
https://github.com/SymbianSource/oss.FCL.sf.os.kernelhwsrv/b...
> 2. Initialisation of the CBase derived object to binary zeroes through a specific CBase::operator new() - this means that members, whose initial value should be zero, do not have to be initialised in the constructor. This allows safe destruction of a partially-constructed object.
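A minimal sketch of that technique (not the actual Symbian source; note that operator new is called with the size of the most-derived class, which also answers the size question above):

#include <cstddef>
#include <cstring>
#include <new>

class CBase {
public:
    // Receives the full size of the most-derived object, so the whole
    // allocation is zero-filled before any constructor runs.
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        std::memset(p, 0, size);
        return p;
    }
    virtual ~CBase() {}
};

class CRectangle : public CBase {
    int iWidth;   // starts as binary zero without appearing in a constructor
    int iHeight;  // likewise, so a partially-constructed object destructs safely
};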