There's also the issue that the following two things don't have the same semantics in C:

    float v = a * b + c;

vs.

    static inline float get_thing(float a, float b) {
        return a * b;
    }
    float v = get_thing(a, b) + c;

This is just a C-ism (floating-point contraction) that can make extracting things into always-inlined functions still be a big net performance negative. The C spec permits it, sadly!

uintptr_t's don't actually have the same semantics as pointers either. E.g. if you write:
void my_func(strong_type1* a, strong_type2* b);
a =/= b, and we can pull the underlying type out. However, if you write:

    void my_func(some_type_that_has_a_uintptr_t1 ap, some_type_that_has_a_uintptr_t2 bp) {
        float* a = get(ap);
        float* b = get(bp);
    }
a could equal b. Semantically, the uintptr_t version doesn't provide any aliasing guarantees. That may or may not be what you want depending on your higher-level language semantics, but it's worth keeping the distinction in mind, because the compiler won't be able to optimise as well.

    float v = (float) ((float) a) * ((float) b) + c;

Since v is float, the cast representing the return conversion can be omitted:

    float v = ((float) a) * ((float) b) + c;

Now, if a and b are already float, then it's equivalent. Otherwise not: if they are double or int, we get double or int multiplication in the original open code.

Not necessarily! Floating-point contraction is allowed essentially within statements but not across them. By assigning the result of a * b to a variable, you prohibit the multiplication from contracting with the addition into an FMA.
In practice, every compiler has fast-math flags that say "stuff it" and allow all of these optimizations to occur across statements and even across inlining boundaries.
(Then there's also the issue of FLT_EVAL_METHOD, another area where what the standard says and what compilers actually do are fairly diametrically opposed.)
Two additional points:

1: The article mentions DWARF. Even without it you can use #line directives to give line numbers in your generated code (and this goes a very long way when debugging); the other part is local variables and their contents.

For variables one can get a good distance by using a C++ subset instead (a subset that doesn't affect compile time, so avoid any std:: namespaced includes) and e.g. "root"/"gc"/"smart" pointers, etc. (depending on language semantics), since the variables will show up in a debugger once you have your #line directives in place (so "sane" name mangling of output variables is needed).
2: The real sore point of C as a backend is GC. The best GCs are intertwined with the regular stack frame, so normal stack-walking routines also give everything needed for accurate GC (required for any moving GC design, even if more naive generational collectors are possible without it).

Now, if you want accurate, somewhat fast, portable stack scanning, the most sane way currently is to maintain a shadow stack: you pass prev-frame pointers in calls, and the prev-frame pointer is a pointer to the end of a flat array that is prepended by a magic pointer and the previous prev-frame pointer (forming a linked list at the cost of a few writes and one extra argument, with no cleanup cost).
Sadly, the performant linked shadow stack will obfuscate all your pointers for debugging, since they need to be clumped into one array instead of multiple named variables (and it restricts you from keeping complex objects on the stack).

Hopefully one can use the new C++ reflection support for shadow stacks without breaking compile times, but that's another story.
(though it is fine if GC can only happen inside a function call and the call takes the shadow stack as an argument)
You could put each stack frame into a struct, and have the first field be a pointer to a const static stack-map data structure or function that enumerates the pointers within the frame.
BTW, the passed pointer to this struct could also be used to implement access to the calling function's variables, for when you have nested functions and closures.
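A sketch of that frame-struct variant, with a hypothetical stack_map type describing where the pointers sit in the frame (all names invented):

```c
#include <stddef.h>

/* The first field of every frame points at a static map; the
 * collector only needs the frame's address to find every pointer. */
struct stack_map {
    size_t nptrs;
    size_t offsets[2]; /* byte offsets of pointer fields in the frame */
};

struct my_frame {
    const struct stack_map *map; /* always the first field */
    void *obj_a;
    void *obj_b;
    int counter;                 /* non-pointer state, not in the map */
};

static const struct stack_map my_frame_map = {
    2, { offsetof(struct my_frame, obj_a),
         offsetof(struct my_frame, obj_b) }
};

/* Generic walker: works for any frame type that follows the layout.
 * Pass visit = NULL to just count the slots. */
static size_t visit_frame(void *frame, void (*visit)(void **slot)) {
    const struct stack_map *m = *(const struct stack_map **)frame;
    for (size_t i = 0; i < m->nptrs; i++)
        if (visit)
            visit((void **)((char *)frame + m->offsets[i]));
    return m->nptrs;
}
```

As suggested, the map pointer could instead be a function that enumerates the slots, and the same record doubles as an environment for nested functions and closures.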
I like compiler backends, but truth be told, I grow weary of compiler backends.
I have considered generating LLVM IR but it's too quirky and unstable. Given the Virgil wasm backend already has a shadow stack, it should now be possible for me to go back to square one and generate C code, but manage roots on the stack for a precise GC.
Hmm....
I think emitting something like

    #line 12 "source.wasm"

for each line of your source before the generated code for that line does something that GDB recognizes well enough.

You can find all the possible tricks for making it debuggable by reading the y.tab.c
Including all the corner cases for odd compilers.
Re2c is a bit more modern if you don't need all the history of yacc.
But yes, you can put a line-oriented breakpoint on your action code and step through it.
I found this Reddit thread that gives a bit more detail:
https://www.reddit.com/r/haskell/comments/1pbbon/c_as_a_proj...
and the project link:
I'd have to do an actual project to see how annoying it is to lower semantics to unsafe Rust to know for sure, but my guess is you'd be slightly better off because you don't have to work around implicit conversions in C, the more gratuitous UBs in C, and I think I'd prefer the slightly more complete intrinsic support in Rust over C.
Not easy, but there are compilers that do it.
Lobster [0] started out with automatic reference counting. It has inferred static typing, specialising functions based on type, reminiscent of how JavaScript JIT compilers do it. Then the type inference engine was expanded to also specialise functions based on the ownership/borrowing type of their arguments. RC is still done for variables that don't fit into the ownership system, but the number of executed RC ops is greatly reduced overall. The trade-off is increased code size.
I have read a few older papers about eliding reference-counting ops which seemed to result in similar elisions, except that those had not been expressed in terms of ownership/borrowing.

I think newer versions of the Swift compiler also infer lifetimes to some extent.
When emitting Rust you could now also use reference counting smart pointers, even with cycle detection [1]. Personally I'm interested in how ownership information could be used to optimise tracing GC.
[0]:https://aardappel.github.io/lobster/memory_management.html
[1]:https://www.semanticscholar.org/paper/Breadth-first-Cycle-Co...
Tangentially, I was experimenting with a runtime library to expose such "borrow-first" semantics. Such "lents" can be easily copied onto a new thread's stack to access shared memory, and are not involved in RC. Race-condition detection helps to share memory without any explicit move to a new thread. It seems to work well for simpler data structures like sequences/vectors/strings/dictionaries, but I have not figured out a proper way to handle recursive/dynamic data structures!
Snark aside, the output targets of compilers typically need to be unsafe languages, since the point of a high-level compiler in general is to verify difficult proofs, then emit constructs consistent with those proof results but simplified so that they cannot be verified anymore, yet can run fast since those proofs aren't needed at runtime. (Incidentally, this is both a strength and a weakness of C: since it provides very little ability for the compiler to do proofs, the output is generally close to the input, while other languages typically have much more useful compilers, because they do much more proof work at compile time to make runtime faster. C just makes the programmer specify exactly what must be done and leaves the proof of correctness up to the programmer.)
(My experience with "compile to C" is with cfront, the original C++ implementation that compiled to C. The generated code was just terrible to read.)
I was thinking about how to embed custom high level language into my backend application written in C++. Each individual script would compile to native shared lib loadable on demand so that the performance stays high. For this I was contemplating exactly this approach. Compile this high level custom language with very limited feature set to plain C and then have compiler that comes with Linux finish the job.
I’ve done a few such things for compiling ML models, but in the end I always regretted it.
Oftentimes virtual functions are implemented in C via function pointers to provide an interface (such as the filesystem code in the Linux kernel). Just like C++ vtable lookups, these cannot be inlined at compile time.
What I wonder is whether code generated in C can be JIT-optimized by WASM runtimes with similar automatic inlining.
I really wish someone on the C language/compiler/linker level took a real look at the problem and actually tried to solve it in a way that isn't a pain to deal with for people that integrate with the code.
    // a.c
    int f(int x) {
        return x + 1;
    }

    // b.c
    extern int f(int x);

    int main() {
        int y = f(41);
    }
but if f() had been defined as static, you couldn't do this.

Just because you can use the information you have to call a given function doesn't mean you aren't violating an interface.