There are some benchmark games that I relied on in the past as a quick check, and they made C# look underwhelming vs Rust/C++.
For example:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
We see that the fastest C# version is 6 times slower than the Rust/C++ implementations.
But that's super deceiving, because those versions use arena allocators. Doing the same in C# (I wrote it this morning, actually) yielded only a ~20% difference vs the fastest Rust implementation.
This was with dotnet 9.
I think the model of using GC by default and managing memory yourself where it matters is the sanest approach. Requiring everything to be manually managed seems like a waste of time. C# is perfect for managing memory only when you need to.
I like Rust syntactically. I think C# is too object-oriented. But with a very solid standard library, practical design, good tools, and speed when you need it, C# remains super underrated.
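A minimal sketch of that escape hatch (my illustration, not from the comment): the hot path rents its scratch buffer from a pool so the GC never sees a fresh allocation, while the rest of the program stays garbage-collected as usual:

    using System;
    using System.Buffers;
    using System.IO;

    class PooledRead
    {
        static long CountBytes(Stream stream)
        {
            // Rent a scratch buffer instead of allocating `new byte[64 * 1024]`
            // per call; the pool hands back the same arrays over and over.
            byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
            try
            {
                long total = 0;
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                    total += read;
                return total;
            }
            finally
            {
                // Returning the buffer is the "manual" part; if you forget,
                // you just fall back to GC behavior rather than leaking.
                ArrayPool<byte>.Shared.Return(buffer);
            }
        }
    }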
> The fact that Rico Mariani was able to do a literal translation of the original C++ version into C# and blow the socks off it is a testament to the power and performance of managed code. It took me several days of painful optimization to catch up, including one optimization that introduced a bug, and then Rico simply had to do a little tweaking with one hand tied behind his back to regain the lead. Sure, I eventually won but look at the cost of that victory
Which just reminded me that, yeah, all the links I'd made to Raymond Chen's "The poor man's way of identifying memory leaks" no longer work. The Rust implementation is less than four years old, but its link (which used to work) now doesn't. -sigh-
Tempting to go reconstruct that performance improvement "fight" in Rust too. Maybe another day.
It more or less tells you to unlearn all functional and OOP patterns for code that needs to be fast. Just use regular loops, structs and mutable variables.
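For instance (a hedged sketch of my own, not from the comment): the idiomatic LINQ version pays for enumerator allocations and delegate indirection, while the "unlearned" version is a plain loop over structs with one mutable accumulator:

    using System;
    using System.Linq;

    struct Point { public double X, Y; }

    class Hot
    {
        // Idiomatic-but-allocating: the iterator chain costs enumerator
        // allocations and indirect calls through delegates.
        static double SumLinq(Point[] pts) =>
            pts.Where(p => p.X > 0).Select(p => p.X * p.Y).Sum();

        // The "unlearned" version: a plain loop over structs with one
        // mutable accumulator. Boring, but the JIT loves it.
        static double SumLoop(Point[] pts)
        {
            double sum = 0;
            for (int i = 0; i < pts.Length; i++)
                if (pts[i].X > 0)
                    sum += pts[i].X * pts[i].Y;
            return sum;
        }
    }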
Try looking at the "transliterated line-by-line literal" programs:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
> But that's super deceiving because …
… that's labelled [ Contentious. Different approaches. ]
Try removing line 11 from
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Imagine if we had something with Rust syntax but C#'s library support and memory management trade-off, with an escape hatch.
It keeps the unsung benefit of garbage collection for "programming in the large" in which memory allocation is treated as a global concern independent of everything else instead of a global concern that has to be managed locally in every line of code.
Rust's strategy is problematic for code reuse just as C/C++'s strategy is problematic. Without garbage collection a library has to know how it fits into the memory allocation strategies of the application as a whole. In general a library doesn't know if the application still needs a buffer and the application doesn't know if the library needs it, but... the garbage collector does.
Sure you can "RC all the things" but then you might as well have a garbage collector.
In the Java world we are still waiting for
e.g. you can run curl, and go fetch https://example.com/ but using this ersatz openssl rather than your C openssl implementation. That's a thing which works today, albeit not a supported configuration from the point of view of Curl's author.
with the Python FFI so you could run things like numpy inside of it. Would be transformative for my side projects.
On the other hand, if you don't write code that the borrow checker would accept, you likely won't get these optimizations. And even if it were accepted, there's a chance the analysis needed is too deep or complex for the escape analysis to work. Ultimately this is a nice speedup in practice, but not something I would rely on.
Early this year my son was playing chess, which motivated me to write a chess program. I wrote one in Python really quickly that could challenge him. I was thinking of writing one I could bring to the chess club, which would have had to respect time controls, and with threads in Java this would have been a lot easier. I was able to get the inner loop to generate very little garbage in terms of move generation, search, and evaluation, with code that was only slightly stilted. To get decent play, though, I would have needed transposition tables that were horribly slow using normal Java data structures, but it could have been done off-heap with something that would have looked nice on the outside but done it the way C would have done it on the inside (sketched below).
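To make that concrete, a hypothetical flat-table sketch (in C#, to keep this thread's snippets in one language; a Java version would use a long[] or off-heap buffer the same way):

    using System;

    // Hypothetical flat transposition table: one array of structs indexed
    // by Zobrist hash. No per-entry objects, nothing for a GC to trace,
    // and a probe touches a single cache line.
    struct TtEntry
    {
        public ulong Key;      // Zobrist hash of the position
        public int Score;
        public short Depth;
        public short BestMove; // encoded move
    }

    sealed class TranspositionTable
    {
        private readonly TtEntry[] _entries;

        public TranspositionTable(int log2Size) =>
            _entries = new TtEntry[1 << log2Size];

        // Power-of-two size makes the index a cheap mask.
        public ref TtEntry Probe(ulong zobristKey) =>
            ref _entries[zobristKey & (ulong)(_entries.Length - 1)];
    }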
I gave up because my son gave up on chess and started building and playing guitars in all his spare time.
Chess is a nice case of specialized programming where speed matters and it is branchy and not numeric.
So far my collab with him has been over physics and electronics such as scaling for the prototype electric octobase (one string so far) that he made. His current stretch project is: he cut a MIDI keyboard in half and stacked the segments on top of each other like an organ keyboard and he's attaching that to two electric guitar necks. Probably the MIDI is going to go to a rack-mount MIDI synth so it's going to have one hell of a wire bundle coming out of it.
Personally I am really fascinated with
https://en.wikipedia.org/wiki/Guitar_synthesizer
which can take the sound from an electric guitar or bass and turn it into MIDI events (or the equivalent) that control a synthesizer. Most commercial versions have six independent pickups; they used to be connected to the brains with a ribbon cable, but some of them now digitize on the guitar and send the data to the controller over a serial link:
https://www.boss.info/us/categories/guitar_synthesizers/seri...
Also, the more I work with this stuff, the more I think trying to avoid memory management is foolish. You end up having to think about it, even at the highest of levels, like a React app. It takes some experience, but I’d rather just manage the memory myself and confront the issue from the start. It’s slower at first, but leads to better designs. And it’s simpler; you just have to do more work upfront.
Edit:
> Rust's strategy is problematic for code reuse just as C/C++'s strategy is problematic. Without garbage collection a library has to know how it fits into the memory allocation strategies of the application as a whole. In general a library doesn't know if the application still needs a buffer and the application doesn't know if the library needs it, but... the garbage collector does.
Should have noted that Zig solves this by making the convention be to pass an allocator in to any function that allocates. So the boundaries/responsibilities become very clear.
I’m advocating learning about, and understanding a couple different allocation strategies and simplifying everything by doing away with the GC and minimizing the abstractions you need.
My guess is this stuff used to be harder, but it’s now much easier with the languages and knowledge we have available. Even for application development.
See https://www.rfleury.com/p/untangling-lifetimes-the-arena-all...
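The arena pattern from that article, as a toy C# sketch (hypothetical type; Zig's std.mem.Allocator interface is the real version of the pass-it-in convention):

    using System;
    using System.Buffers;

    // Toy bump arena (illustrative only; a real one handles alignment,
    // growth, and typed allocation): allocate by advancing an offset,
    // free everything in one shot when the scope ends.
    sealed class Arena : IDisposable
    {
        private readonly byte[] _buffer;
        private int _offset;

        public Arena(int capacity) =>
            _buffer = ArrayPool<byte>.Shared.Rent(capacity);

        public Span<byte> Allocate(int size)
        {
            if (_offset + size > _buffer.Length)
                throw new InvalidOperationException("arena exhausted");
            Span<byte> span = _buffer.AsSpan(_offset, size);
            _offset += size;
            return span;
        }

        // One "free" for every allocation made from this arena.
        public void Dispose() => ArrayPool<byte>.Shared.Return(_buffer);
    }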
Generational tracing garbage collectors automatically work in a manner similar to arenas (sometimes worse; sometimes better) in the young gen, but they also automatically promote the non-arena-friendly objects to the old gen. Modern GCs, which are constantly evolving at a pretty fast pace, use algorithms that represent a lot of expertise gathered in the memory management space, and that's hard to beat unless arenas fully solve your needs.
Reasoning about performance is hard as it is, given nondeterministic optimisations by the CPU. Furthermore, a program that's optimal for one implementation of an AArch64 architecture can be far from optimal for a different implementation of the same architecture. Because of that, reasoning deeply about micro-optimisations can be counterproductive, as your analysis today could be outdated tomorrow (or on a different vendor's chip). Full low-level control is helpful when you have full knowledge of the exact environment, including hardware details, and may be harmful otherwise.
What is meant by "performance" is also subjective. Improving average performance and improving worst-case performance are not the same thing. Also, improving the performance of the most efficient program possible and improving the performance of the program you are likely to write given your budget aren't the same thing.
For example, it may be the case that using a low-level language would yield a faster program given virtually unlimited resources, yet a higher-level language with less deterministic optimisation would yield a faster program if you have a more limited budget. Put another way, it may be cheaper to get to 100% of the maximal possible performance in language A, but cheaper to get to 97% with language B. If you don't need more than 97%, language B is the "faster language" from your perspective, as the programs you can actually afford to write will be faster.
> Also, the more I work with this stuff the more I think trying to avoid memory management is foolish.
It's not about avoiding thinking about memory management but about finding good memory management algorithms for your target definition of "good". Tracing garbage collectors offer a set of very attractive algorithms that aren't always easy to match (when it comes to throughput, at least, and in some situations even latency) and offer a knob that lets you trade footprint for speed. More manual memory management, as well as refcounting collectors, often tends to miss the sweet spot, as they have a tendency to optimise for footprint over throughput. See this great talk about the RAM/CPU tradeoff from this year's ISMM (International Symposium on Memory Management): https://youtu.be/mLNFVNXbw7I - it focuses on tracing collectors, but the point applies to all memory management solutions.
> Should have noted that Zig solves this by making the convention be to pass an allocator in to any function that allocates. So the boundaries/responsibilities become very clear.
Yes, and arenas may give such usage patterns a similar CPU/RAM knob to tracing collectors, but this level of control isn't free. In the end you have to ask yourself if what you're gaining is worth the added effort.
> Yes, and arenas may give such usage patterns a similar CPU/RAM knob to tracing collectors, but this level of control isn't free. In the end you have to ask yourself if what you're gaining is worth the added effort.
For me using them has been very easy/convenient. My earlier attempts with Zig used alloc/defer free everywhere and it required a lot of thought to not make mistakes. But on my latest project I'm using arenas and it's much more straightforward.
In lots of common cases, arenas work great; in lots of common cases they don't.
There are also other advantages unrelated to memory management. In this talk by Andrew Kelley (https://youtu.be/f30PceqQWko) he shows how Zig, despite its truly spectacular partial evaluation, still runs into an abstraction/performance tradeoff (when he talks about what should go "above" or "below" the vtable). When you have a really good JIT, as Java does, this tradeoff is gone (instead, you trade off warmup time) as the "runtime knowns" are known at compile time (since compilation is done at runtime).
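To make the vtable point concrete, a hedged C# sketch (my example, not from the talk): a JIT with profile-guided tiering that only ever observes one implementation at a call site can devirtualize and inline it, while an ahead-of-time compiler has to commit to the indirect call (or rely on whole-program analysis) up front.

    using System;

    interface IShape { double Area(); }

    sealed class Circle : IShape
    {
        public double Radius;
        public double Area() => Math.PI * Radius * Radius;
    }

    class Scene
    {
        // The interface call below is "above the vtable". If the runtime
        // profile shows only Circle flowing through here, a JIT can guard,
        // devirtualize, and inline Area() into the loop.
        static double TotalArea(IShape[] shapes)
        {
            double total = 0;
            foreach (var s in shapes)
                total += s.Area();
            return total;
        }
    }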
> When you have a really good JIT, as Java does, this tradeoff is gone
Is there a way to visualize the machine code generated by the JVM when optimizing the same kind of code as the examples shown in the talk you mention? I tried putting the following into godbolt.org, but I'm not sure I'm doing it right:

    public class DontForgetToFlush {
        public static void example(java.io.BufferedWriter w) throws java.io.IOException {
            w.write("a");
            w.write("b");
            w.write("c");
            w.write("d");
            w.write("e");
            w.write("f");
            w.write("g");
            w.flush();
        }

        public static void main(String... args) throws java.io.IOException {
            var os = new java.io.OutputStreamWriter(System.out);
            var writer = new java.io.BufferedWriter(os, 100);
            example(writer);
        }
    }
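One way to do this outside godbolt: HotSpot can print its generated assembly itself, assuming the hsdis disassembler plugin is available on the JVM's library path:

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly DontForgetToFlush

Without hsdis, -XX:+PrintCompilation still shows which methods get compiled at which tier, and tools like JITWatch can visualize -XX:+LogCompilation output.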
> Also, the more I work with this stuff the more I think trying to avoid memory management is foolish ... It takes some experience, but I’d rather just manage the memory myself and confront the issue from the start.
Not sure why you're getting downvoted; this is a reasonable take on the matter.
Languages like C++ give you a tonne of options here, from passing in scratch buffers to libraries, passing in reusable containers, and move semantics, to type-erased primitives like std::memory_resource and std::shared_ptr.
This is something you should think about early on in your design.
Let's pull out an easy one: you mention move semantics. In C++ that's a performance leak, because it isn't a destructive move; each such move incurs the creation of a moved-from object whether you wanted one or not, and it may also incur a "moved-from" check in the destructor, another overhead you wouldn't pay with a destructive move.
This is only being compared to last year's v9, but if you compare against v7 from a couple of years ago, the changes are huge.
And this only reflects the changes to the underlying framework compilation, and doesn't factor in changes to, say, the Kestrel web server and static asset delivery that have taken a ton of load away.
Intel are also regularly checking in changes before they release new CPUs now so that the framework is ahead of their releases and takes advantage of new features.
The good thing is that the older techniques to minimize allocations still work, and tools like refs and ref structs make it easier to write zero-allocation code than it used to be.
But it's definitely harder to reason about whether optimizations will 'light up' for code that uses heap-allocated classes, closures, etc. than it was in the past, even if it's harder for a nice reason.
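For example (a hedged sketch of the ref/ref-struct style, not from the post): Span<char> is a ref struct, so the scratch space can live on the caller's stack, and nothing below touches the GC heap:

    using System;

    class ZeroAlloc
    {
        // Format "x,y" into a caller-supplied span; no heap allocations.
        static bool TryFormatPair(int x, int y, Span<char> destination, out int written)
        {
            written = 0;
            if (!x.TryFormat(destination, out int w)) return false;
            written = w;
            if (written >= destination.Length) return false;
            destination[written++] = ',';
            if (!y.TryFormat(destination[written..], out w)) return false;
            written += w;
            return true;
        }

        static void Main()
        {
            // stackalloc into a Span<char>: scratch space on the stack.
            Span<char> buffer = stackalloc char[32];
            if (TryFormatPair(3, 42, buffer, out int n))
                Console.WriteLine(buffer[..n].ToString()); // "3,42"
        }
    }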
BenchmarkDotNet is a fantastic piece of tech at least, I have found it easier to benchmark in C# than in most other ecosystems I work with.
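A minimal example of what that looks like (illustrative benchmark of my own, not from the comment):

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    [MemoryDiagnoser] // also reports bytes allocated per operation
    public class ConcatVsJoin
    {
        private readonly string[] _parts = { "alpha", "beta", "gamma" };

        [Benchmark(Baseline = true)]
        public string Concat() => string.Concat(_parts);

        [Benchmark]
        public string Join() => string.Join(",", _parts);
    }

    public class Program
    {
        public static void Main() => BenchmarkRunner.Run<ConcatVsJoin>();
    }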
It is far more guaranteed to work in all circumstances than these JIT optimizations, which could have edge cases where they won't function as expected. If stopwatch allocations were a major concern (as opposed to just feeling like a possible perf bottleneck), then a modern ValueStopwatch struct that consists of two longs (accumulatedDuration, and startTimestamp, which if non-zero means the watch is running), plus calls into the Stopwatch static methods, is still simple and unambiguous (sketched below).
But in cases where being low/no-allocation is less critical, but you are still concerned about the impact of the allocations, these sorts of optimizations certainly do help. Plus they even help when you don't really care about allocations, just raw perf, since the optimizations improve raw performance too.
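A sketch of the ValueStopwatch shape described above (hypothetical type, but Stopwatch.GetTimestamp() and Stopwatch.Frequency are real BCL members):

    using System;
    using System.Diagnostics;

    // Two longs, zero heap allocations, leaning on the Stopwatch statics.
    struct ValueStopwatch
    {
        private long _accumulated;     // ticks from completed start/stop runs
        private long _startTimestamp;  // non-zero means the watch is running

        public void Start() => _startTimestamp = Stopwatch.GetTimestamp();

        public void Stop()
        {
            if (_startTimestamp != 0)
            {
                _accumulated += Stopwatch.GetTimestamp() - _startTimestamp;
                _startTimestamp = 0;
            }
        }

        public TimeSpan Elapsed =>
            TimeSpan.FromSeconds((double)_accumulated / Stopwatch.Frequency);
    }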
But ASP.NET core is a great platform for any kind of backend application.
https://endjin.com/blog/2024/11/how-dotnet-9-boosted-ais-dot...
It makes me happy to see MS investing in C# like this. I love the notion of having competing VMed languages.
That said, while C# (and the dotnet runtime) are awesome, MS is doing it a disservice lately (poor tooling, the Cursor/VS Code controversy, etc.). C# could've been so much bigger...
These posts are among the very best, digging into details explaining why things work and why changes were made.
Every time they get released I'm happy because no one killed it...
I’ve lost count of the number of times I’ve seen customers immediately “double down” on the size of their servers as a quick fix… achieving nothing other than increasing their cloud provider’s revenue.
Performance comes from a long series of individually small fixes.
It’s just shocking how much faster vanilla Linux is compared to vanilla windows 11.
Edit: by vanilla Linux I mean out of the box installation of your typical distribution e.g. Ubuntu without any explicit optimisation or tuning for performance
Each distro, platform and desktop manager and related apps are relatively different, though all work pretty well on modern hardware. I'm currently running PopOS COSMIC alpha with the 6.16-4 kernel via mainline. It's been pretty good, though there have been many rough edges regarding keyboard navigation/support in the new apps.
You still need to make sure that everything works, but that's what tests are for, and this has to be checked regardless of your tech stack.
Of course, they had a massive backwards compat break when moving from the regular aspnet to aspnet core, here's hoping nothing like that happens in the next 10-15 years..
With enough experience you can accomplish pretty much everything using just the minimal API and router. HttpContext is the heart of AspNetCore. If you can get your hands on instances of it within the appropriate application context, you can do anything you need to. Everything else is dependent upon this. The chances that HttpContext can be screwed with are very, very low. There are billions of dollars riding on the fact that this type & API remains stable for the next decade+.
The DI pattern is simple & clean at this scale. In my top-level program I define my routes like:
app.Map("/", Home.HandleRequest);
app.Map("/login", Login.HandleRequest);
app.Map("/account/new", NewAccount.HandleRequest);
app.Map("/{owner}", OwnerHome.HandleRequest);
And then I have HandleRequest implementations like: static async Task HandleRequest(HttpContext context, SQLiteConnection sql)
static async Task HandleRequest(HttpContext context, SQLiteConnection sql, string owner)
etc...
The actual HandleRequest() method can do anything, including concerns like directly accepting and handling web socket connections.

Java on the other hand had an implementation of generics that made Container<X> just a Container, so you could mix your old containers with generic containers.
Now, Java's approach used type erasure and had some limitations, but the C# incompatibility made me suffer every day; that's the cultural difference between Java and a lot of other languages.
It's funny because when I am coding Java and thinking just about Java I really enjoy the type system and rarely feel limited by type erasure, and when I do, I can unerase types easily by:

- statically subclassing GenericType<X> to GenericType<ConcreteClass>

- dynamically adding a type argument to the constructor

- mangling names (say you're writing out stubs to generate code to call a library; you can't use polymorphism to differentiate between

      Expression<Result> someMethod(Expression<Integer> x)

  and

      Expression<Result> someMethod(Expression<Double> x)

  since after erasure the signatures are the same, so you just gotta grit your teeth and mangle the method names)

But whenever I spend some time coding hard in a language that doesn't erase generic parameters, I come back and I am not in my comfortable Java groove, and it hurts.
If you're up for it you should give it another try. Your example of subclassing GenericType<X> and GenericType<ConcreteClass> may be supported with covariance and contravariance in generics [1]. It's probably not very well known among C# developers (vs. basic generics) but it can make some use cases a lot easier.
[1] https://learn.microsoft.com/en-us/dotnet/standard/generics/c...
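For example, this compiles because IEnumerable<T> declares T as covariant (out) and Action<T> declares it as contravariant (in):

    using System;
    using System.Collections.Generic;

    class VarianceDemo
    {
        static void Main()
        {
            // Covariance: a sequence of strings can be used where a
            // sequence of objects is expected.
            IEnumerable<string> strings = new List<string> { "a", "b" };
            IEnumerable<object> objects = strings;

            // Contravariance: a handler of the base type can stand in
            // for a handler of the derived type.
            Action<object> printAny = o => Console.WriteLine(o);
            Action<string> printString = printAny;
            printString("hello");
        }
    }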
People talk about tradeoffs with GC; the worst one is that I've seen an occasional game that has a terrible GC pause, for instance Dome Keeper, based on Godot, which also runs on .NET. I used to play a lot of PhyreEngine (also .NET) games on the PlayStation Vita and never noticed GC pauses, but I think those games did a GC on every frame instead of letting the garbage pile up.
Not all GCs are created equally either. Unity, for example, is based on an ancient version of Mono and so it uses the Boehm GC which is significantly slower than the one used by .NET. Godot probably has two GCs because it primarily runs GDScript (their custom language) and only supports using .NET in a separate engine build. They'll all have their own performance characteristics that the developer will need to adjust for.
Correct me if I'm wrong, but allowing this would mean the called API might insert objects wholly unrelated to X. This would break every assumption you make about the container's contents. Why would this ever be allowed or wanted?
Microsoft created the ".NET Standard" for this. Literally anything that targets .NET Standard 1.0 should work from circa 2001 through modern day 2025. You still get the (perf) benefits of the runtime upgrade, which is what the blog post is about.
What are you looking for out of .NET? The staple packages don't go away as often as in languages like NodeJS
Scott Hanselman has a very short blog on how 20 year old code is upgraded to the latest .NET in just a few short minutes: https://www.hanselman.com/blog/upgrading-a-20-year-old-unive...
This seems unlikely. Do you have a source?