85 points | by todsacerdoti | 2 days ago | 18 comments
  • jclulow | 2 days ago
    > But such behaviour is still unacceptable from a library perspective: a library should never, ever call abort or otherwise terminate the program.

    It's true that libraries should not abort for a regular, foreseeable error. I fundamentally disagree that they should never abort.

    If an invariant (something we believe absolutely must be true) is violated, the only sensible thing to do is abort the process. For example, if you set a pointer to NULL and then half an hour later it's time to store a value in that pointer but somehow it's not NULL anymore: clearly something has gone terribly wrong, either in your logic, or potentially in totally unrelated code in some other library which has scribbled on the heap. If execution is allowed to continue, the resultant behaviour is totally undefined: you might corrupt data, or you might allow a permissions check to pass that should have failed, etc. We can't always detect when something has gone wrong, but when we do, it's irresponsible to drive on or try to return an error.

    • theamk | a day ago
      This logic only works for memory-unsafe languages like C or C++, where the checks are rare and there is a good chance that by the time an abnormal condition is detected, things have been going wrong for a while.

      But that's not true for safe languages - in Rust, if you set a pointer to not-null, it won't end up as null (unless there is buggy unsafe, but let's ignore that). Instead, the panics are likely to be caused by logic errors. Take the decompressor buffer overflow errors the author mentioned - the out-of-bounds writes were caused by a bug in bit operations generating a wrong array index. In Rust, this would be caught by the bounds checker, which is good; but Rust would then abort the process, which is bad. A hypothetical language that would throw an exception instead of panicking would be much better from a library perspective - for example, a web server might return 500 on that particular request, but it would stay running otherwise.

      • int_19h | 16 hours ago
        Panics can be caught and handled, though: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html. So it is exactly the right choice in the circumstances. Apps like webservers and other things that need to not terminate the process are expected to be compiled with panic=unwind. Everybody else should compile with panic=abort and sleep safe knowing that if their invariants get corrupted, execution will not continue.

        Using Result<T, E> - Rust's rough equivalent to checked exceptions in something like Java - would be the wrong choice here, since it effectively forces the client to check and handle invariant violations inside library code - and the client really has no way to handle them other than doing the equivalent of 500 Internal Error, so there's no point doing such checks on every call.

      • chrismorgan | a day ago
        Just one problem with your argument: we’re not talking about languages that have orderly null pointer exceptions. We’re expressly talking about languages like C, C++ and Rust.

        Your web server example is uncompelling, because a panic-based abort is not the only thing that can distress your system. The simplest example is if the library code doesn’t terminate, accidentally triggering an infinite loop. Or (better from some perspectives, worse from others) an infinite loop that allocates until you run out of memory, denying service until maybe an out-of-memory killer kills the process. In such scenarios, your system can easily end up in a state you didn’t write code expecting, where everything is just broken in mysterious ways.

        No, if you want your web server to be able to return an orderly 500 in as many situations of unforeseen errors as possible, the plain truth is that you need the code that will produce that to run on a different computer (approximate definition, though with some designs it may not quite need to be separate hardware), and that you’ll need various sorts of supervisors to catch deviant states and (attempt to) restore order.

        In short: for such an example, you already want tools that can abort a runaway process, so it’s actually not so abnormal for a library to be able to abort itself.

        There’s genuinely a lot to be said for deliberately aborting the entire process early, and handling problems from outside the process. It’s not without its drawbacks, but it is compelling for things like web servers.

        I would also note that, if you choose to, you can normally catch panics <https://doc.rust-lang.org/std/panic/fn.catch_unwind.html>, and Rust web servers tend to do this.

        • whytevuhuni | 21 hours ago
          > Your web server example is uncompelling, because a panic-based abort is not the only thing that can distress your system.

          You seem to be saying that it wouldn't catch 100% of the problems, so catching only 80% is not that useful.

          I see that as uncompelling. 80% helps a lot!

          > you need the code that will produce that to run on a different computer

          Problem is, we're using C, C++ and Rust, because latency and performance matters. Otherwise we'd be using Go or Java.

          So in order to do what you're proposing, we'd have to do an outbound call on every link of a large filter/processing chain, serializing, transferring, and parsing the whole request data at each recoverable step.

          • chrismorgan | 10 hours ago
            Panics are already corner case territory.

            What I’m describing about producing 500s from a different machine is standard practice at scale, part of load balancers. And at small scale, it’s still pretty standard practice to do that from at least a different process, part of reverse proxying.

    • vacuity | a day ago
      I disagree. Just as C++ and Rust conceive of "plain old data" (POD) as data that basically has no surprising behaviors on cleanup (probably butchered that explanation), I think of libraries as "plain old code". Unlike services, libraries are part of the address space and should passively serve the main executable portion. That's why they are libraries specifically. A library should, if it detects an error, tell the main executable and then it is up to the main executable to do the responsible thing and address the error seriously. Needless to say, I don't like glibc-style "libraries". Doing too much in the background, in the same address space, no less. A library should be a nice, isolated unit that a programmer chooses to add into their executable at will. My concern is that composability and modularity aren't respected enough.
      • GuB-42 | a day ago
        I consider a panic to be equivalent to a segfault. It is a bug, either in the library itself or in improper usage, and the system just stopped you from doing further damage. The only reasonable thing to do in this case is to fix the bug.

        Imagine you are catching the exception. What are you going to do next? From now on, anything can break, and how do you hope to recover if you couldn't do it right in the first place (that's what caused the panic)? You can have a panic handler, in the same way that you can trap SIGSEGV, for some debugging, but that's about it. If a crash is really problematic, use some process wrapper, and maybe a watchdog. Libraries don't work like that for performance and flexibility reasons - they share address space - but the downside is that if they crash, they take everything else down with them.

        • vacuity | 21 hours ago
          Indeed. It's in the name: libraries are there to be browsed, while services wait for customers to call on them (and do management and other things in the meantime). This should be understood among the developers of applications, libraries, and services: libraries are functional add-ons while services are separate agents. A library has its expertise and should isolate and bubble up application errors (to be fixed by the application) and environmental errors (which can't really be fixed by anyone, but the application should decide how to proceed). If a library has internal errors, how is the library itself or the application supposed to fix them? It's like position-independent code: if its basic assumptions are met, it should be plug-and-play anywhere without complaint or knowledge.
      • int_19h | 16 hours ago
        That is exactly what panics do in Rust. The difference between Result<T,E> and a panic is basically two-fold:

        1. The owner of the process (whoever is compiling the binary) - not the library! - gets to decide whether panics immediately abort or try to unwind in a way that can be handled.

        2. A panic is never expected (i.e. it always indicates a bug in the code rather than invalid input or other expected failure conditions), so it's optimized for the success scenario both in terms of syntax and in terms of runtime cost. In practice, it means that syntactically the panic always auto-propagates (whereas you need `try` etc with Result); and the code generated by compiler is zero-cost wrt panics if one never happens, but the unwinding is very slow if a panic does happen.

        • vacuity | 16 hours ago
          While panics can be made to unwind and be caught, at that point the different error handling methods are really abort/Result bubbling/Result bubbling but cooler, and I want the last one to be packaged up nicer than "unwinding panic" is today. Stack unwinding should be orthogonal. And I think Result bubbling the boilerplate way is good unless performance is paramount. Keeps the happy and error paths abundantly clear to the developer. I don't know the details of C++'s zero-cost exceptions, but Rust's certainly are not.
          • int_19h | 15 hours ago
            Can you clarify what you mean by "packaged up nicer"? Given that this is a feature that should be used very sparingly, and also one that is very apt to be misused (as experience with exceptions in other languages demonstrates), I would argue that lack of syntactic sugar for it is a good thing. But I'm willing to consider arguments to the contrary, yet the existing API seems broadly fine to me? How would you change it?

            As far as cost, it depends on the arch and its ABI, but on x64 they use something called "unwind tables", which is basically a structure that lists all cleanup code that needs to be run for unwinding given a range of addresses inside a function. Such tables can be produced entirely at compile time, and they only need to be checked during unwinding (i.e. if there's a panic), so on the success path you pay no perf penalty. They are not entirely free in that they do make your binary larger, but speed shouldn't be affected.

            • vacuity | 15 hours ago
              > Given that this is a feature that should be used very sparingly, and also one that is very apt to be misused

              While it should be used carefully, perhaps it need not be used sparingly. My main concern is that unwinding panic occupies a weird role of being both a way to crash and a catchable exception, and I think the former should be distinct and the latter should be integrated more nicely with normal Result bubbling, essentially doing the same thing but focusing on performance and readability on the happy path.

              > As far as cost, it depends on the arch and its ABI

              In Rust, panic branches are ubiquitous and the compiler's optimizations are hindered by the fact that mostly anything might panic. If there was an easy way to indicate that in this specific instantiation, integer overflow definitely won't happen, a panic could be avoided. In order to avoid unsafe, I imagine it would be something like contracts or mini theorem provers, though, which is only helpful if they're already being used.

    • imtringued | a day ago
      The cultural differences between C developers and non-C developers never cease to amaze me.

      The definitions of unacceptable behaviour are so different. A program exiting on an exploitable vulnerability is considered unacceptable. The program must continue running, even though all hope should have been lost by this point!

      Meanwhile, on the other side of the ocean, it would be unacceptable for a program to enable the complete takeover of a system!

      Getting the panics out of a Rust codebase should be much simpler than fuzzing out UB. After a few iterations, there won't be any panics whatsoever.

      Honestly, what I'm seeing here is essentially that C is being preferred because it lets you sweep the problem under the rug: UB is such a diffuse concept with unusual consequences, whereas a panic is concrete and requires immediate attention - attention that will make the program better in the long run.

      • kazinator | 7 hours ago
        Many (most?) C developers are also non-C developers.
    • > the only sensible thing to do is abort the process

      Abort whatever depends on the invariant, which may be less than the whole process.

      • int_19h | 16 hours ago
        Yeah, but you need to ensure that you have an actual isolation boundary in place between systems with different invariants to ensure that there's no shared state (since any shared state implies shared invariants). Which is exactly what processes are. Not necessarily OS processes, though - e.g. consider Erlang, where the notion of an ultra-lightweight process provided by the runtime instead is exactly what makes this kind of error handling a natural fit.

        Now, for some languages it is possible to determine that statically, strictly through code analysis, without any runtime boundary enforcement. And I think that safe Rust might be in that category, but unsafe Rust definitely isn't - and whether any given library contains unsafe code is an implementation detail...

    • lelanthran | a day ago
      > If an invariant (something we believe absolutely must be true) is violated, the only sensible thing to do is abort the process. For example, if you set a pointer to NULL and then half an hour later it's time to store a value in that pointer but somehow it's not NULL anymore: clearly something has gone terribly wrong, either in your logic, or potentially in totally unrelated code in some other library which has scribbled on the heap.

      Then you return an error and let the caller deal with it.

      There is no justification, ever, for a library aborting the caller. Even in the scenario you present, it's still better to let the caller deal with the fallout.

      > If execution is allowed to continue, the resultant behaviour is totally undefined: you might corrupt data, or you might allow a permissions check to pass that should have failed, etc.

      So? That corrupted data or failed permissions check already happened before the library gets to abort anyway.

      Let the caller do whatever it can before the process aborts; the caller has more context than the library does, so don't abort on its behalf. If you abort without returning an error to the caller, the caller cannot do things like log "Hey, that previous permission check we allowed might have been allowed by accident".

      Your way provides absolutely no upside at all.

      • burntsushi | a day ago
        > Then you return an error and let the caller deal with it.

        No, this is absurd. C libraries generally don't do this either. Instead, they just have UB, whereas Rust tends to panic. So in practice, your suggestion ends up preferring vulnerabilities as a result of UB over a DoS.

        See: https://news.ycombinator.com/item?id=43300120

        Can you show me C libraries you've written that are used by others that follow your strategy of turning broken runtime invariants into error values?

        • lelanthran | a day ago
          > No, this is absurd.

          Just to be clear, you are making the argument that when a library call detects an error of the form of unexpected NULL/not-NULL, that they abort immediately?

          To be even more clear, I'm not making the argument that the program should proceed as normal after detecting an error (regardless of the type of error).

          That is not the argument that I am making which is why I find your "That's absurd" condescension extremely confusing.

          • int_19h | 16 hours ago
            This is exactly what assert() does, so literally any C library that contains an assert() can produce a nonrecoverable error.

            GP's point is that most libraries will not even assert(), but instead just assume the invariant holds and proceed accordingly, resulting in UB. And it is, of course, infeasible for a library to constantly test that invariants hold at every single point in the program. So in practice you have to assume that a library breaking its internal invariants is going to hit UB. If a library dev added some asserts, they are doing you a favor by making sure that, at least for some particular subset of broken invariants, the UB is guaranteed to be a clean abort rather than running some code doing god knows what.

          • burntsushi | a day ago
            My position is articulated here, with real examples: https://burntsushi.net/unwrap/

            > Just to be clear, you are making the argument that when a library call detects an error of the form of unexpected NULL/not-NULL, that they abort immediately?

            There's no blanket answer here because your scenario isn't specific enough. Is the pointer caller provided? Is the pointer entirely an internal detail whose invariant is managed internally? Is this pointer access important for perf? Is the pointer invariant encapsulated by something else?

            Instead, I suggest showing examples of C libraries following your philosophy. Then we'll have something concrete.

            In the comment I linked, you'll notice that I actually looked at and reviewed real code examples. Maybe you could engage with those.

            • vacuity | 17 hours ago
              I happen to agree with @lelanthran's position, but aside from that I think their point is not that there are C libraries following their principle, but rather that libraries should follow it. This is akin to Rust's "get"/"try" style of fallible methods, avoiding both UB and exceptions/panics. It also seems moot to ask this of typical C libraries, as they wouldn't use exceptions either.

              (I've edited this multiple times by now, apologies if it's confusing. Only adding things to it, but it may read weirdly as I've reconsidered what I'm trying to say.)

              For something like array indexing in Rust, it's not bad to have a panicking operator by default because it's very upfront and largely desired. Similarly, a library may document when it panics just as it would document its error type if it returned error values. But something that I would consider very bad design is if I use a library that spawns another thread and does some file processing and can panic, without making this clear to me.

              I think one of your main points is: suppose a library theoretically could index an array OOB and panic; it is not formally verified not to, and so the developer is just covering all bases conveniently. The normal alternative being UB is of course unacceptable. There is a crucial distinction to be made here. If the index is derived from the application, return an error value making this clear to the application. However, at some point the index may be considered only internally relevant. I agree this is fine. The thought is that this will never trigger and the application will be none the wiser. If it is ever triggered, the library should be patched quickly. But I think this is not all the panics that people in this thread have in mind, as otherwise panics would basically never be seen, just as a well-designed but otherwise normal C program would carry a risk of UB but should exhibit it basically never. There should be an effort to minimize panics to the ones that are just sanity checks, there only for completeness, rather than a convenient way to handle failure.

              With panics, either I just let them happen when they will or I have to defensively corral the library into working for me. With error values, the library has set out to state its terms and conditions and I can be happy that the burden is on me to use it properly. I have more control over the application's behavior from the start, and the extra work to surface errors to users properly is more or less equal between both approaches. Yes, panics can also be laid out in the API contract. But it's more enforceable with error values.

              If there was a good way to do error-values-as-exceptions (automating Result bubbling with ?) that just panics up until a good boundary and returns a Result, that's basically catch_unwind but cleaner. It's true that oftentimes aborting (perhaps after cleanup) is the best way to handle errors, but it shouldn't be a struggle to avoid that when I know better. Particularly with C's malloc(): maybe I do want to change my program behavior upon failure instead of stopping right then and there.

              • vacuity | 13 hours ago
                I seem to be rambling. I will add, particularly to clarify my third paragraph (the big one): a library should not panic. It can have defensive panics, but overall it should not panic, and triggering a defensive panic is to be treated as a bug. The exception is the panics that the application developer can reasonably be considered to have agreed to, and presumably the library should take care to make those panics as easy to handle as possible.
              • burntsushi | 2 hours ago
                I think the knot you are trying to untangle here is what I untied here: https://burntsushi.net/unwrap/

                The issue being addressed in this thread is that OP says this:

                > While C++’s std::vector<T>::at() throws an exception which can then be caught and cleanly relayed to the application, a panic!() or an abort() are much more annoying to catch and handle. Moreover, panic!()’s are hiding even in the most innocuous places like unwrap() and expect() calls, which in my perception should only be allowed in unsafe code, as they introduce a surface for a denial-of-service attack.

                This is not a nuanced position separating internal runtime invariants with preconditions and what not, like what you're doing and like what my blog does. This is a blanket statement about the use of `unwrap()` itself, and presumably, all panicking branches.

                This in turn led to this comment in this thread, to which I responded to as being terrible advice:

                > That behavior is up to the user. The library should only report the error.

                This is an extreme position and it seems to be advocated by several people in this thread. Yet nobody can point to real examples of this philosophy. Someone did point out sqlite, libavcodec, lmdb and zlib as having error codes that suggest this philosophy is employed, but actually looking at the code in question makes it clear that for most internal runtime invariants, the C library just gets UB. In contrast, Rust will usually prefer panics for the same sorts of broken invariants (like out-of-bounds access).

                The bottom line here is that I perceive people are suggesting an inane philosophy to dealing with internal runtime invariants. And that instead of going back-and-forth in abstraction land trying to figure out what the fuck other people are talking about, I think it's far more efficient for people to provide concrete examples of real code used by real people that are following that philosophy.

                If the philosophy being suggested has no real world examples and isn't even followed by the person suggesting it, then the certainty with which people seem to put this philosophy forward is completely unwarranted.

                Asking for real world examples is a shortcut to cutting through all this confusing language for trying to describe the circumstances that can lead to aborts, panics or UB. (Hence why my blog I linked earlier in this comment is so long.) It's my way of trying to get to the heart of the matter and show that these pithy comments are probably not suggesting what you think they're suggesting.

                I've written hundreds of thousands of lines of Rust over the years. Many of my libraries are used in production in a variety of places. All of those libraries use `unwrap()` and other panicking branches liberally. And this isn't just me. Other ecosystem libraries do the same thing, as does the standard library.

      • kazinator | 7 hours ago
        A Unix kernel will kill your process just because it wrote into a broken pipe.
    • anon-3988 | a day ago
      That behavior is up to the user. The library should only report the error.
      • burntsushi | a day ago
        This is terrible advice, and people suggesting it should put their money where their mouth is and show real world examples.

        Which libraries in widespread use know how to detect all of their possible bugs due to invariant violations and report them as explicit error values?

        I'd love to see the API docs for them. "This error value is impossible and this library will never return it. If it does, then there is a bug in the library. Since there are no known bugs related to this invariant violation, this cannot happen."

        Nevermind the fact that you're now potentially introducing an error channel into operations that should never actually error. That potentially has performance implications.

        Nevermind the fact that now your implementation details (internal runtime invariants are implementation details) have now leaked out into your API. Need a new internal runtime invariant? Now you need to add that fact to the API. Need to remove an invariant? Ah well, you still need to leave that possible error value in your API to avoid breaking your users.

        • haberman | a day ago
          Ok, but let's consider the other extreme: a world where no library can ever assume a runtime invariant holds without dynamically checking it first.

          In this world, Vec::index() would need to perform not only a bounds check but also a check that the pointer is not NonNull::dangling(). Sure, RawVec is supposed to guarantee that the pointer will not be dangling when cap is >0, but RawVec could have a bug in it.

          I agree that documenting and returning a PtrWasDanglingError error is not good API design. An InternalError for all such cases seems more reasonable. But at some point we need to be able to assume that certain program invariants hold without checking at all (in a release build).

          • burntsushi | a day ago
            We don't have to live in the extreme though. That's one of the great advantages of Rust. :-)

            In `regex`, for example, there are certainly some cases where I use `unsafe` to elide those dynamic checks because 1) I couldn't do it in safe code and 2) I got a performance bump from it. But of all the dynamic checks in `regex`, this was an extremely small subset of them.

            And it makes sense to rely on abstractions like `RawVec` to uphold those guarantees.

            The point is that you're making that trade-off intentionally and for a specific reason (perf). The idea that I would support dogmatically always checking every runtime invariant everywhere is bonkers. :P In contrast, we have someone here who I responded to that is literally suggesting propagating every possible broken runtime invariant into a public API error value.

            • haberman | a day ago
              I am biased towards thinking that low-level libraries should generally avoid panic. That would mean that all invariants are either assumed true or returned as errors to the user.

              I think this is not an unreasonable design: it's how low-level C libraries are traditionally designed. For example SQLite does what I mentioned and has a single SQLITE_INTERNAL error that is documented as:

              > The SQLITE_INTERNAL result code indicates an internal malfunction. In a working version of SQLite, an application should never see this result code. If application does encounter this result code, it shows that there is a bug in the database engine. --https://www.sqlite.org/rescode.html#internal

              I didn't mean to imply that you are for dogmatic checking of every runtime invariant, but the message that began that thread seems to advocate for that, going so far as to try to detect other buggy code that might have stomped on your memory.

              • burntsushi | a day ago
                SQLite is really a terrible example of anything other than what you can accomplish when you pour enormous resources into a single C library. Its `SQLITE_INTERNAL` error code is atypical in my experience. My recollection is that its tests are an order of magnitude bigger than SQLite itself. It is nowhere near a typical example.

                I don't think `SQLITE_INTERNAL` is how C libraries are typically designed, and even when they are, that doesn't mean they aren't risking UB in places. PCRE2 has its own `PCRE2_ERROR_INTERNAL` error value too, but it's had its fair share of UB related bugs because C is unsafe-everywhere-by-default.

                More to the point, the fact that hitting UB instead of aborting or unwinding is normal C library design is kinda the point: that's almost certainly a good chunk of why you end up with CVEs worse than DoS. How many vulnerabilities would have been significantly limited if C made you opt into explicit bounds-check elision?

                > but the message that began that thread seems to advocate for that

                I agree it is poorly worded. I should have caught that in my initial comment in this thread.

                The problem here really is the extremes IMO. The extremes are "libraries should never use `unwrap()`" and "libraries should check every runtime invariant at all points and panic when they break." You've gotta use your good judgment to pick and choose when they're appropriate.

                But I have oodles of `unwrap()` in my Rust libraries. Including in the regex crate's parser. And for sure, some people have hit bugs that manifest as panics. And those could in turn feasibly be DoS problems. But they definitely weren't RCEs, and that's because I used `unwrap()`.

                • haberman | a day ago
                  > Its `SQLITE_INTERNAL` error code is atypical in my experience.

                  In my experience it's reasonably common. Here are some other examples in what I would consider quintessential, high-quality C libraries:

                  - zlib has Z_STREAM_ERROR, which is documented in several places as being returned "if the stream structure was inconsistent"

                  - libavcodec has AVERROR_BUG, documented as "Internal bug, also see AVERROR_BUG2".

                  - LMDB has MDB_PANIC, documented as "Update of meta page failed or environment had fatal error".

                  > And for sure, some people have hit bugs that manifest as panics. And those could in turn feasibly be DoS problems. But they definitely weren't RCEs, and that's because I used `unwrap()`.

                  I feel this is conflating two things: (1) whether or not an invariant should get a dynamic check, and (2) when a dynamic check is present, how the failure should be reported.

                  Rust brings safety by forcing (safe) code to use dynamic checks when a safety property cannot be statically guaranteed, which addresses (1). But there's still a degree of freedom for whether failures are reported as panics or as recoverable errors to the caller.

                  I wrote down some of my thinking in this recent blog entry, which actually quotes your excellent summary of when panics are appropriate: https://blog.reverberate.org/2025/02/03/no-panic-rust.html

                  (ps: I'm a daily rg user and fan of your work!)

                  • burntsushia day ago
                    I'd have to look more closely at those examples, but I find it hard to believe that every runtime invariant violation manifests as one of those error codes. It certainly isn't true for PCRE2.

                    > Rust brings safety by forcing (safe) code to use dynamic checks when a safety property cannot be statically guaranteed, which addresses (1). But there's still a degree of freedom for whether failures are reported as panics or as recoverable errors to the caller.

                    Sure, you can propagate an error. I just don't really see a compelling reason to do so. Like, maybe there are niche scenarios where maybe it's worthwhile, but I do not see how it would be compelling to suggest it as general practice.

                    You might point to C libraries doing the same, but I'd have to investigate what exactly those error codes are actually being used for and _why_ the C library maintainers added them. And the trade-offs in C land are totally different than in Rust. Those error codes might not exist if they had a panicking mechanism available to them.

                    > I wrote down some of my thinking in this recent blog entry, which actually quotes your excellent summary of when panics are appropriate: https://blog.reverberate.org/2025/02/03/no-panic-rust.html

                    Yes, I've read that. It's a nice blog, but I don't think it's broadly applicable. Like, I don't see why I would write no-panic-Rust outside of extremely niche scenarios. My blog on unwraps is meant to be more broadly applicable: https://burntsushi.net/unwrap/ (It even covers this case of trying to turn runtime invariant violations into error codes.)

                  • burntsushia day ago
                    Now that I've slept, I decided to take a look at LMDB. It uses MDB_PANIC in exactly two places:

                    https://github.com/LMDB/lmdb/blob/f20e41de09d97e4461946b7e26...

                    https://github.com/LMDB/lmdb/blob/f20e41de09d97e4461946b7e26...

                    I would say this overall does not even come close to qualifying as an example of a library that "returns errors for invariant violations instead of committing UB."

                    You don't have to look far to see something that would normally be a panicking branch in Rust be a UB branch in C: https://github.com/LMDB/lmdb/blob/f20e41de09d97e4461946b7e26...

                        if (err >= MDB_KEYEXIST && err <= MDB_LAST_ERRCODE) {
                          i = err - MDB_KEYEXIST;
                          return mdb_errstr[i];
                        }
                    
                    That `mdb_errstr[i]` will have UB if `i` is out of bounds. And `i` could be out of bounds if this code gets out of sync with the defined error constants and `mdb_errstr`. Moreover, it seems quite unlikely that this particular part of the code benefits perf-wise from omitting bounds checks. In other words, if this were Rust code and someone used `unsafe` to opt out of bounds checks here (assuming they weren't already elided automatically), that would be a gross error in judgment IMO.

                    The kind of examples I'm asking for would be C libraries that catch these sorts of runtime invariants and propagate them up as errors.

                    Instead, at least for LMDB, MDB_PANIC isn't really used for this purpose.

                    Now looking at zlib, from what I can tell, Z_STREAM_ERROR is used to validate input arguments. It's not actually being used to detect runtime invariants. zlib is just like most any other C library as far as I can tell. There are UB branches everywhere. I'm sure some of those are important for perf, but I've spent 10 years working on optimizing low level libraries in Rust, and I can say for certain that the vast majority of them are not.

                    libavcodec is more of the same. There are a ton of runtime invariants everywhere that are just UB if they are broken. Again, this is not an example of a library eagerly checking for invariant violations and percolating up errors. From what I can see, AVERROR_BUG is used at various boundaries to detect some kinds of inconsistencies in the data.

                    IMO, your examples are a total misrepresentation of how C libraries typically work. From my review, my prior was totally confirmed: C libraries will happily do UB when runtime invariants are broken, whereas Rust code tends to panic. Rust code will opt into the "UB when runtime invariants are broken," but it is far far more limited.

                    And this further demonstrates why "unsafe by default" is so bad.

                    • habermana day ago
                      I think this is moving the goalposts.

                      My claim was not "these C libraries perfectly avoid UB by dynamically checking every invariant that could lead to UB if broken." Clearly they do not, as you have demonstrated. (Neither does unsafe Rust).

                      My claim was that in cases where a (low-level, high quality) C library does check an invariant in a release build, it will generally report failure of that invariant as an explicit error code rather than by crashing the process.

                      To falsify that, you would need to find places where these libraries call abort() or exit() in response to an internal inconsistency, in a release build. I think you are unlikely to find examples of that in these libraries. (After a bit of searching, I see that libavcodec has a few abort()s, but uses AVERROR_BUG an order of magnitude more often).

                      I agree with you that Rust's "safe by default" is important. I am advocating that Rust can be a powerful tool to provide C-like behavior (no crash on inconsistency) with greater safety (checking all relevant inconsistencies by default). In cases where C-like behavior is desired, that's a really appealing proposition.

                      Upthread it seemed like you were objecting to the idea of ever reporting internal inconsistencies as recoverable errors. You argued that creating and documenting error codes for this is not common or practical:

                      > I'd love to see the API docs for them. "This error value is impossible and this library will never return it. If it does, then there is a bug in the library. Since there are no known bugs related to this invariant violation, this cannot happen."

                      That is exactly what SQLITE_INTERNAL and AVERROR_BUG are.

                      • burntsushia day ago
                        > My claim was that in cases where a (low-level, high quality) C library does check an invariant in a release build, it will generally report failure of that invariant as an explicit error code rather than by crashing the process.

                        That just seems very uninteresting though? And it kinda misses the whole point of where this conversation started. It's true that Rust code is going to check more things because of `unwrap()`, but that's a good thing! Because the alternative is clearly what C libraries practice: they'll just have UB. So you give up the possibility of an RCE for the possibility of a DoS. Sounds like a good trade to me.

                        > > I'd love to see the API docs for them. "This error value is impossible and this library will never return it. If it does, then there is a bug in the library. Since there are no known bugs related to this invariant violation, this cannot happen."

                        > That is exactly what SQLITE_INTERNAL and AVERROR_BUG are.

                        I meant that it should reflect the philosophy of handling broken runtime invariants generally in the library. Just because there's one error code for some restricted subset of cases doesn't mean that's how they deal with broken runtime invariants. In all of your examples so far, the vast majority of broken runtime invariants from what I can see lead to UB, not error codes.

                        This is what I meant because this is what makes Rust and its panicking materially different from C. And it's relevant especially in contexts where people say, "well just return an error instead of panicking." But C libraries generally don't do that either! They don't even bother checking most runtime invariants anyway, even when it doesn't matter for perf.

                        This is a big knot to untangle and I'm sure my wording could have been more precise. This is why I wanted to focus on examples, because we can look at real world things. And from my perspective, the examples you've given do not embody the original advice that I was replying to:

                        > That behavior is up to the user. The library should only report the error.

                        Instead, while there is limited support for "this error is a bug," the C libraries you've linked overwhelmingly prefer UB. That's the relevant point of comparison. I'm not interested in trying to find C libraries that abort. I'm interested in a holistic comparison of actual practice and using that to contextualize the blanket suggestions given in this thread.

                        • habermana day ago
                          > It's true that Rust code is going to check more things because of `unwrap()`, but that's a good thing! Because the alternative is clearly what C libraries practice: they'll just have UB.

                          I have been consistently advocating for a third alternative that I happen to like more than either of these.

                          My alternative is: write libraries in No-Panic Rust. That means we have all of the safety, but none of the crashes. It is consistent with the position articulated upthread:

                          > That behavior is up to the user. The library should only report the error.

                          No-Panic Rust means always using "?" instead of unwrap(). This doesn't give up any safety! It just reports errors in a different way. Unfortunately it does mean eschewing the standard library, which isn't generally programmed like this.

                          I won't argue that every library should use this strategy. It is undoubtedly much more work. But in some cases, that extra work might be justified. Isn't it nice that this possibility exists?

                          • burntsushia day ago
                            We're back to square one: show me some real Rust libraries in widespread use actually adhering to this philosophy. And then I want to see some applications built with this philosophy. Then we can look at what the actual user experience difference is when a bug occurs. In one case, you get a panic with a stack trace. In the other, you get an error value that the application does... what with? Prints it as an unactionable error to end users and aborts? If it continues on, does your library make any guarantees about the consistency of its internal state when a runtime invariant is broken?

                            Panicking branches are everywhere in Rust. And even in your blog, you needed to use `unsafe` to avoid some of them. So I don't really get why you claim it is safer.

                            Users of my libraries would 100% be super annoyed by this. Imagine if `Regex::find` returned a `Result` purely because a bug might happen.

                            > But in some cases, that extra work might be justified. Isn't it nice that this possibility exists?

                            What I said above:

                            > Sure, you can propagate an error. I just don't really see a compelling reason to do so. Like, maybe there are niche scenarios where maybe it's worthwhile, but I do not see how it would be compelling to suggest it as general practice.

                            Your blog is an interesting technical exercise, but you spend comparatively little time on whether doing it is actually worth the trouble. And there is effectively no space at all devoted to how this impacts library API design. To be fair, you do acknowledge this:

                            > I should be clear that I have not yet attempted this technique at scale, so I cannot report on how well it works in practice. For now it is an exciting future direction for upb, and one that I hope will pay off.

                            From your blog, you list 3 reasons to do this: binary size, unrecoverability and runtime overhead.

                            I find that binary size is the only legitimate reason here, and for saving 300 KB, I would absolutely call that very niche. And especially so given that you can make panics abort to remove the code size overhead.

                            I find unrecoverability unconvincing because we are talking about bugs here. Panics are just one very convenient manifestation of a bug. But lots of bugs are silent and just make the output incorrect in some way. I just don't see a problem at all with bugs, generally, causing an abort with a useful error message.

                            I find runtime overhead very unconvincing because you can opt out of them on a case-by-case basis when perf demands it.

                            We can go around the maypole all day on this. But I want to see real examples following your philosophy. Because then I can poke and prod at it and point to what I think you're missing. Is the `upb` port publicly available?

                            • whytevuhunia day ago
                              I'd like to add another point:

                              Both panics, and error-values for invariants, add a lot of branches in execution, for every invariant that is checked, and every indirect caller of functions that do it.

                              This means basically all function calls introduce new control flow at the call site, because they may either panic, or return an error value that the programmer will almost always immediately bubble up.

                              Such a large amount of new control flow is going to be impossible to reason about.

                              But!

                              Panics, and specifically catching them, as they are implemented in Rust, require that the wrapped code is UnwindSafe [1]. This is a trait that is automatically implemented for objects that remain in a good state despite panics. This automatically makes sure that if something unexpected does happen, whatever state was being modified, either remains in a mostly safe shape, or becomes unreadable and needs to be nuked and rebuilt.

                              This is massively useful for things like webservers, because simply catching panics (or exhaustive error values) is not enough to recover from them. You need to be able to ensure that no state has been left permanently damaged by the panic, and Rust's implementation of catch_unwind requiring things to be UnwindSafe is a lot better than normal error values.

                              [1]: https://doc.rust-lang.org/stable/std/panic/trait.UnwindSafe....

                • chambers19 hours ago
                  > SQLite is really a terrible example of anything other than what you can accomplish when you pour enormous resources into a single C library.

                  That's quite a sweeping, even caustic, indictment.

                  Can you explain this statement more?

                • erk__a day ago
                  If zstd gives you an error and you don't handle it, the next calls may cause UB, so it kinda does both things.

                  https://github.com/facebook/zstd/blob/b16d193512d3ded82fd584...

              • int_19h16 hours ago
                > That would mean that all invariants are either assumed true

                What, exactly, is the benefit of assuming the invariant holds without checking it, over checking and aborting if it's not true? In the first case, you're likely to segfault anyway, just at some later point, making it harder to locate the point at which invariant was actually broken - and that's the best case. Worst case, you'll silently compute and return the wrong result based on garbage data.

        • hansvma day ago
          > Which libraries in widespread use know how to detect all of their possible bugs due to invariant violations and report them as explicit error values?

          We're talking about the cases that are already being caught somehow (bounds checks, unwraps, ...). It isn't necessary to detect all possible invariant violations to do something else instead of panic, and it suffices to have the language represent those failures without aborting the program.

          • burntsushia day ago
            Show me a widely used C library that does even remotely the same thing. I promise you most places where Rust would use unwrap are just straight UB in C.

            I note that you provided no real world examples despite my request for them. Where's your code that is following this advice of yours?

            • hansvma day ago
              I'm not advocating for C or against Rust though. I'm saying that GP's request to report errors instead of crashing is a perfectly fine opinion, and using Rust as an example of a language which already traps most instances of C UB, there aren't any fundamental reasons why Rust (or a fork or a similar language) couldn't use a different mechanism to signal failure states. Your request for code is irrelevant to my point.
              • burntsushia day ago
                You advocate a particular coding style and I ask for real world examples demonstrating your advocacy in the real world. That's absolutely relevant!

                In contrast, the style I advocate has dozens of examples at your fingertips running in production right now. Including the Rust standard library itself. The Rust standard library happily uses `unwrap()` all over the place and specifically does not propagate errors that are purely the result of bugs coming from broken internal runtime invariants.

        • theamka day ago
          That's how exceptions work, and they come in pretty handy in a lot of circumstances. In such languages, any operation might throw RuntimeException (or equivalent) and the caller must be ready to handle that - or not, in which case it behaves exactly like a language without exception support.

          I know that a lot of people hate that idea, but I strongly disagree. In any large program, there are thousands of possible errors, and only a small part of them we actually want to handle in a special way. The rest? They go in the "other" category. Being able to handle "other" errors - what Rust calls a panic - significantly improves the user experience:

          For CLI, print explanation that this is an unexpected failure, mention where the logs were saved, mention where to get support (forum/issue/etc...), and exit.

          For cron-like scheduled service, notify oncall of the crash, re-schedule the job with intelligent timeout, then exit.

          For web, upload details to observability platform, return 500 to user, then when possible terminate the worker.

          and so on... In practical world, unexpected errors are a thing, and good language should support them to make programmers' lives easier.

          One unfortunate downside of this ability is that some programmers abuse it and ignore all unknown errors instead of handling them properly - this makes for a terrible UX and introduces many bugs.

          Also, for my "web services" example, if the worker is not terminated, there is a chance the internal data structures will get corrupted, and further requests, even the ones which used to pass, will now fail. There are ways to mitigate this - ignore some exception groups but unconditionally fail on others; or use try/finally blocks and immutable data to reduce the chance of corruption even in case of an unexpected exception. But this code is hard to reason about and hard to test.

          Still, if a feature is not a good idea in some specific circumstances, it's not a reason to remove it altogether.

          • burntsushia day ago
            I'm unsure of your point here. And I'm not getting dragged into a debate about exceptions. :-)
            • edflsafoiewqa day ago
              I think the point is very clear. Languages with exceptions report invariant violations like IndexOutOfBoundsError or AssertionFailed via the same error reporting mechanism as normal, unavoidable errors, namely by throwing exceptions.
              • burntsushi2 hours ago
                OK, sure. If there's a suggestion that this is better, then I wouldn't agree with that necessarily. But as I said, I don't want to get drawn into a more general discussion about exceptions. The nuance of just comparing Rust with C is barely possible to get across (see other discussion in this thread). Adding real exceptions into that mix is just a disaster lol.
              • int_19h16 hours ago
                Rust panics are pretty much exceptions under a different name. You can even "catch" the object passed to panic!().

                The main difference is that with exceptions, they always unwind. With panics, the person building the binary can decide whether the panic should unwind or immediately abort.
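                That build-time choice is a Cargo profile setting in the final binary's manifest:

```toml
# Cargo.toml of the binary; all library dependencies inherit the choice.
[profile.release]
panic = "abort"   # or "unwind" (the default), which allows catch_unwind
```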

  • pdpi2 days ago
    > Moreover, panic!()’s are hiding even in the most innocuous places like unwrap() and expect() calls

    unwrap and expect aren't "innocuous places where panic is hiding". The whole point of unwrap and expect is that you're effectively saying "Hitting a None/Err here is impossible, so seeing them should be treated as a correctness error".

    • zamaleka day ago
      Furthermore

      > a panic!() or an abort() are much more annoying to catch and handle.

      By design. If you're hoping to catch a panic, then you're doing it woefully wrong. Catching panics is correct only in extremely niche scenarios.

      Writing panic-free Rust can sometimes be challenging if a dependency uses an unwrap/expect incorrectly, but the Rust community tends to hold itself to a higher standard than that.

      Other things,

      > Most of it is the low-level nitty-gritty, which Rust is not particularly good at.

      I don't feel like this was motivated anywhere, just asserted at the end. My opinion is that Rust gives you more tools to make invalid states unrepresentable - this can drastically lower cognitive burden when working on the code.

      > Bounds checks are rarely performed at compile time and we have to pay their price at runtime. This is a significant performance hit for codecs.

      Rust is extremely good at eliding bounds checks. If you attempt to write C++ in Rust (e.g. a moving pointer/slice) then you're probably going to trip these heuristics up, idiomatic Rust (heavy iterator usage) generally results in fewer bounds checks.

      > The ownership model is not particularly useful to codecs.

      Maybe not useful, but certainly unlikely to be a hindrance. The ownership model generally maps extremely well to the CS101 concept of Input -> Processing -> Output, which codecs are (and UIs aren't, if you've ever wondered why it's still mostly an unsolved problem in Rust).

      You can do raw assembly in Rust too.

      • meltynessan hour ago
        Yeah, you can lint for "array-access." Logic written using iterators (streams for Java folks, comprehensions for Python) doesn't fail in this way. It also compiles to very fast assembly, much like "looping and accessing", but with the added benefit of unambiguous terminating conditions - unless you opt into the fallible forms, which can introduce ambiguous terminating conditions. Or, if some other random-access pattern is needed, implement the appropriate trait.
      • SAI_Peregrinus19 hours ago
        > Catching panics is correct only in extremely niche scenarios.

        Catching panics is correct when the panic would cause unwinding across a FFI boundary. I can't think of anywhere else where I wouldn't be very suspicious seeing a catch_unwind().

        • int_19h16 hours ago
          A good example of valid use of catch_unwind() would be for a desktop app to try to save a copy of any unsaved in-memory state before it crashes for good. Since invariants are already broken, it might or might not work, and may possibly save some garbage data, but it is much better than saving nothing at all.

          The one common thing for all such patterns is that they happen at or near the topmost layer of the app (e.g. around the main loop for GUI apps, or around the request handler for web apps).

    • Aurornisa day ago
      unwrap() and expect() are defined as panic-if functions. It’s one of the first things you learn when writing Rust code.

      Calling them innocuous or being surprised that they can panic suggests this person hasn’t really learned much Rust at all.

  • aldanora day ago
    The author seems clearly inexperienced with Rust (see the paragraph about unwraps and panics) and unfamiliar with common paradigms and ways of doing things in low-level Rust; it's unclear why this post is gaining so much attention on HN. Rust is no worse than C/C++ for writing all sorts of low-level codecs, with or without unsafe, with performance matching that of C/C++ and readability/safety usually being much better.

    // personal experience: participated in the Rust core lib float parser implementation, part of which involved porting it directly from C++; implemented a safe-Rust qoi image encoder, a bpe text encoder, etc.; in all cases they turned out to be the fastest in existence; in all cases, though, good prior experience with low-level Rust was required, especially when doing it without unsafe and trying to avoid most bounds checks, so there's that.

  • wrs2 days ago
    Not to detract from the good points made here, but it does confuse me to see "panic!()’s are hiding even in the most innocuous places like unwrap() and expect() calls". They're not innocuous at all -- they should stand out just as much as writing panic!(). Panicking is literally unwrap and expect's only job.
  • hlieberman2 days ago
    This kind of hyper-specific need (codecs) is probably better served by a specialist language, like Wuffs (https://github.com/google/wuffs). You don't need, or want, the level of expressiveness that comes with something like Rust, but on the other hand, it's a compact enough problem set that you're willing to spend extra development work to eke out every bit of speed.
    • vlovich1232 days ago
      Was going to make the same comment. And in the article they note that they have to go down to assembly to do this anyway, and I find inline assembly in Rust to be more ergonomic and safer than the assembly facilities you get with C++ (not that I’ve done that much assembly, to be fair).
      • jchw2 days ago
        I agree, Rust seems a lot better for inline assembly. For C and C++, there's too much variability between compilers for what you can actually use with inline assembly. Right now, I'm desperately waiting for #[naked] to be stable[1][2]. It's not always necessary, but it's incredibly useful for a lot of low level fuckery. Until it's there, there aren't many good alternatives; you could have a custom build script that calls out to an external assembler, I suppose.

        [1]: https://github.com/rust-lang/reference/pull/1689

        [2]: https://github.com/rust-lang/rust/pull/134213

        • kibwen2 days ago
          > Right now, I'm desperately waiting for #[naked] to be stable

          Does the global_asm! macro suffice for your use case?

          • jchwa day ago
            Honestly, I didn't think of that, but I can't really think of a single reason it wouldn't work. Thank you!
        • wakawaka282 days ago
          Can you really complain about differences between compilers, when Rust basically has ONE compiler? You would most likely have differences if there were more Rust compilers out there. I mean, at least with C++ there are choices, and you can choose to just use one compiler like clang if the differences bother you.
          • jchw2 days ago
            Well... yes. I can definitely complain about things that are a result of different philosophies or realities for a given programming environment. In fact, I'd argue the reality of using a given programming language is the only thing that actually matters; whatever is written in specs and whatever is theoretically possible is of very little interest to most developers.

            The reality today is this: If I want to deploy C++ code on Windows, MSVC or possibly Clang is for sure my best choice. If I want to integrate with Linux distributions, supporting GCC is basically mandatory. And on Apple platforms, it's Clang all the way. And of course, various BSDs will either prefer GCC or Clang. (I am guessing it's mostly Clang these days. It has been a while since I have used any BSDs.)

            That means that if I want to write cross-platform software for modern desktop and server operating systems, I have to keep all of this in mind.

            If you couldn't complain about this, then would it be fair to go and complain about the fact that Rust has only a single compiler? I'd argue it is fair to complain that Rust only has a single compiler, and personally support having a second more-or-less complete Rust frontend. And on that note, I am looking forward to gccrs, which I hope will eventually bring Rust to some more places in addition to hopefully cutting down on the amount of Rust things that are the way they are just because rustc does them that way.

            • wakawaka282 days ago
              >The reality today is this: If I want to deploy C++ code on Windows, MSVC or possibly Clang is for sure my best choice. If I want to integrate with Linux distributions, supporting GCC is basically mandatory. And on Apple platforms, it's Clang all the way. And of course, various BSDs will either prefer GCC or Clang. (I am guessing it's mostly Clang these days. It has been a while since I have used any BSDs.)

              Clang works everywhere: Mac, Windows, Linux, and even BSD. As a matter of fact so does GCC. You might complain about troubles linking, say, Windows-specific libraries without MSVC. But I know you'd have it just as bad or worse trying to link Rust code to those same libraries.

              >If you couldn't complain about this, then would it be fair to go and complain about the fact that Rust has only a single compiler?

              It depends on which of the several advantages of multiple compilers you actually care about. Some make faster output, some are more hackable, some have better licenses (which is subjective), and some have better commercial product support.

              >And on that note, I am looking forward to gccrs, which I hope will eventually bring Rust to some more places in addition to hopefully cutting down on the amount of Rust things that are the way they are just because rustc does them that way.

              So you're saying that soon you'll be able to complain about inconsistencies between Rust compilers too, lol...

              • jchwa day ago
                > Clang works everywhere: Mac, Windows, Linux, and even BSD. As a matter of fact so does GCC. You might complain about troubles linking, say, Windows-specific libraries without MSVC. But I know you'd have it just as bad or worse trying to link Rust code to those same libraries.

                OK, fine, so if you limit your code to Clang, you can use the GCC asm syntax. Now it is possible to do inline asm everywhere that Clang supports. It is still the crummy GCC inline asm syntax, which has pretty poor ergonomics compared to Rust.

                I wouldn't ever do this, but it can be done. The tradeoff for a relatively bad inline asm syntax doesn't seem worth it, versus just using some external assembler.

                > It depends on which of the several advantages of multiple compilers you actually care about. Some make faster output, some are more hackable, some have better licenses (which is subjective), and some have better commercial product support.

                Sure.

                > So you're saying that soon you'll be able to complain about inconsistencies between Rust compilers too, lol...

                The problem with C++ is that there isn't a standard for inline assembler and never will be.

                Here is the Rust standard for inline assembler:

                https://doc.rust-lang.org/reference/inline-assembly.html

                If gccrs implements it, it will work just as well. I'm sure there will be some inconsistencies between the exact assembler syntax allowed across toolchains, but that's OK: it's all stuff that can be ironed out. With C and C++, it will not be ironed out. It's just going to be how it is today for all of eternity.

                • wakawaka28a day ago
                  I thought of another reason for C++ to do this. C++ compilers allow you to customize the assembler that you use for your code. As long as that is the case, it is impossible to mandate uniform syntax. The language certainly can't standardize externally-specified code either. I bet you can't customize this in Rust.

                  Different assemblers, even for the same arch, support different features and instructions, and may use different syntax. So requiring uniformity is a non-starter.

                  >With C and C++, it will not be ironed out. It's just going to be how it is today for all of eternity.

                  I don't think that's true. If it is, then I guess it's a sign that the big players don't think this is an important issue. And they are the ones writing the most inline assembly, so they ought to know what is and isn't actually worth it.

                  • vlovich123a day ago
                    > As long as that is the case, it is impossible to mandate uniform syntax. The language certainly can't standardize externally-specified code either. I bet you can't customize this in Rust. Different assemblers, even for the same arch, support different features and instructions, and may use different syntax. So requiring uniformity is a non-starter.

                    You’d be wrong. You can customize the build however you want by defining a build.rs file. For inline assembly I don’t see a problem with uniformity and not supporting weird shit. Weird shit should be harder if it makes the more straightforward stuff easier and less error prone.

                    • wakawaka2820 hours ago
                      If you customize the assembler, then I think that the inline assembly can't support uniform syntax. The syntax required by an assembler is defined by the assembler, not some language trying to inline it. The true state of things might be even more complicated, with C++ and Rust compilers choosing to parse assembly. But if they do that, it is necessarily going to interfere with using all of the features of the underlying customizable assembler.
                  • jchwa day ago
                    > I don't think that's true. If it is, then I guess it's a sign that the big players don't think this is an important issue. And they are the ones writing the most inline assembly, so they ought to know what is and isn't actually worth it.

                    I'm not even going to attempt to go into the utter dysfunction that is the C++ standards committee, but I'll just say this: whatever I could say to convince you that it sucks, it's significantly worse than that. Trust me, the C++ standards committee refusing to address something is not a sign that there is not a problem. The reason why inline assembly will never be standardized is because that's a relatively small problem, whereas the C++ world today is full of gaping holes that the standard is utterly failing at filling. From concepts to modules, it's a shit show. The "big players" are slowly leaving. Google and Microsoft may have some of the biggest C++ codebases on Earth, and they are currently busy investing elsewhere, with Rust, Go, Carbon, and more.

                    • wakawaka28a day ago
                      >Google and Microsoft may have some of the biggest C++ codebases on Earth, and they are currently busy investing elsewhere, with Rust, Go, Carbon, and more.

                      I think this is overstated, and may also be construed as an attempt to monopolize and destroy what is a very successful open technology spec. I know in the case of Google especially, there were many people who got into a spat with the rest of the committee because they had a different vision of what was appropriate for the language. That is a sign that the committee is functioning correctly. It's supposed to prevent a single actor from ignorantly breaking stuff for others. You might disagree with the particular decision that was made, but I think the committee is rarely given the benefit of the doubt that it deserves.

                      >The reason why inline assembly will never be standardized is because that's a relatively small problem, whereas the C++ world today is full of gaping holes that the standard is utterly failing at filling. From concepts to modules, it's a shit show.

                      Concepts are usable today. Modules are basically usable but immature. C++ needs to be cut some slack when it comes to bleeding edge features. Other languages definitely are, and they make little in the way of compatibility commitments like C++ does. I think C++ should publish the standards after the features have been implemented for a while, but that is just a naive outsider's opinion. Every decision that could be made for this stuff has tradeoffs.

                      As I said elsewhere, inline assembly syntax can't be standardized without an associated assembler, which is platform-dependent and often customizable. I also think the language spec should know as little about the architecture as it can, because each one has slightly different characteristics.

                      • vlovich123a day ago
                        > Concepts are usable today. Modules are basically usable but immature.

                        Concepts are usable in the sense they compile. Claiming they are usable in terms of being ergonomic and that people are willing to use them outside the stdlib is a stretch. You think it seems reasonable until you encounter Rust traits and then you wonder wtf is C++ doing.

                        As for modules, it’s now 5 years since standardization. How much more time does a basic feature like that take to mature? Btw, the community provided feedback to the standards committee that the spec was useless for anyone building build systems tooling around it and the committee chose to ignore that warning and this is the result.

                        > I know in the case of Google especially, there were many people who got into a spat with the rest of the committee because they had a different vision of what was appropriate for the language. That is a sign that the committee is functioning correctly. It's supposed to prevent a single actor from ignorantly breaking stuff for others. You might disagree with the particular decision that was made, but I think the committee is rarely given the benefit of the doubt that it deserves.

                        The committee was actually given a lot of benefit of the doubt after c++11 because they promised to change. They’ve squandered it.

                        • wakawaka2819 hours ago
                          >Concepts are usable in the sense they compile. Claiming they are usable in terms of being ergonomic and that people are willing to use them outside the stdlib is a stretch. You think it seems reasonable until you encounter Rust traits and then you wonder wtf is C++ doing.

                          Concepts work great. The primary purpose of them is to allow things to compile or not and to deliver readable error messages when a constraint is violated. I use them from time to time at work. I don't know about Rust traits but I do know that C++ has many useful paradigms and idioms to handle a variety of sticky situations.

                          >As for modules, it’s now 5 years since standardization. How much more time does a basic feature like that take to mature?

                          It's not as basic as you imagine, evidently. If this was any other language, the one true language authority would start building an implementation with a spec that is constantly in flux, and it could take just as long to complete. Alternatively, they'd break compatibility and shrug off the hundreds of man-years of work they generated downstream.

                          >Btw, the community provided feedback to the standards committee that the spec was useless for anyone building build systems tooling around it and the committee chose to ignore that warning and this is the result.

                          I think this means the committee sees it as someone else's job to develop the implementation details for modules. They also don't specify things such as how shared libraries should be built or loaded, or the format of binary code.

                          >The committee was actually given a lot of benefit of the doubt after c++11 because they promised to change. They’ve squandered it.

                          They did change. We are getting regular updates and corrections now. I think the committee is more open to proposals than ever, perhaps too open. I can hardly keep up with all the cool stuff they add every couple of years.

                          • vlovich12312 hours ago
                            > It's not as basic as you imagine, evidently. If this was any other language, the one true language authority would start building an implementation with a spec that is constantly in flux, and it could take just as long to complete. Alternatively, they'd break compatibility and shrug off the hundreds of man-years of work they generated downstream.

                            I don’t think that’s the reason. The issue isn’t modules themselves. They’re imperfect but no solution was going to be. The hostility to defining things that would make them usable resulted in them being unusable. An unusable feature is as good as one that doesn’t exist.

                            > They did change. We are getting regular updates and corrections now. I think the committee is more open to proposals than ever, perhaps too open. I can hardly keep up with all the cool stuff they add every couple of years.

                            They dick around forever and the meaningful changes are ones that aren’t really the big pain points. And when they try to solve meaningful pain points (eg ranges) they end up doing such a piss poor job that it ends up being overly complex and solving the original problem poorly. C++ as a language has utterly failed. That’s why standards body participants like Herb and Chandler are trying to come up with their own successor languages. If they thought it was solvable within the standards body, they would have done it there.

          • dmoy2 days ago
            I read it as

            > I agree, Rust seems a lot better for inline assembly [because there's basically only one compiler]. [Compared to] C and C++, [where] there's too much variability between compilers for what you can actually use with inline assembly

            • jmillikina day ago
              Rust's inline assembly syntax is part of the language, and in principle the same Rust source would compile on any conforming compiler (rustc, gccrs).

              C/C++ doesn't have a standard syntax for inline assembly. Clang and GCC have extensions for it, with compiler-specific behavior and syntax.

              • wakawaka28a day ago
                I mentioned somewhere else but I might as well mention here too: there is no standard assembler that everyone uses. Each one may have a slightly different syntax, even for the same arch, and at least some C++ compilers allow you to customize the assembler used during compilation. Therefore, one would assume that inline assembly can't be uniform in general, without picking a single assembler (even assembler version) for each arch.
                • jmillikina day ago
                  You're talking about the syntax of the assembly code itself. In practice small variations between assemblers isn't much of a problem for inline assembly in the same way it would be for standalone .s sources, because inline assembly rarely has implementation-specific directives and macros and such. It's not like the MASM vs NASM split.

                  This thread is about the compiler-specific syntax used to indicate the boundary between C and assembly and the ABI of the assembly block (register ins/outs/clobbers). Take a look at the documentation for MSVC vs GCC:

                  https://learn.microsoft.com/en-us/cpp/assembler/inline/asm?v...

                  https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

                  Rust specifies the inline assembly syntax at https://doc.rust-lang.org/reference/inline-assembly.html in great detail. It's not a rustc extension, it's part of the Rust language spec.

                  • wakawaka2819 hours ago
                    >This thread is about the compiler-specific syntax used to indicate the boundary between C and assembly and the ABI of the assembly block (register ins/outs/clobbers).

                    I see... Nevertheless, this is a really weird issue to get bent out of shape over. How many people are really writing so much inline assembly and also needing to support multiple compilers with incompatible syntax?

                    • jmillikin14 hours ago
                      Biggest category of libraries that need inline assembly with compiler portability are compression/decompression codecs (like the linked article) -- think of images (PNG, JPEG), audio (MP3, Opus, FLAC), video (MPEG4, H.264, AV1).

                      Also important is cryptography, where inline assembly provides more deterministic performance than compiler-generated instructions.

                      Compiler intrinsics can get you pretty far, but sometimes dropping down to assembly is the only solution. In those times, inline assembly can be more ergonomic than separate .s source files.

                • vlovich123a day ago
                  Exactly. It picks a single assembler:

                  > Currently, all supported targets follow the assembly code syntax used by LLVM’s internal assembler which usually corresponds to that of the GNU assembler (GAS)

                  Uniformity like that is a good thing when you need to ensure that your code compiles consistently in a supported manner forever. Swapping out assemblers isn’t helpful for inline assembly.

                  • jmillikina day ago
                    The quoted statement is weaker than what you're reading it as, I think. It's not a statement that emitted assembly code is guaranteed to conform to LLVM syntax, it's just noting that (1) at present, (2) for supported targets of the rustc implementation, the emitted assembly uses LLVM syntax.

                    Non-LLVM compilers like gccrs could support platforms that LLVM doesn't, which means the assembly syntax they emit would definitionally be non-LLVM. And even for platforms supported by both backends, gccrs might choose to emit GNU syntax.

                    Note also that using a non-builtin assembler is sometimes necessary for niche platforms, like if you've got a target CPU that is "MIPS plus custom SIMD instructions" or whatever.

                    • estebanka day ago
                        I didn't follow the stabilization process very closely, but I believe you're wrong. What you're describing is what used to be asm! and is now llvm_asm!. The current stable asm! syntax actually parses its own assembly instead of passing it through to the backend unchanged. This was done explicitly to allow non-LLVM backends to work, and to let alternative front-ends be compatible. I saw multiple statements in this thread about alternative compilers or backends causing trouble here, and that's just not the case, given the design was delayed for ages until those issues could be addressed.

                      Given that not all platforms that are supported by rust have currently support for asm!, I believe your last paragraph does still apply.

                      https://rust-lang.github.io/rfcs/2873-inline-asm.html

                      • jmillikina day ago
                        This sentence from the Reference is important:

                          > The exact assembly code syntax is target-specific and opaque to the compiler
                          > except for the way operands are substituted into the template string to form
                          > the code passed to the assembler.
                        
                        You can verify that rustc doesn't validate the contents of asm!() by telling it to emit the raw LLVM IR:

                          % cat bogus.rs
                          #![no_std]
                          pub unsafe fn bogus_fn() {
                           core::arch::asm!(".bogus");
                           core::arch::asm!("bogus");
                          }
                          % rustc --crate-type=lib -C panic=abort --emit=llvm-ir -o bogus.ll bogus.rs
                          % cat bogus.ll
                          [...]
                          ; bogus::bogus_fn
                          ; Function Attrs: nounwind
                          define void @_ZN5bogus8bogus_fn17h0e38c0ae539c227fE() unnamed_addr #0 {
                          start:
                            call void asm sideeffect alignstack ".bogus", "~{cc},~{memory}"(), !srcloc !2
                            call void asm sideeffect alignstack "bogus", "~{cc},~{memory}"(), !srcloc !3
                            ret void
                          }
                        
                        That IR is going to get passed to llvm-as and possibly onward to an external assembler, which is where the actual validation of instruction mnemonics and assembler directives happens.

                        ---

                        The difference between llvm_asm!() and asm!() is in the syntax of the stuff outside of the instructions/directives -- LLVM's "~{cc},~{memory}" is what llvm_asm!() accepts more-or-less directly, and asm!() generates from backend-independent syntax.

                        I have an example on my blog of calling Linux syscalls via inline assembly in C, LLVM IR, and Rust. Reading it might help clarify the boundary: https://john-millikin.com/unix-syscalls#inline-assembly

                    • vlovich12320 hours ago
                      Assembly by definition is platform specific. The issue isn’t that it’s the same syntax on every platform but that it’s a single standardized syntax on each platform.
            • wakawaka282 days ago
              I understood it that way too. I just expect that if there were more Rust compilers (a benefit which C++ has in spades) then there would most likely be many annoying differences between them as well. There isn't an ISO standard for Rust. For that matter I guess most programming languages with multiple implementations have basically the same pro and con: there's more than one way to do things.
              • jmillikina day ago
                Note that becoming an international standard (via ISO, ECMA, IETF, or whatever) isn't necessary or sufficient to avoid dialects.

                If the Rust language specification is precise enough to avoid disagreements about intended behavior, then multiple compilers can be written against that spec and they can all be expected to correctly compile Rust source code to equivalent output. Even if no international standards body has signed off on it.

                On the other hand, if the spec is incomplete or underspecified, then even an ANSI/ISO/IETF stamp of approval won't help bring different implementations into alignment. C/C++ has been an ISO standard for >30 years and it's still difficult to write non-trivial codebases that can compile without modification on MSVC, GCC, Clang, and ICC because the specified (= portable) part of the language is too small to use exclusively.

                Or hell, look at JSON, it's tiny and been standardized by the IETF but good luck getting consistent parsing of numeric values.

              • vlovich123a day ago
                You see it as a benefit; I see it as ridiculously user-hostile. Porting your code to a new platform isn’t just “implement new APIs”, it’s also “adjust your usage of the language to the dialect this vendor understands”. There is no benefit whatsoever to the end user or to the language's ecosystem in having multiple frontends to contend with.

                I’m all for multiple backends but there should be only 1 frontend. That’s why I hope gccrs remains forever a research project - it’s useful to help the Rust language people find holes in the spec but if it ever escapes the lab expect Rust to pick up C++ disease. Rust with a gcc backend is fine for when you want gcc platform support - a duplicate frontend with its own quirks serves no purpose.

                I also hope Rust never moves to an ISO standard for similar reasons. As someone who has participated in an ISO committee (not language) it was a complete and utter shitshow and a giant waste of time taking forever to get simple things done.

                • jmillikina day ago

                    > I’m all for multiple backends but there should be only 1 frontend. That’s
                    > why I hope gccrs remains forever a research project - it’s useful to help
                    > the Rust language people find holes in the spec but if it ever escapes the
                    > lab expect Rust to pick up C++ disease.
                  
                  An important difference between Rust and C++ is that Rust maintains a distinction between stable and unstable features, with unstable features requiring a special toolchain and compiler pragma to use. The gccrs developers have said on record that they want to avoid creating a GNU dialect of Rust, so presumably their plan is to either have no gccrs-specific features at all, or to put such features behind an unstable #![feature] pragma.

                    > Rust with a gcc backend is fine for when you want gcc platform support
                    > - a duplicate frontend with its own quirks serves no purpose.
                  
                  A GCC-based Rust frontend would reduce the friction needed to adopt Rust in existing large projects. The Linux kernel is a great example, many of the Linux kernel devs don't want a hard dependency on LLVM, so they're not willing to accept Rust into their part of the tree until GCC can compile it.
                  • vlovich12320 hours ago
                    Dialects are created not just by different feature sets, but also by different interpretations of the spec and by bugs. Similarly, if Rust adds a feature, it’ll take time for gccrs to port it - during that window you have a dialect, or Rust evolution becomes a negotiation about getting gccrs to adopt each feature, unless you really think gccrs will track the Rust compiler feature-for-feature in each release (i.e. tightly coupled release cycles). Whatever the intentions, that’s going to be the outcome.

                    > A GCC-based Rust frontend would reduce the friction needed to adopt Rust in existing large projects. The Linux kernel is a great example, many of the Linux kernel devs don't want a hard dependency on LLVM, so they're not willing to accept Rust into their part of the tree until GCC can compile it.

                    How is that use case not addressed by rust_codegen_gcc? That seems like a much more useful effort for the broader community to focus on that delivers the benefits of gcc without bifurcating the frontend.

    • adastra22a day ago
      Thanks for linking to Wuffs! Hadn’t seen it before.
  • maartenscholl2 days ago
    The author claims to be an expert in C++ but begins the article with a detour that incorrectly presents static and dynamic dispatch as mutually exclusive in C++. In reality you can do both within the same hierarchy using CRTP combined with a virtual base method. Compile-time inlining eliminates the overhead of the virtual call, so you can match Rust's flexibility with the same performance.
    • klysma day ago
      I feel like expert in C++ is a wide range. There are corners of C++ that probably 5 people know about total
    • wakawaka282 days ago
      The cases where you can inline a virtual method seem narrow. So narrow in fact, that if it is possible then you should probably only be doing CRTP or simple static dispatch. See: https://stackoverflow.com/questions/733737/are-inline-virtua...

      If you try to use CRTP + virtual on polymorphic types then one has to wonder if it will work as intended for both use cases (when used as a static object or a polymorphic one).

      I'm not the absolute most expert C++ programmer, but I'm no noob. The idea of deliberately introducing dynamic polymorphism only to try to optimize it out seems like a bad idea. It's unnecessarily complicated and confusing. If you want to go fast just use CRTP straight up and forget all about dynamic dispatch and potential cute optimizations.

      Edit: I think this explains my objection: https://www.codeproject.com/Tips/537606/Cplusplus-Prefer-Cur... (I found that in the StackOverflow post I linked to above.)

  • habermana day ago
    > But such behaviour is still unacceptable from a library perspective: a library should never, ever call abort or otherwise terminate the program.

    I tend to agree with this, which is why I was happy to discover that No-Panic Rust does appear to be practical: https://blog.reverberate.org/2025/02/03/no-panic-rust.html

  • unrealhoang2 days ago
    I think tfa hasn’t considered the advantages of enforcing correct usage of the codec library. Most of the time, memory safety issues are caused not by the codec code itself but by user code using the codec incorrectly.

    Unsafe Rust is still improving its ergonomics (e.g. handling uninitialized memory), but its current capabilities should be enough to implement anything.

    • jchw2 days ago
      To be fair, you definitely don't need a Rust implementation to get that advantage: a simple wrapper will do the trick.
  • devit2 days ago
    The article is wrong in its claim that Rust panics are harder to catch than C++ exceptions: as long as you don't configure panic=abort, you can catch them easily with std::panic::catch_unwind, and they are generally implemented using the same runtime mechanism (i.e. Rust panics effectively are C++ exceptions for most purposes).
    • Hm, it just feels wrong though. Panics feel bigger than exceptions.
      • int_19h16 hours ago
        That's a cultural thing (and a good one, too).

        The one big difference tho is that in Rust, the end user of the library - i.e. the person compiling the binary of which this library is a part - can decide at that point whether panics unwind like C++ exceptions, or just abort immediately. Conversely, this means that the library should never assume that it can catch panics, even its own internal ones, because it may be compiled with panic=abort.

        So it's kinda like C++ exceptions, but libraries can only throw, never catch.

  • davidhydea day ago
    When writing codecs in Rust I found the Iterator trait (and all its provided methods) to be very useful for avoiding panics relating to out of bounds indexing into arrays. It also tends to generate very efficient code since bounds checks are done on the order of once per loop rather than once per loop iteration. And then there is the powerful itertools crate that gives you even more capabilities. Highly recommended!
  • oneup2a day ago
    I feel like this is kind of obvious. Plenty of people write C or C++ and then optimize some low-level function using hand-coded assembly. Even the author seemed to suggest they couldn't get their C compiler to do what they wanted, so they bailed and hand-wrote assembly. How is this any different in Rust?

    I'd expect the same for Rust. You make a safe higher-level API. It validates all the inputs and then calls into the private "unsafe parts" implemented in C or assembly. Hopefully you've put enough validation in the high-level API that it will be unlikely to call the low-level API in a way that breaks.

  • pjmlpa day ago
    While it won't fix all the security issues that C++ might have, given its C subset, the provided examples can be improved by enabling the hardened runtime configuration and built-in compiler analyses, which also catch use before initialisation (finally fixed in C++26).

    Rewriting is not always an option, so improving one's code security is also something to be aware of.

    Having said this, and since the thread is about text handling: Microsoft has rewritten DWriteCore in Rust.

  • JJOmeo2 days ago
    I had a similar experience writing image decompressors. There wasn't enough internal state to manage for Rust to provide any benefit, and the gritty fast algorithmic code is just simpler to do in C++. C wasn't going to get me the flexibility and generic patterns I was going for.
    • jgord2 days ago
      what %ge of C++ features did you use for that ?
  • jgord2 days ago
    The first section on Rust protections, static vs dynamic, is thought-provoking and informative...

    tbf, the author admits he needed to dive into ASM to work around the C compiler's sensible optimisation defaults.

    tbf to the author, he does explain that the domain of codecs implementation is niche.

    great read ... in the best traditions of the Mike Abrash articles of yesteryear.

    • jgord2 days ago
      Glad to be proven wrong in my subconscious assumptions about the author's age and sex... I assumed some bearded 50yo polyglot who came up in the 80s writing TSRs in A86.

      About page does not disappoint.

  • ydjje2 days ago
    [dead]
  • waltercool2 days ago
    [dead]
  • patrick4512 days ago
    > We have observed a ~13.3% slow-down, which is significant enough to warrant a second thought on whether the bounds checks were a good idea in the first place.

    Wow. I will think twice the next time I reach for .at().

    • elchananHaas2 days ago
      Across an application the penalty is normally 1-5%. Most business code benefits from the increased safety. Parsers are an exception, but the large attack surface sometimes makes it a good idea there too.
      • int_19h15 hours ago
        The bigger problem is that std::vector::at() is kinda useless in any case since it's very rare in idiomatic C++ to be indexing into a vector using an integer. You're much more likely to be using an iterator, since that's what you'll get from standard algorithms like std::find() etc. And there's no API for "checked iterators" that is equivalent to at(). For a debug build, your STL implementation might use checked iterators, but that's a quality of implementation matter that you can't rely on.

        MSVC does have a documented checked iterator facility that can be enabled even in release builds: https://learn.microsoft.com/en-us/cpp/standard-library/check....

      • TinkersWa day ago
        If you are quoting the numbers Google put out, recall that those numbers were measured after they profiled and manually disabled bounds checks in cases where they caused significant slowdown... so in other words, the numbers were garbage.
    • imtringueda day ago
      The antivirus packages I've been forced to use probably wasted more time than those 13% will ever save.
  • koakuma-chana day ago
    Ariel listen to me: compile-time memory safety and robust type system? It's a mess. Programming in C is better than anything they got over there. The syntax might seem much sweeter where reference lifetimes play, but frills like the borrow checker will only get in the way.