Three ways formally verified code can go wrong in practice(buttondown.com)

184 pointsby todsacerdoti4 months ago18 comments

rook_line_sinkr4 months ago
I got told to use these words back in uni
verification - Are we building the software right?
validation - Are we building the right software?
makes many a thing easier to talk about at work
- moandcompany4 months ago
  Yep... Along with this
  Verification aligns with a specification. E.g. verify if what was implemented matches what was specified.
  Validation aligns with a requirement. E.g. verify if what was implemented solves the actual problem.
  - tirant4 months ago
    I like to talk about goals and not requirements.
    What is your product goal? Both for the customer (solving a problem) and for the company(solve a marketing need, make money)?
    Can you proof that you are meeting both? Then you are validating your goals successfully.
    Most of the time products are missing defined goals, so lots of people have no idea what the fuck they are developing and everyone ends up with their own idea of the product.
  - friendzis4 months ago
    Engineering specifications implement requirements.
    EDIT: more formally, specification is a document stating requirements
    ijustlovemath4 months ago
    Usually there's a separate document that has the requirements, and from that document you have a "software detailed design" which has the specifications for how to build the software such that it upholds the requirements. Subtle but important difference.
- 01HNNWZ0MV43FF4 months ago
  If it's hard to remember which is which, maybe they should be different words.
  Like "Customer validation" and "Verification against the spec"
  Like "sympathy" and "empathy". I finally heard a mnemonic but if I need a mnemonic maybe it's a sign that the words are just too hard to understand
  - RHSeeger4 months ago
    I just don't bother to shorten it...
    - Did we correctly understand and build what the customer asked for
    - Did the customer actually ask for what would solve the problem they are trying to solve
    The second one is very important to ask early on, and _not_ asking it can cause a lot of wasted time. When I'm in a position to comment on whether I think someone fits the requirements of a senior developer, whether or not they ask that question plays directly into it.
  - smokel4 months ago
    Etymology to the rescue. "Valid" and "value" share a root, and so do "verify" and "verity." You validate something to see if it has value, meaning it works and is fit for purpose. You verify it to check that it is true and meets its specifications. Does that help? :)
- dapperdrake4 months ago
  There is an important and distinct pair of definitions used by a possibly smaller but significant number of people:
  Verification: formal theoretical proof
  Validation: Empirical test based approach
- tirant4 months ago
  Just by reading the headlines I immediately suspected the topic of Verification vs Validation would be involved. I still cannot comprehend why it is still such a gap in many software projects I have worked in. Everyone knows about testing but barely a few understand or want to understand why Validation is equally important.
- friendzis4 months ago
  While this is a good general rule of thumb, this terminology has nothing to do with how these terms are formally defined.
  Verification first and foremost concerns design and development stage, a formal definition would be something like "outputs of design and development step meet specified requirements for that step". For example, you could verify that a formal specification for a module actually implements imposed requirements. Especially in software this can get murky as you cover different phases with different tests/suites, e.g. unit, integration, e2e implicitly test different stages, yet are often part of the same testing run.
  Validation first and foremost concerns the whole system/product and fitness for market availability.
  For example you would verify that e.g. for a motor vehicle ABS functions correctly, airbags deploy in a crash and frame meets passenger safety requirements in a crash, and you would still not be able to sell such vehicle. You would validate the vehicle as a whole with corresponding regulatory body to get vehicle deemed fit for market.
  TLDR: Verification is getting passing grade in a lab test, validation is submitting all relevant lab test protocols to get CE certified.
- armanboyaci4 months ago
  Should we validate before we verify the software?
  - 8n4vidtmkvmk4 months ago
    Yes.
xlii4 months ago
There's this place for formal proving that most people won't get to and think it's stupid. I was such person some time ago, but then started to work on the system that: Required 2-3 months for implementation, had no room for error, made processing mistakes cost in thousands of Euros (even silly things). Than you start to wonder "hmmm... how can I use something better than just good practices and judgement? How can I _prove_ design A works".
And usually there are tangible problems to avoid: Ensuring that money won't get lost in transaction. Making sure that legal paperwork will be delivered or returned on time. Or that CSV file costing arm and leg for a single row won't have more rows than necessary. Many of these problems are borderline paradoxical. It's known that (as with Two Generals) there are no complete solutions. Usually it's like playing a catch with reality. A game that goes "what happens if we duplicate money mid-flight?" - "which system is source of truth in case of fraud", "do we rather risk false negative or accept MIA state" etc.
Sure, it's fun to model a crossroad with streetlights and a chicken just for kicks, but the main "why?" for formal proving is having "this design is good because I've seen the proof of it" moment.
But what kind of questions are asked is just as important. I can bet asbestos met all the standards it was checked against :)
- butlike4 months ago
  You're touching on a timeless approach which is currently not possible, given the limits of reality. You'd need to know for a fact all potential outlays in order to make sure every change your program (creation) creates is accounted for and proved against. Since you do not have perfect information, your proof, by current definition, cannot be proved to be perfect.
  - xlii4 months ago
    That's true, just as I cannot possibly measure car speed given innate measuring error which is not a defense from getting a speeding ticket.
    Formal proving isn't about getting to "my model is reality" but having step above mere imagination or educated guess. That's why it's possible to model paradoxical models. Given N systems they all can fail at the same time and there's nothing that can be done to guarantee message RECV+ACK. So at some point there's invariant saying "ok, we say that 3 out of 5 systems will always work". If you break invariant all hell is loose but at that point many weird stuff might happen.
    Usually when one needs to think about message passing in a way like [0] formal proving can save lives (or bottoms, depending on the context).
    [0]: https://xlii.space/eng/network-scenarios/
js84 months ago
Formal verification is ultimately an epistemological problem, how do we know that our model of reality corresponds to reality?
People have commented that validation is different from verification, but I think validation can be done by formally specifying the environment the program runs in.
For example, the question whether user can do X in your software corresponds to a question, is there a sequence of user inputs such that the software plus operating system leads to the output X.
By including the environment (like the operating system) into the formal spec we could answer (in theory) such questions.
Even if the question is fuzzy, we can today formalize it with AI models. For example, if we ask, does paint program allow user to draw a rose? We can take a definition of a rose from a neural network that recognizes what a rose is, and formally verify the system of paint program and rose verifier given by the NN.
- WJW4 months ago
  > We can take a definition of a rose from a neural network that recognizes what a rose is
  Can we? That just shifts the problem to building such a NN.
  I think it would already be pretty tricky to get a decent sized panel of humans to agree on what a "rose" is. Do rose-shaped but yellow flowers count? How about those genetically altered thornless ones? How about perfectly imitatations that are made of plastic? Images of roses? The flowers from the list at https://www.epicgardening.com/rose-lookalikes/?
  And of course there's the problem of getting a NN to recognise just roses. The mythical neural network with 0% error rate is exceptionally rare and usually not very useful for real world applications. I very much doubt humanity could build a neural network that perfectly recognises roses and only roses.
  - js84 months ago
    > The mythical neural network with 0% error rate is exceptionally rare and usually not very useful for real world applications.
    I think we differ in how we see the formal methods. IMHO they are not supposed to give you 100% guarantees, because you can always cast doubt over your assumptions. Rather, I think they're a practical tool that give you a better guarantee; so in this context, a paint program verified using NN's definition of a rose is better than one that has been verified only on couple examples of a rose.
    Another way to look at it - there is Curry-Howard isomorphism which states that (idealized) execution of a computer program is the same as doing logical inference, i.e. proving a formal statement. So if you're calculating 2+2, you're effectively proving that 2+2=4.
    This means all software testing is a special case of a formal method. For example, when you run the test suite of your program with a new Java runtime, you are proving a statement, on these inputs my program will give the expected output, assuming we run with the new runtime. So the new runtime becomes the assumption (about the environment program runs in).
    My point is, people already use formal methods, they just don't call them that. It's natural then to ask, instead of testing on a few input examples, can we instead verify a larger set of inputs (defined intensionally not extensionally)? And that's how you come from a few examples of a rose picture (extensional definition) to a definition given by an NN (intensional definition), which encompasses too-many-to-count rose pictures.
    The NN doesn't have to be perfect. So what, you verified in your paint program one can also paint some pictures of not-quite-roses. Who cares? It's better verified overall.
- ThreeFx4 months ago
  Now you’ve just shifted the problem to a much harder statement: proving the things the NN recognizes as a rose is actually a rose.
  You cannot get rid of the enviroment, and the best thing you can do is be explicit in what you rely on. For example, formally verified operating systems often rely on hardware correctness, even though the errata sheets for modern CPUs are hundreds of pages long.
  - js84 months ago
    Yeah. That's why I am saying it's the epistemological problem.
    In the above example, that NN recognizes rose as a rose is an assumption that it is correct as a model of the (part of the) world. On that assumption, we get a formal definition of "an image of a rose in the pixelated image", and we use that to formally prove our system allows roses to be painted.
    But getting rid of that assumption is I believe epistemologically impossible; you have to assume some other correspondence with reality.
    I agree with you on the hardware. I think the biggest obstacle to software formal correctness is the lack of formal models of which we can be confident describe our environment. (One obstacle is the IP laws - companies do not like to share the models of things they produce.)
  - lstodd4 months ago
    Right, the problem is that one can't formally describe the environment, and any one thing has to interact with it. So formallness goes right out the window.
- empath754 months ago
  Formal verification is a pure logic problem. You are making a proof, and a proof is just a demonstration that you can reach a given expression given certain axioms and the rules of formal logic.
  You can't just chuck a neural network into the mix because they aren't formally verified to do basically anything beyond matrix multiplication and the fact that they are universal approximators.
- netdevphoenix4 months ago
  > how do we know that our model of reality corresponds to reality?
  This is the main issue I have with formal methods in commercial settings. Most of the time, the issue is that the model reality map is far from accurate. If you have to rewrite your verification logic every time you have to update your tests, development would go very slowly
- butlike4 months ago
  Definition by consensus doesn't always equal absolute definition, to continue your epistemological metaphor. Just because the consensus states it's a rose doesn't mean it absolutely is in the truest sense (the artist, program, and neural network consensus could all be wrong in some intrinsic way).
unwind4 months ago
This was very nice, well-written and good-looking (to me) content. Thanks!
There was some meta-bug with encoding which almost is like an internal joke but I don't think it's intentional. The Haskell code examples have:
```
    inc :: Int -&gt; Int
```
several times, and that `>` entity should very likely just be a `<` in the source code.
- AndrewDucker4 months ago
  Nit: ">"
dinkelberg4 months ago
I would recommend reading the post that Hillel is responding to instead, namely Luke Plant's "Breaking 'provably correct' Leftpad" (https://lukeplant.me.uk/blog/posts/breaking-provably-correct...).
skybrian4 months ago
Many interesting statements aren't a property of the code alone. They're a property of the code when it's run in a particular environment. If you want the proof to be portable then it should only make assumptions that are true in any environment.
- dapperdrake4 months ago
  No assumption holds for all environments.
  Posh example: Axiom of choice.
  - naasking4 months ago
    There are definitely some assumptions that hold for all environments. For instance, "all resources are finite".
    eszed4 months ago
    Aye, but in many cases a theoretical limit can be ignored in practice. (Stupid example: I need more cooling = "I need more fans", not "I'm running out of air".)
    naasking4 months ago
    Finite resources are a common problem though.
    lstodd4 months ago
    Stupidity is known to be infinite. And it is a resource.
    naasking4 months ago
    I dispute the infinity of stupidity! It can only be infinite on an infinite timeline, as there are a finite number of people at any given time! But is time infinite?
mrkeen4 months ago
"The assumptions are wrong" is an unsatisfying "catch-all": compiler, hardware, environment, concurrency bugs.
Here's one that's been bugging me lately: Dekker's algorithm (mutual exclusion / critical section) won't work on modern hardware. CPUs are free to rearrange reads and writes such that they still make sense under single-threaded conditions.
It's kind of put me off learning about lockfree/lockless programming. Even if I make a machine-checked proof that my concurrent code races only in acceptable ways, I have no way of knowing which instructions will be re-ordered, invalidating the proof.
- layer84 months ago
  Dekker's algorithm will work on modern hardware if you make use of the appropriate memory barriers. Memory barriers are essential for lock-free algorithms, so I’m not sure why that is putting you off learning them. They are, in a way, an embodiment of the assumptions underlying the algorithms, including Decker’s.
Joker_vD4 months ago
> I wrote that the "binary search" bug happened because they proved the wrong property, but you can just as well argue that it was a broken assumption (that integers could not overflow).
No, you can't argue that (well, argue that successfully). If you are trying to prove some properties of a program X written in a certain programming language Y, you can't "just assume" that the semantics of Y are different from what they are! The integers do overflow in Java, that's explicitly stated in the Java's spec.
ip264 months ago
Is asserting the assumptions during code execution not standard practice for formally verified code?
- kragen4 months ago
  By "asserting X" do you mean "checking whether X is true and crashing the program if not", like the assert macro in C or the assert statement in Python? No, that is almost never done, for three reasons:
  • Crashing the program is often what you formally verified the program to prevent in the first place! A crashing program is what destroyed Ariane 5 on its maiden flight, for example. Crashing the program is often the worst possible outcome rather than an acceptable one.
  • Many of the assumptions are not things that a program can check are true. Examples from the post include "nothing is concurrently modifying [variables]", "the compiler worked correctly, the hardware isn't faulty, and the OS doesn't mess with things," and, "unsafe [Rust] code does not have [a memory bug] either." None of these assumptions could be reliably verified by any conceivable test a program could make.
  • Even when the program could check an assumption, it often isn't computationally feasible; for example, binary search of an array is only valid if the array is sorted, but checking that every time the binary search routine is invoked would take it from logarithmic time to linear time, typically an orders-of-magnitude slowdown that would defeat the purpose of using a binary search instead of a simpler sequential search. (I think Hillel tried to use this example in the article but accidentally wrote "binary sort" instead, which isn't a thing.)
  When crashing the program is acceptable and correctness preconditions can be efficiently checked, postconditions usually can be too. In those cases, it's common to use either runtime checks or property-based testing instead of formal verification, which is harder.
  - ip264 months ago
    This becomes an interesting conversation then. First of all, it could mean "checking whether X is true and logging an error" instead of exiting the program.
    - But if you aren't comfortable crashing the program if the assumptions are violated, then what is your formal verification worth? Not much, because the formal verification only holds if the assumptions hold, and you are indicating you don't believe they will hold.
    - True, some are infeasible to check. In that case, you could then check them weakly or indirectly. For example, check if the first two indices of the input array are not sorted. You could also check them infrequently. Better to partially check your assumptions than not check at all.
    empath754 months ago
    > "checking whether X is true and logging an error"
    You now have some process or computation which is in an unknown state. Presumably it was important! There are lots of cases where runtime errors are fine and expected -- when processing input from the outside world, but if you started a chain of computation with good input and somehow ended up with bad input, then you have a bug! That is bad! Everything after that point can no longer be trusted, and there was some place where the code went wrong before that and made everything between there and where you caught the error invalid. And that has possibly poisoned your whole system.
    Jaxan4 months ago
    You don’t assume the assertions, the verification shows they always hold!
    kragen4 months ago
    You assume the premises. The verification shows that the conclusions hold (assuming the premises do). Both premises and conclusions are, in some sense, "assertions", though not the C assert sense.
    kragen4 months ago
    You didn't answer my question!
    ip264 months ago
    I mean to ask both: "checking whether X is true and crashing the program if not", like the assert macro, OR assert as in a weaker check that does not crash the program (such as generate a log event).
    When crashing the program is acceptable and correctness preconditions can be efficiently checked, postconditions usually can be too.
    What's interesting to me is the combination of two claims: formal verification is used when crashes are not acceptable, and crashing when formal assumptions are violated is therefore not acceptable. This makes sense on the surface - but the program is only proven crashproof when the formal assumptions hold. That is all formal verification proves.
    ggm4 months ago
    I believe this is a false dichotomy. It's interesting but it also demands no third way exist.
    For example I have run services which existed inside a forever() loop. They exit, and are forceably restarted. Is that viable for a flight control system? No. Does it allow me to meet a low bounds availability problem with OOM killers? Yes, until the size of a quiescent from-boot system causes OOM.
    BGP speakers who compile in runtime caps on the prefix count can be entirely stable and run BGP "correctly" right up to the point somebody sends them more than the cap in prefixes. That can take hours to emerge. I lived in that world for a while too.
    kragen4 months ago
    It's more that adding intentional conditional crashes to the program in situations where crashing is the worst possible outcome can't possibly make the situation better. It might not make it worse, if the crashes never happen.
    As for log messages, yeah, people do commonly put log messages in their software for when things like internal consistency checks fail.
- cowsandmilk4 months ago
  That’s impractical. Take binary search and the assumption the list is sorted. Verifying the list is sorted would negate the point of binary search as you would be inspecting every item in the list.
  - AnimalMuppet4 months ago
    Only if you verify it for every search. If you haven't touched the list since the last search, the verification is still good. For some (not all) situations, you can verify the list at the start of the program, and never have to verify it again.
  - patrakov4 months ago
    There is a useful middle ground here. When picking the middle element, verify that it is indeed within the established bounds. This way, you'll still catch the sort order violations that matter without making the whole search linear.
  - voxl4 months ago
    ASSERTING the list is sorted as an assumption is significantly different form VERIFYING that the list is sorted before executing the search. Moreover, type systems can track that a list was previously sorted and maintained it's sorted status making the assumption reasonable to state.
    jojomodding4 months ago
    What do you mean when you say "assert" and "verify"? In my head, given the context of this thread and the comment you're replying to, they can both only mean "add an `if not sorted then abort()`."
    But you make some sort of distinction here.
    bluGill4 months ago
    Verify means you check. Assert means you say it is, but might or might not check.
    cyphar4 months ago
    I know that Solaris (or at least, ZFS) has VERIFY and ASSERT macros where the ASSERT macros are compiled out in production builds. Is that the kind of thing you're referring to?
    You can aslo mark certain codepaths as unreachable to hint to the compiler that it can make certain optimisations (e.g., "this argument is never negative"), but if you aren't validating that the assumption is correct I wouldn't call that an assertion -- though a plain reading of your comment would imply you would still call this an "assertion"? AFAIK, no language calls this construct "assert".
    This is probably one of those "depends on where you first learned it" bits of nomenclature, but to me the distinction here is between debug assertions (compiled out in production code) and assertions (always run).
    nothrabannosir4 months ago
    This thread started with:
    > Is asserting the assumptions during code execution not standard practice for formally verified code?
    Are you using the same definition of "assert" as that post does?
    bluGill4 months ago
    I'm not clear what definition of assert anyone is using. Thus I'm trying to create a new one that I think is useful (in the context of this type of discussion only!).
    Verify means you checked.
    Assert means you are suggesting something is true, but might or might not have checked. Sometimes an assert is "too hard" to verify but you have reason to think it is true. This could be because of low level code, or just that it is possible to verify but would cost too much CPU (runtime, or possibly limits of our ability to prove large systems) Sometimes assert is like a Mafia boss (It is true or I'll shoot - it might or might not really be true but nobody is going to argue the point now. This can sometimes be needed to keep a discussion on a more important topic despite the image)
    empath754 months ago
    assert in most languages is a boolean that crashes the program if it is false.
    If you want to assert that a list is sorted, you need some function that checks if it is sorted and returns a boolean.
    bluGill4 months ago
    In many (most?) languages assert is an optional crash if false. The language can choose to run the check or not. A function to check if a list is sorted and return a boolean is not hard to write - but of course you then need to prove that function is correct.
    kg4 months ago
    > Moreover, type systems can track that a list was previously sorted and maintained it's sorted status making the assumption reasonable to state.
    This is true, but if you care about correct execution, you would need to re-verify that the list is sorted - bitflips in your DRAM or a buggy piece of code trampling random memory could have de-sorted the list. Then your formally verified application misbehaves even though nothing is wrong with it.
    It's also possible to end up with a "sorted" list that isn't actually sorted if your comparison function is buggy, though hopefully you formally verified the comparison function and it's correct.
    voxl4 months ago
    You already have hardware level bit flip verification, you don't need to recheck the list
    tsimionescu4 months ago
    That only works up to some level of bit flips, like all error correcting codes. It works for our maybe even two bit flips, but not more than that.
    comex4 months ago
    Only if you have ECC RAM, and even then it’s not perfect.
- ngruhn4 months ago
  How would that look like if you accidentally assumed you have arbitrary large integers but in practice you have 64 bits?
  - appellations4 months ago
    Add(x,y): Assert( x >= 0 && y>= 0 ) z = x + y Assert( z >= x && z >= y ) return z
    There’s definitely smarter ways to do this, but in practice there is always some way to encode the properties you care about in ways that your assertions will be violated. If you can’t observe a violation, it’s not a violation https://en.wikipedia.org/wiki/Identity_of_indiscernibles
    4 months ago
    undefined
    bluGill4 months ago
    In some languages overflow is asserted as a can't happen and so the optimizer will remove your checks
    appellations4 months ago
    Best I can tell is that overflow is undefined behavior for signed ints in C/C++ so -O3 with gcc might remove a check that could only be true if UB occurred.
    The compound predicate in my example above coupled with the fact that the compiler doesn’t reason about the precondition in the prior assert (y is non-negative) means this specific example wouldn’t be optimized away, but bluGill does have a point.
    An example of an assert that might be optimized away:
    int addFive(int x) { int y = x + 5; assert(y >= x); return y; }
    uecker4 months ago
    Yes, you can not meaningfully assert anything after UB in C/C++. But you can let the compiler add the trap for overflow -fsanitize=signed-integer-overflow -sanitize-trap=all, or you could also write your assertion in a way where it does not rely on the result (e.g. ckd_add), or you use "volatile" to write in a way the compiler is not allowed to assume anything.
    comex4 months ago
    Clang is a bit smarter than GCC here (for some definition of 'smart') and does optimize the original version:
    https://gcc.godbolt.org/z/3Y4aheG6x
    appellations4 months ago
    Care to share a language where the compiler infers the semantic meaning of asserts and optimizes them away? I’ve never heard of this optimization.
    MindSpunk4 months ago
    Signed overflow is UB in C/C++ and several compilers will skip explicit overflow checks as a result. See: https://godbolt.org/z/WehcWj3G5
    mrkeen4 months ago
    C. This is a great thread: https://mastodon.social/@regehr/113821964763012870
    (That was one of my texts at uni)
    Maxatar4 months ago
    C and C++
GuB-424 months ago
Obligatory quote:
"Beware of bugs in the above code; I have only proved it correct, not tried it." - Donald Knuth
Source: https://staff.fnwi.uva.nl/p.vanemdeboas/knuthnote.pdf
From a quick look at it, it seems to be a "case 1" potential error, as the proof is done by hand. That is the warning is likely because it is a mathematical exercise first, and the code is for illustration and not for inclusion in production software.
- hshdhdhehd4 months ago
  This is why one would prefer to run. NGINX over a formally verified http server.
failrate4 months ago
The article did not also point out issues like the compiler or interpreter having a runtime error, the hardware the software is running on having a defect, etc. Do we consider these out of scope of the discussion?
4 months ago
undefined
0xDEAFBEAD4 months ago
>properties like all x in Int: inc(x) > x.
Why not just use an assertion? It costs more at runtime, but it's far less dev time than developing a proof, right?
- kryptiskt4 months ago
  It's not a substitute, an assertion only makes sure that it doesn't happen unnoticed, a proof says that it doesn't happen at all. I might prefer to know that my reactor control system has no overflow ahead of time, rather than getting an assertion and crashing if it happens (though it would be a bad reactor control system if it couldn't handle crashes).
  - 0xDEAFBEAD4 months ago
    OK, but that's a reactor control system. If your application has no time-critical aspect, is there any good reason to prefer a proof over an assert?
    dbdr4 months ago
    Will your users be happier to get an assertion failure when they run your code, compared to no error?
    (the tradeoff of course being that they might get your code later because it took more time to prove the code is correct)
    0xDEAFBEAD4 months ago
    What if it's an internal tool, where correctness matters a lot, but failures matter little? E.g. roll back all database changes in case of assertion failure. Perhaps an accounting thing or something of that nature.
    addaon4 months ago
    If your application has no time-critical aspect, why are you working on it now? There’s plenty of time to get to it later.
    0xDEAFBEAD4 months ago
    Very funny!
    empath754 months ago
    Because a run-time error produces a failure at runtime or a crash. You would much prefer that errors are impossible.
    epolanski4 months ago
    Yes, a proof allows me to not have to handle the case at all.
- mrkeen4 months ago
  Nearly everything that goes wrong for me in prod is already a crash/stacktrace, which is what an assertion does.
  The point of proofs is to not get yourself (or the customer) into that point in the first place.
- djoldman4 months ago
  I think other responses here may miss an important line of thought.
  There seems to be confusion around the differences between bug and correct.
  In the referenced study on binary searches, the Google researchers stated that most implementations had bugs AND were "broken" because they did:
  int mid =(low + high) / 2;
  And low + high can overflow.
  Ostensibly, a proof may require mid to be positive at all times and therefore be invalid because of overflow.
  The question is: if we put in assertions everywhere that guarantee correct output, where aborting on overflow is considered correct, then are folks going to say that's correct, or are they going to say oh that's not what binary search is so you haven't implemented it?
  EDIT: thinking about it more, it seems that null must be an accepted output of any program due to the impossibility of representing very large numbers.
- hshdhdhehd4 months ago
  Correct. You wouldn't prove everything in a typical production system unless that system is controlling a rocket or a pacemaker.
  You may want to prove some things. Test others.
  Types are proofs by the way. Most programmers find that handy.
  I think assertions are rediculously underused BTW
- AdieuToLogic4 months ago
  Better to use a type which cannot represent an invalid state.
  For example, instead of defining `inc` with `Int`s, the equivalent of an `IncrementalInt` type parameter for the `inc` function will ensure correctness without requiring runtime assertions.
- rwmj4 months ago
  If you're flying a plane, you probably don't want safety critical code to throw an assertion. It's for this reason that Airbus spends a bunch of money & time on proving their flight control software.
jonathanstrange4 months ago
No hardware failure is considered? No cosmic rays flipping bits? No soft or hard real-time guarantees are discussed? What about indeterminate operations that can fail such as requesting memory from some operating system dynamically?
I'm asking because I thought high integrity systems are generally evaluated and certified as a combination of hardware and software. Considering software alone seems pretty useless.
- codebje4 months ago
  Specifications that are formally verified can definitely cover real-time guarantees, behaviour under error returns from operations like allocation, and similar things. Hardware failures can be accounted for in hardware verification, which is much like software verification: specification + hardware design = verified design; if the spec covers it, the verification guarantees it.
  Considering software alone isn't pretty useless, nor is having the guarantee that "inc x = x - 1" will always go from an Int to an Int, even if it's not "fully right" at least trying to increment a string or a complex number will be rejected at compile time. Giving up on any improvements in the correctness of code because it doesn't get you all the way to 100% correct is, IMO, defeatist.
  (Giving up on it because it has diminishing returns and isn't worth the effort is reasonable, of course!)
  - charcircuit4 months ago
    Hardware verification doesn't prevent hardware failures. There is a reason RAM comes with ECC. It's not because RAM designers are too lazy to do formal verification. Even with ECC RAM, bit flips can still happen if multiple bits flip at the same time.
    There are also things like CPUs taking the wrong branch that occasionally happen. You can't assume that the hardware will work perfectly in the real world and have to design for failure.
    jojomodding4 months ago
    Designing around hardware failure in software seems cumbersome to insane. If the CPU can randomly execute arbitrary code because it jumps to wherever, no guarantees apply.
    What you actually do here is consider the probability of a cosmic ray flip, and then accept a certain failure probability. For things like train signals, it's one failure in a billion hours.
    kragen4 months ago
    > Designing around hardware failure in software seems cumbersome to insane.
    Yet for some reason you chose to post this comment over TCP/IP! And I'm guessing you loaded the browser you typed it in from an SSD that uses ECC. And probably earlier today you retrieved some data from GFS, for example by making a Google search. All three of those are instances of software designed around hardware failure.
    lou13064 months ago
    But you must drive a line somewhere.
    If "a cosmic ray could mess with your program counter, so you must model your program as if every statement may be followed by a random GOTO" sounds like a realistic scenario software verification should address, you will never be able to verify anything ever.
    kragen4 months ago
    I agree, you definitely won't be able to verify your software under that assumption; you need some hardware to handle it, such as watchdog timers (when just crashing and restarting is acceptable) and duplex processors like some Cortex-R chips. Or TMR.
    spartanatreyu4 months ago
    An approach that has been taken for hardware in space is to have 3 identical systems running at the same time.
    Execution continues while all systems are in agreement.
    If a cosmic ray causes a bit-flip in one of the systems, the system not in agreement with the other two takes on the state of the other two and continues.
    If there is no agreement between all 3 systems, or the execution ends up in an invalid state, all systems restart.
    charcircuit4 months ago
    Yet, big sites like Google or TikTok constantly deal with hardware failures everyday while keeping their services and apps running.
    pixl974 months ago
    >Designing around hardware failure in software seems cumbersome to insane
    I mean there are places to do it. For example ZFS and filesystem checksums. If you've ever been bit by a hard drive that says everything is fine but returns garbage you'll appreciate it.
    codebje4 months ago
    Well of course hardware fails, and of course verification doesn't make things work perfectly. Verification says the given design meets the specification, assumptions and all. When the assumptions don't hold, the design shouldn't be expected to work correctly, either. When the assumptions do hold, formal verification says the design will work correctly (plus or minus errors in tools and materials).
    We know dynamic RAM is susceptible to bit-flip errors. We can quantify the likelihood of it pretty well under various conditions. We can design a specification to detect and correct single bit errors. We can design hardware to meet that specification. We can formally verify it. That's how we get ECC RAM.
    CPUs are almost never formally verified, at least not in full. Reliability engineering around systems too complex to verify, too expensive to engineer to never fail, or that might operate outside of the safe assumptions of their verified specifications, usually means something like redundancy and majority-rules designs. That doesn't mean verification plays no part. How do you know your majority-rules design works in the face of hardware errors? Specify it, verify it.
- themafia4 months ago
  Even then you have other physical issues to consider. This is one of the things I love about the Space Shuttle. It had 5 computers for redundancy during launch and return. You obviously don't want to put them all in the same place so you spread them out among the avionics bays. You also obviously don't want them all installed in the same orientation so you install them differently with respect to the vehicles orientation. You also have a huge data network that requires redundancy and you take all the same steps with the multiplexers as well.
  - cpgxiii4 months ago
    The best example on the Shuttle were the engine control computers. Each engine had two controllers, primary and backup, each with its own set of sensors in the engine itself and each consisting of a lock-step pair of processors. For each engine, the primary controller would use processors built by one supplier, while the backup would use processors of the same architecture but produced by an entirely different supplier (Motorola and TRW).
    Today, even fairly standard automotive ECUs use dual-processor lock-step systems; a lot the the Cortex-R microcontrollers on the market are designed around enabling dual-core lock-step use, with error/difference checking on all of the busses and memory.
    me_again4 months ago
    But what do you do when the primary and backup disagree?
    cpeterso4 months ago
    This question is posed in Segal’s Law:
    A man with a watch knows what time it is. A man with two watches is never sure.
    https://en.wikipedia.org/wiki/Segal%27s_law
    naasking4 months ago
    Requiring specialized hardware seems overly strict now that we can handle such things at a higher level via something like the fault tolerant lambda calculus.
- ip264 months ago
  Nowhere does the article claim that:
  "formal verification of the code" -> "high integrity system"
  Formal verification is simply a method of ensuring your code behaves how you intend.
  Now, if you want to formally verify your program can tolerate any number of bits flip on any variables at any moment(s) in time, it will happily test this for you. Unfortunately, assuming presently known software methods, this is an unmeetable specification :)
- westurner4 months ago
  Side channels? Is best out of 2 sufficient or is best out of 3 necessary?
  From https://news.ycombinator.com/context?id=39938759 re: s2n-tls:
  > [ FizzBee, Nagini, Deal-solver, Dafny; icontract, pycontracts, Hoare logic, DbC Design-by-Contract, invariants, parallelism and concurrency and locks, io latency, pass by reference in distributed systems, "FaCT: A DSL for Timing-Sensitive Computation" and side channels [in hw and software] https://news.ycombinator.com/item?id=38527663 ]
  There are so many things to consider;
  /? awesome-safety https://westurner.github.io/hnlog/#search:awesome-safety :
  awesome-safety-critical: https://awesome-safety-critical.readthedocs.io/en/latest/
  Hazard (logic) https://en.wikipedia.org/wiki/Hazard_(logic)
  Hazard (computer architecture); out-of-order execution and delays: https://en.wikipedia.org/wiki/Hazard_(computer_architecture)
  Soft error: https://en.wikipedia.org/wiki/Soft_error
  SEU: Single-Event Upset: https://en.wikipedia.org/wiki/Single-event_upset
  And then cosmic ray and particle physics
- skybrian4 months ago
  For portable libraries and apps, there's only so much you can do. However, there are some interesting properties you can prove assuming the environment behaves according to a spec.
- kragen4 months ago
  The article does consider hardware failure, yes.
4 months ago
undefined
huflungdung4 months ago
[dead]
ocelotBridge4 months ago
Formal verification is great, but cosmic rays don't read our specs.
- addaon4 months ago
  They do if they’re in the specs. See near-universal use of lockstep processors, ECC etc in safety-critical and high radiation cases, which comes from the two requirements that “a single bit flip from a cosmic ray shall be detectable” and “multiple simultaneous hit flips from cosmic rays shall be shown to be sufficiently rare in the operating environment.”
stephenlf4 months ago
This is incredible. This post led me to your GitHub, which is full of similarly incredible content. “Awesome Cold Showers”? Beautiful.