verification - Are we building the software right?
validation - Are we building the right software?
makes many a thing easier to talk about at work
Verification aligns with a specification. E.g. verify if what was implemented matches what was specified.
Validation aligns with a requirement. E.g. verify if what was implemented solves the actual problem.
What is your product goal? Both for the customer (solving a problem) and for the company(solve a marketing need, make money)?
Can you proof that you are meeting both? Then you are validating your goals successfully.
Most of the time products are missing defined goals, so lots of people have no idea what the fuck they are developing and everyone ends up with their own idea of the product.
EDIT: more formally, specification is a document stating requirements
Verification: formal theoretical proof
Validation: Empirical test based approach
Verification first and foremost concerns design and development stage, a formal definition would be something like "outputs of design and development step meet specified requirements for that step". For example, you could verify that a formal specification for a module actually implements imposed requirements. Especially in software this can get murky as you cover different phases with different tests/suites, e.g. unit, integration, e2e implicitly test different stages, yet are often part of the same testing run.
Validation first and foremost concerns the whole system/product and fitness for market availability.
For example you would verify that e.g. for a motor vehicle ABS functions correctly, airbags deploy in a crash and frame meets passenger safety requirements in a crash, and you would still not be able to sell such vehicle. You would validate the vehicle as a whole with corresponding regulatory body to get vehicle deemed fit for market.
TLDR: Verification is getting passing grade in a lab test, validation is submitting all relevant lab test protocols to get CE certified.
Like "Customer validation" and "Verification against the spec"
Like "sympathy" and "empathy". I finally heard a mnemonic but if I need a mnemonic maybe it's a sign that the words are just too hard to understand
- Did we correctly understand and build what the customer asked for
- Did the customer actually ask for what would solve the problem they are trying to solve
The second one is very important to ask early on, and _not_ asking it can cause a lot of wasted time. When I'm in a position to comment on whether I think someone fits the requirements of a senior developer, whether or not they ask that question plays directly into it.
People have commented that validation is different from verification, but I think validation can be done by formally specifying the environment the program runs in.
For example, the question whether user can do X in your software corresponds to a question, is there a sequence of user inputs such that the software plus operating system leads to the output X.
By including the environment (like the operating system) into the formal spec we could answer (in theory) such questions.
Even if the question is fuzzy, we can today formalize it with AI models. For example, if we ask, does paint program allow user to draw a rose? We can take a definition of a rose from a neural network that recognizes what a rose is, and formally verify the system of paint program and rose verifier given by the NN.
Can we? That just shifts the problem to building such a NN.
I think it would already be pretty tricky to get a decent sized panel of humans to agree on what a "rose" is. Do rose-shaped but yellow flowers count? How about those genetically altered thornless ones? How about perfectly imitatations that are made of plastic? Images of roses? The flowers from the list at https://www.epicgardening.com/rose-lookalikes/?
And of course there's the problem of getting a NN to recognise just roses. The mythical neural network with 0% error rate is exceptionally rare and usually not very useful for real world applications. I very much doubt humanity could build a neural network that perfectly recognises roses and only roses.
This is the main issue I have with formal methods in commercial settings. Most of the time, the issue is that the model reality map is far from accurate. If you have to rewrite your verification logic every time you have to update your tests, development would go very slowly
You cannot get rid of the enviroment, and the best thing you can do is be explicit in what you rely on. For example, formally verified operating systems often rely on hardware correctness, even though the errata sheets for modern CPUs are hundreds of pages long.
In the above example, that NN recognizes rose as a rose is an assumption that it is correct as a model of the (part of the) world. On that assumption, we get a formal definition of "an image of a rose in the pixelated image", and we use that to formally prove our system allows roses to be painted.
But getting rid of that assumption is I believe epistemologically impossible; you have to assume some other correspondence with reality.
I agree with you on the hardware. I think the biggest obstacle to software formal correctness is the lack of formal models of which we can be confident describe our environment. (One obstacle is the IP laws - companies do not like to share the models of things they produce.)
You can't just chuck a neural network into the mix because they aren't formally verified to do basically anything beyond matrix multiplication and the fact that they are universal approximators.
There was some meta-bug with encoding which almost is like an internal joke but I don't think it's intentional. The Haskell code examples have:
inc :: Int -> Int
several times, and that `>` entity should very likely just be a `<` in the source code.No, you can't argue that (well, argue that successfully). If you are trying to prove some properties of a program X written in a certain programming language Y, you can't "just assume" that the semantics of Y are different from what they are! The integers do overflow in Java, that's explicitly stated in the Java's spec.
Posh example: Axiom of choice.
"Beware of bugs in the above code; I have only proved it correct, not tried it." - Donald Knuth
Source: https://staff.fnwi.uva.nl/p.vanemdeboas/knuthnote.pdf
From a quick look at it, it seems to be a "case 1" potential error, as the proof is done by hand. That is the warning is likely because it is a mathematical exercise first, and the code is for illustration and not for inclusion in production software.
Why not just use an assertion? It costs more at runtime, but it's far less dev time than developing a proof, right?
(the tradeoff of course being that they might get your code later because it took more time to prove the code is correct)
The point of proofs is to not get yourself (or the customer) into that point in the first place.
There seems to be confusion around the differences between bug and correct.
In the referenced study on binary searches, the Google researchers stated that most implementations had bugs AND were "broken" because they did:
int mid =(low + high) / 2;
And low + high can overflow.Ostensibly, a proof may require mid to be positive at all times and therefore be invalid because of overflow.
The question is: if we put in assertions everywhere that guarantee correct output, where aborting on overflow is considered correct, then are folks going to say that's correct, or are they going to say oh that's not what binary search is so you haven't implemented it?
EDIT: thinking about it more, it seems that null must be an accepted output of any program due to the impossibility of representing very large numbers.
You may want to prove some things. Test others.
Types are proofs by the way. Most programmers find that handy.
I think assertions are rediculously underused BTW
For example, instead of defining `inc` with `Int`s, the equivalent of an `IncrementalInt` type parameter for the `inc` function will ensure correctness without requiring runtime assertions.
• Crashing the program is often what you formally verified the program to prevent in the first place! A crashing program is what destroyed Ariane 5 on its maiden flight, for example. Crashing the program is often the worst possible outcome rather than an acceptable one.
• Many of the assumptions are not things that a program can check are true. Examples from the post include "nothing is concurrently modifying [variables]", "the compiler worked correctly, the hardware isn't faulty, and the OS doesn't mess with things," and, "unsafe [Rust] code does not have [a memory bug] either." None of these assumptions could be reliably verified by any conceivable test a program could make.
• Even when the program could check an assumption, it often isn't computationally feasible; for example, binary search of an array is only valid if the array is sorted, but checking that every time the binary search routine is invoked would take it from logarithmic time to linear time, typically an orders-of-magnitude slowdown that would defeat the purpose of using a binary search instead of a simpler sequential search. (I think Hillel tried to use this example in the article but accidentally wrote "binary sort" instead, which isn't a thing.)
When crashing the program is acceptable and correctness preconditions can be efficiently checked, postconditions usually can be too. In those cases, it's common to use either runtime checks or property-based testing instead of formal verification, which is harder.
- But if you aren't comfortable crashing the program if the assumptions are violated, then what is your formal verification worth? Not much, because the formal verification only holds if the assumptions hold, and you are indicating you don't believe they will hold.
- True, some are infeasible to check. In that case, you could then check them weakly or indirectly. For example, check if the first two indices of the input array are not sorted. You could also check them infrequently. Better to partially check your assumptions than not check at all.
You now have some process or computation which is in an unknown state. Presumably it was important! There are lots of cases where runtime errors are fine and expected -- when processing input from the outside world, but if you started a chain of computation with good input and somehow ended up with bad input, then you have a bug! That is bad! Everything after that point can no longer be trusted, and there was some place where the code went wrong before that and made everything between there and where you caught the error invalid. And that has possibly poisoned your whole system.
When crashing the program is acceptable and correctness preconditions can be efficiently checked, postconditions usually can be too.
What's interesting to me is the combination of two claims: formal verification is used when crashes are not acceptable, and crashing when formal assumptions are violated is therefore not acceptable. This makes sense on the surface - but the program is only proven crashproof when the formal assumptions hold. That is all formal verification proves.
For example I have run services which existed inside a forever() loop. They exit, and are forceably restarted. Is that viable for a flight control system? No. Does it allow me to meet a low bounds availability problem with OOM killers? Yes, until the size of a quiescent from-boot system causes OOM.
BGP speakers who compile in runtime caps on the prefix count can be entirely stable and run BGP "correctly" right up to the point somebody sends them more than the cap in prefixes. That can take hours to emerge. I lived in that world for a while too.
But you make some sort of distinction here.
You can aslo mark certain codepaths as unreachable to hint to the compiler that it can make certain optimisations (e.g., "this argument is never negative"), but if you aren't validating that the assumption is correct I wouldn't call that an assertion -- though a plain reading of your comment would imply you would still call this an "assertion"? AFAIK, no language calls this construct "assert".
This is probably one of those "depends on where you first learned it" bits of nomenclature, but to me the distinction here is between debug assertions (compiled out in production code) and assertions (always run).
> Is asserting the assumptions during code execution not standard practice for formally verified code?
Are you using the same definition of "assert" as that post does?
Verify means you checked.
Assert means you are suggesting something is true, but might or might not have checked. Sometimes an assert is "too hard" to verify but you have reason to think it is true. This could be because of low level code, or just that it is possible to verify but would cost too much CPU (runtime, or possibly limits of our ability to prove large systems) Sometimes assert is like a Mafia boss (It is true or I'll shoot - it might or might not really be true but nobody is going to argue the point now. This can sometimes be needed to keep a discussion on a more important topic despite the image)
If you want to assert that a list is sorted, you need some function that checks if it is sorted and returns a boolean.
This is true, but if you care about correct execution, you would need to re-verify that the list is sorted - bitflips in your DRAM or a buggy piece of code trampling random memory could have de-sorted the list. Then your formally verified application misbehaves even though nothing is wrong with it.
It's also possible to end up with a "sorted" list that isn't actually sorted if your comparison function is buggy, though hopefully you formally verified the comparison function and it's correct.
Add(x,y):
Assert( x >= 0 && y>= 0 )
z = x + y
Assert( z >= x && z >= y )
return z
There’s definitely smarter ways to do this, but in practice there is always some way to encode the properties you care about in ways that your assertions will be violated. If you can’t observe a violation, it’s not a violation https://en.wikipedia.org/wiki/Identity_of_indiscerniblesThe compound predicate in my example above coupled with the fact that the compiler doesn’t reason about the precondition in the prior assert (y is non-negative) means this specific example wouldn’t be optimized away, but bluGill does have a point.
An example of an assert that might be optimized away:
int addFive(int x) {
int y = x + 5;
assert(y >= x);
return y;
}
(That was one of my texts at uni)
I'm asking because I thought high integrity systems are generally evaluated and certified as a combination of hardware and software. Considering software alone seems pretty useless.
Considering software alone isn't pretty useless, nor is having the guarantee that "inc x = x - 1" will always go from an Int to an Int, even if it's not "fully right" at least trying to increment a string or a complex number will be rejected at compile time. Giving up on any improvements in the correctness of code because it doesn't get you all the way to 100% correct is, IMO, defeatist.
(Giving up on it because it has diminishing returns and isn't worth the effort is reasonable, of course!)
There are also things like CPUs taking the wrong branch that occasionally happen. You can't assume that the hardware will work perfectly in the real world and have to design for failure.
We know dynamic RAM is susceptible to bit-flip errors. We can quantify the likelihood of it pretty well under various conditions. We can design a specification to detect and correct single bit errors. We can design hardware to meet that specification. We can formally verify it. That's how we get ECC RAM.
CPUs are almost never formally verified, at least not in full. Reliability engineering around systems too complex to verify, too expensive to engineer to never fail, or that might operate outside of the safe assumptions of their verified specifications, usually means something like redundancy and majority-rules designs. That doesn't mean verification plays no part. How do you know your majority-rules design works in the face of hardware errors? Specify it, verify it.
What you actually do here is consider the probability of a cosmic ray flip, and then accept a certain failure probability. For things like train signals, it's one failure in a billion hours.
Yet for some reason you chose to post this comment over TCP/IP! And I'm guessing you loaded the browser you typed it in from an SSD that uses ECC. And probably earlier today you retrieved some data from GFS, for example by making a Google search. All three of those are instances of software designed around hardware failure.
If "a cosmic ray could mess with your program counter, so you must model your program as if every statement may be followed by a random GOTO" sounds like a realistic scenario software verification should address, you will never be able to verify anything ever.
Execution continues while all systems are in agreement.
If a cosmic ray causes a bit-flip in one of the systems, the system not in agreement with the other two takes on the state of the other two and continues.
If there is no agreement between all 3 systems, or the execution ends up in an invalid state, all systems restart.
I mean there are places to do it. For example ZFS and filesystem checksums. If you've ever been bit by a hard drive that says everything is fine but returns garbage you'll appreciate it.
"formal verification of the code" -> "high integrity system"
Formal verification is simply a method of ensuring your code behaves how you intend.Now, if you want to formally verify your program can tolerate any number of bits flip on any variables at any moment(s) in time, it will happily test this for you. Unfortunately, assuming presently known software methods, this is an unmeetable specification :)
Today, even fairly standard automotive ECUs use dual-processor lock-step systems; a lot the the Cortex-R microcontrollers on the market are designed around enabling dual-core lock-step use, with error/difference checking on all of the busses and memory.
From https://news.ycombinator.com/context?id=39938759 re: s2n-tls:
> [ FizzBee, Nagini, Deal-solver, Dafny; icontract, pycontracts, Hoare logic, DbC Design-by-Contract, invariants, parallelism and concurrency and locks, io latency, pass by reference in distributed systems, "FaCT: A DSL for Timing-Sensitive Computation" and side channels [in hw and software] https://news.ycombinator.com/item?id=38527663 ]
There are so many things to consider;
/? awesome-safety https://westurner.github.io/hnlog/#search:awesome-safety :
awesome-safety-critical: https://awesome-safety-critical.readthedocs.io/en/latest/
Hazard (logic) https://en.wikipedia.org/wiki/Hazard_(logic)
Hazard (computer architecture); out-of-order execution and delays: https://en.wikipedia.org/wiki/Hazard_(computer_architecture)
Soft error: https://en.wikipedia.org/wiki/Soft_error
SEU: Single-Event Upset: https://en.wikipedia.org/wiki/Single-event_upset
And then cosmic ray and particle physics