To use formal verification you need to have formal requirements of the behavior of your software. Most software projects and design philosophies are simply incompatible with this.
In software development and design can often fall together, but that means that it is uniquely ill suited for formal methods. If you are developing, before you are sure what you even want, then formal methods do not apply.
But I agree that there are certain areas, mostly smaller, safety critical, systems, which rely on upfront specifications, which can heavily benefit from formal verification. E.g. Aerospace software relies heavily on verification.
It’s not a skill that requires a PhD and years of research to acquire these days.
Nor does writing a basic, high level specification.
If anything you will learn something about the system you are modelling by using a model checker. And that can be useful even if it is used for documentation or teaching.
The fundamental power of formal methods is that they force you to think things through.
All too often I find software developers eager to believe that they can implement concurrent algorithms armed with their own wit, a type checker, and a smattering of unit tests. It can be humbling to find errors in your design and assumptions after using a model checker. And perhaps it’s hubris that keeps programmers from using such tools in more “mundane” and “real world” contexts.
There are a lot more “small” distributed systems than you’d expect and state spaces are generally larger than you’d anticipate if you weren’t open to formalizing your work.
My development style is:
- prototype ideas in a quick and dirty dynamic typed language just to gain confidence it's going to work (a lot of ideas die here)
- rewrite in a more performant and safe language with a "type checker and a smattering of unit tests" (usually here I'm "done" and have moved onto the next idea/task. If there's an issue I fix it and add another test case)
I'm trying to imagine where formal verification comes in. I'm imagining something like:
- prototype ideas in a quick and dirty dynamic typed language just to gain confidence it's going to work (a lot of ideas die here)
- Formally model the requirements
- rewrite in a language that can be formally verified and which is hopefully performant and lets me do things like simd and/or cuda if needed
- Never have to fix a bug unless there was a bug in the requirements (?)
To me, it just seems like it would take an order of magnitude longer to develop things this way for not much benefit (I've traded development time and potentially runtime performance in exchange for correctness)
Long way to go. But there is a design.
Use z3/SMT instead of TLA+
I didn't talk about "costs" or about "how hard" it is, but that common practices in software development make using formal methods infeasible. If you want to use formal verification you likely have to change how you develop software. In an environment where there is general uncertainty about architecture and behavior, as is common in agile environments, formal verification is difficult to implement. By its nature, formal verification discourages fast iteration.
Not necessarily. You can develop the specification and implementation in parallel. That way, at every point in time, you could have a much more thorough and deep understanding of what you're currently building, as well as how exactly you decide to change it.
TLA+:
- https://en.wikipedia.org/wiki/TLA%2B
- https://lamport.azurewebsites.net/tla/tla.html
Alloy:
- https://en.wikipedia.org/wiki/Alloy_(specification_language)
That's the problem. Basic, high level specification gains nothing.
> I’ve taught developers how to use tools like TLA+ and Alloy in week.
You have week length courses. From my experience you have to spend at least of 100 hours trying to model real live distributed protocols with full liveness (most hard TLA+ part) support to be at somewhat formal verification beginner level. Toy examples from courses are cute but teach nothing.
This is a common complaint among beginners. The problem is that the syntax is very helpful later and, I would claim, is also helpful for beginners once something very important clicks.
TLA+ is a language for writing mathematical/logical formulas, and uses the standard mathematical/logical syntax that's developed over the past 150 years. There is no funamental reason why the standard symbol for conjunction (logical and) should be ∧, but that syntax developed so that ∧ is visually similar to ∩ (set intersection) because of the deep relationship between the two (a ∩ b = {x : x ∈ a ∧ x ∈ b}) which means that many transformations apply to both symbols in a similar way (same goes for the disjunction -- logical or -- symbol, ∨, and set union ∪, as a ∪ b = { x : x ∈ a ∨ x ∈ b}). As you learn to apply mathematical transformations on formulas, that syntax becomes helpful (not to mention that it's the same syntax you'd find in mathematics/logic texts where you can learn those transformations). On the other hand, the corresponding symbol used in programming, typically &, was chosen due to the technical limitations of computer input devices in the 1950s rather than as a means to aid the work of the programmer in some deeper way.
Now, the reason I would claim that the syntax is also helpful for beginners (although I acknowledge it is a "familiarity" stumbling block) is that it reminds them that the meaning of formulas in TLA+ is the same as their simple meaning in mathematics rather than their very complex meaning in programming. This is the important thing that needs to click into place.
For example, in mathematics, a function on the integers defined like so, f(x) = -f(x), has the meaning of being the zero function -- the function that is zero for all x. This is simple and obvious; to see that, add f(x) to both sides and divide by 2. Specifying a function in that way in TLA+ would specify the zero function. In few if any programming languages, however, would a function defined in this way specify the zero function. Usually it specifies something much more complicated, and that's because programming is much more complicated. There are many other such examples.
On the first day of a programmer learning TLA+, functions and operators may appear similar to subroutines in programming, but the similarity is superficial, and soon the big differences become apparent. The meaning of these things in TLA+ is much simpler than in programming and also much more powerful as they're more amenable to applying transformations and substitutions (e.g. in TLA+ there is no difference between x' = 3 and 3 = x', while the software operation this TLA+ equation often describes -- that of assignment -- behaves in a much more complicated way in code).
The mathematical syntax helps remind us that we're writing mathematics, not code, and that the symbols have their (simpler and more powerful) mathematical meaning rather than their coding meaning.
The purpose of TLA+ is to reason about the behaviour of a dynamic system in a rigorous way. That requires applying transformations (such as adding things to both sides of an equation, substituting things etc.), that, in turn, requires that symbols have their mathematical meaning, and that is aided by the standard mathematical syntax (again, not just because that syntax was often chosen to evoke important similarities but also because that syntax is the one that's used in most texts about logic/mathematics).
For me that clicked when I was learning TLA+ a decade ago and I asked on the mailing list if TLA+ uses eager or lazy evaluation. Leslie Lamport replied that there is no "evaluation" at all in mathematics.
> I truly wish designers of popular languages would incorporate model verification within the language/compiler tools - this would likely need a constrained subset of the language with restricted syntax and special extensions.
This is not so simple because the things you want to express about programs often goes well beyond what can be used to produce something executable, and, as I mentioned before, is aided by using mathematical meaning. There are languages that do what you want, but either their specification power is relatively weak (e.g. Dafny), or they're much more complicated than both TLA+ and mainstream programming languages (e.g. Idris).
TLA+ allows you to use a simple and very powerful language for the specification part and a relatively simple mainstream language for coding without compromising either one. The difficulty, however, is internalising that writing specification in TLA+ is a different activity from coding, and trying not to extrapolate what you know from code to maths.
TLA+ is actually much smaller, simpler, and easier to learn than Python, it's just that you need to understand that you're starting from scratch. Someone who doesn't know programming would learn TLA+ faster than Python, it's just that if you already know programming, learning a new programming language may be easier, at least at first, than learning TLA+ -- something that is very much not programming (even though it's simpler than programming).
> the "formal" function
"Formal" merely means mechanical, i.e. something that can be operated on by a computer. All programming is formal.
You definitely don't have to learn to do that to use some formal verification tools, but that is what you commit to learning when you choose to learn TLA+ specifically. That is because the assumption behind the design of TLA+ is that using mathematics is ultimately the easiest and clearest way to describe certain important things you may wish to know about programs. TLA+ is a produce of the nineties, and the specification languages that preceded TLA+ in the eighties used more programming-like approaches. And TLA+'s novelty came largely from abandoning the programming approach. It is a later, rather than an earlier evolutionary step.
However, if you accept that you won't be able to describe such things, certainly not easily, then there are other languages and tools that don't require a mathematical approach -- some predate TLA+ and some are later, but follow the earlier approach of the 1980s.
Every process is a tradeoff and it anyways depends on your specific circumstances which choice is best for your team and project.
Is it absolutely important that your system is correct? ... begs the question, correct with respect to what? Generally: a specification.
There are lots of situations where you don't know what your system is supposed to do, where testing a few examples is sufficient, or it's not terribly important that you know that it does what you say it ought to. Generating a batch report or storing some customer responses in a database? Trust the framework, write a few tests if you need to, nobody is going to find a formal specification valuable here.
However, if you need to deploy configuration to a cluster and need to ensure there is at least two nodes with the a version of the configuration that matches the database in the load balancer group at all times during the migration? Need to make sure the migration always completes and never leaves the cluster in a bad state?
Even smaller in scale: need to make sure that references in your allocator don't contain addresses outside of your memory pool? Need to make sure that all locks are eventually released?
It's definitely much faster to iterate on a formal specification first. A model checker executes your model against the entire state space. If you're used to test-driven development or working in a statically typed language, this is useful feedback to get early on in the design process.
What the scope is that is appropriate for using tools like this is quite large and not as niche as some folks imply. I don't do aerospace engineering but I've used TLA+ to model deployment scripts and find bugs in OpenStack deployments, as well as to simply learn that the design of certain async libraries are sound.
Update: more examples.
Is that the state of the art for formal methods? How do you think of formally verifying systems with floating points calculations, with randomness, with databases and external environments?
I think the state of the art is quite broad as there are many tools.
Model checking is, in my estimation, the most useful for industry programmers today. It doesn't require as much training to use as automated theorem proving, it doesn't provide the guarantees that mathematical induction and proof do; but you get far more for your buck than from testing alone.
However model checkers are very flexible because the language they use, math, is also very flexible. TLA+ has been used to find errors in the Xbox 360 memory controller before it went to launch as well as in Amazon's S3 [0] -- errors so complex that it would have been highly improbable that a human would detect them and the result of finding those errors and solving them saved real businesses a lot of money.
It comes down to your ability to pick good levels of abstraction. You can start with a very high level specification that admits no details about how the file system works or the kernel syscalls involved and still get value out of it. If those details _do_ matter there's a strategy called refinement where you can create another specification that is more specialized and write an expression that implies if the refined specification holds then the original specification does as well.
Tools for randomness and numerical methods exist and depending on your needs you may want to look at other tools than just a model checker. TLA+, for example, isn't good at modelling "amounts" as it creates a rather large state space quickly; so users tend to create abstractions to represent these things with a finite number of states.
[0] https://lamport.azurewebsites.net/tla/formal-methods-amazon....
A four-part TLA+ in Practice and Theory by user "pron" - https://pron.github.io/posts/tlaplus_part1
Do you have any specific resources recommendations for learning Alloy and TLA+?
From there, if you’ve digested those books then get to writing some specifications. Join the mailing lists or find a group on discord or slack or something. Try modelling an elevator. Make sure that every person that calls the elevator eventually arrives at their requested floor. Pick a favourite async library and see if you can model its structures and functionality. The important thing is to exercise that muscle and make mistakes.
Just for completeness apparently Alloy is at version 6 now (http://alloytools.org/alloy6.html) and the book is based on version 4 (http://alloytools.org/book.html). There's another resource linked on the site: https://haslab.github.io/formal-software-design/
... Using Mathematical Reasoning and Tools.
This is the main reason for me to study Formal Methods. The shift in thinking is profound and it affects one's own programming even if one does not use any of the tools.
I work on a very “product-y” back end that isn’t fully specified, but I have formally specified parts of it.
For instance, I property-based-tested a particularly nasty state machine I wrote to ensure that, no matter what kind of crazy input I called an endpoint with, the underlying state machine never made any invalid transitions. None of the code around the state machine has a formal spec, but because the state machine does, I was able to specify it.
In the process, I found some very subtle bugs I’d have never caught via traditional unit testing.
State machines are quite common, in aerospace software, where the requirements even specify the states and transitions. If you can formally verify them, I believe, you absolutely should, as often there can be a lot of subtlety going on there, especially if you have a distributed system of state machines sending messages to one another.
But you're right that to benefit from any formal methods, you need to compare the program's behaviour to something else that isn't the program itself, and that other thing also needs to be written in a formal language. To be able to write that other thing, you do need to have a precise understanding of what the wanted behaviour should be so that you could express it in a formal language, but it doesn't have to cover the full behaviour of the software.
As an example of what that means, take automated (unit) tests. Automated tests are a formal specification, and running them is a formal verification method. Tests are a relatively weak specification, and executing them is a relatively weak verification, compared to what we normally mean by "formal methods", but there is no clear qualitative difference -- conceptual or even practical -- between tests and more powerful formal specification and verification. You can think of these practices as more powerful tests or as tests that work differently, but tests are formal specifications, and if they're applicable to a piece of software, it's likely that richer formal specification methods are, too (learning the cost/benfit of other methods is similar to learning the cost/benefit of tests -- you don't get it right at first, but you learn along the way).
This is a great way to describe it, and an important concept to grasp.
Another kind of formal verification that exists between "standard" unit tests and model checking/automated theorem proving is property-based testing a la quickcheck: You have a function you want to test (e.g., a sort), you describe the properties of inputs ("a list of integers") and state the properties you want to hold of the output ("every number should appear the same number of times in the output as in the input, and every number should be >= the number to its left") and ask the system to generate and test lots of random inputs for you. These properties can often be used more or less directly in model checkers, which makes the checking exhaustive (on some small, finite domain).
It reminds me of the question about whether programming is math. Depends on what you think you're doing. Are you doing things with state to get results, or building a system of facts that results in the answer?
"Formal" simply means mechanical so neither is more formal than the other. There is, however, a qualitative and rather easily delineated difference between deductive methods that operate by applying deductive inference rules for some language and methods that operate on the model of the language (the universe of values, more or less). Neither is fundamentally more rigorous than the other, though. Remember, a test (assuming no nondeterminism) is a complete proof of the proposition it states, it's just that it can state only weak propositions, i.e. typically without quantifiers (forall, exists), and certainly without nested quantifiers.
For propositions with quantifiers, deductive and model based approaches can differ in cost. Usually, though not always, deductive methods are more expensive, so they're used less. Model based methods are less expensive because they can more easily reduce their "confidence". Complexity theory tells us that knowing the answer to some question has some minimal cost regardless of how the answer is learned. Model-based methods allow us to not really know the answer but increase our confidence in what it is, so their cost is more flexible.
Seems like there's domain-specific terminology (almost wrote "language"!) I'm using accidentally or incorrectly and making a mess. I might have stepped in it with "model." Mainly I just think I see a difference between ways of verifying where the work is doing and the work is being. But it sounds like I don't know enough to have this conversation.
Thanks for humoring me, though.
To get a sense of why software correctness is often more art than science, consider that an abstract algorithm can be proven correct but real software cannot because a software-based system is ultimately a physical system, and it is not possible to prove that an addition command given to a physical computer will always result in correct addition, even though that is true with high probability. Since confidence can never be 100%, and since many practices -- from testing through formal methods -- have been generally effective at producing software we can trust with our lives, the game is always about balancing confidence with cost.
I, for one, believe that TLA+ (and similar tools) can be used in quite a few situations to gain a required level of confidence more cost effectively than other methods, but there is no one right answer to software correctness and trick is learning when to apply different methods.
Typical unit tests (what fans of property based testing would call "example based tests") are existentially quantified propositions, e.g. if we squint at the test suite for a 'sort' function, we'll see propositions like "there exists a list l, such that sort(l) is sorted"; "there exists a list l, such that sort(sort(l)) equals sort(l)", "there exists a list l, such that sort(l) is a permutation of l", etc.
To make those executable, their proofs have to be constructive(/intuitionistic); e.g. we can't rely on double-negatives like "if sort(l) were not sorted it would lead to the following contradiction...". In constructive logic, the only way to prove an existential proposition is to construct a particular example and prove that this example satisfies the proposition. In a unit test, the example is our test data, the proposition is our assertion, and the proof comes from executing the test body to verify that the assertion is 'true'.
Thinking about test suites in this way can clarify our thoughts, which can lead us quite easily towards property-based testing and even inductive theorem proving. For example, consider a test like:
def test_sort_ignores_initial_order() {
assert(
sort([3, 1, 2]) == sort([2, 1, 3])
);
}
This is fine, but we can do better. In particular, those literals have been given as-is, without any indication of their relationship (if any), so anyone reading this test (e.g. after breaking it) has to guess where they came from. Not good for a specification! In this case it's pretty clear, but I've seen many tests in the RealWorld™ which look like `assert(foo(x) == y)` with very complicated literals for `x` and `y`, where it appears the author just copy-pasted the function's output as `y`, and there's no clear relationship between the two (is the `7` in that dictionary the same as this `7` over here, or is that a coincidence? Oh, it was the sum of that `19` with the `-12` from there...).Let's make this clearer to readers by stating the relationship:
def test_sort_ignores_initial_order() {
let l = [3, 1, 2];
assert(
sort(l) == sort(reverse(l))
);
}
Now this looks more like an existentially-quantified proposition: for some list, there is some permutation of it that gets the same output from `sort`. We prove this proposition, using the list `[3, 1, 2]` and the permutation `reverse`.Such statements are quite weak, but it's quite easy to strengthen them into the universally-quantified propositions that we actually care about, by abstracting-out hard-coded details that don't matter:
def test_sort_ignores_initial_order(l: List[Int]) {
assert(
sort(l) == sort(reverse(l))
);
}
Such abstraction goes beyond simple data values too, e.g. with a little effort we can abstract over the choice of permutation: def test_sort_ignores_initial_order(l: List[Int], permutation: Word) {
assert(
sort(l) == sort(permute(l, seed=permutation))
);
}
Unfortunately we can't prove universally-quantified statements just by executing their function body, but there are many paths open to us, with different costs/benefits:- Recover the old existential test, by running it with the examples as arguments. Hypothesis makes this easy with its `@example` decorator.
- Search for counterexamples, to try and prove ourselves wrong. This is property-based testing, AKA property checking, and supported by many test frameworks (Hypothesis, QuickCheck, etc.); there are different approaches to the search problem, some integrate with fuzzers, etc.
- Run a symbolic execution engine to try and prove it correct. This would be dependent on the language, and may require translation to some external formalism; it may also require a lot of manual effort to guide the tooling; and there's no guarantee it would actually work.
- Use a theorem prover to try and prove it correct. Again, this depends on the language and may require translation. First-order logics can be heavily automated, but struggle with some common patterns (e.g. those which require induction). Higher order logics are more expressive, but the tooling is less automated.
There's obviously a spectrum of outcomes, which depends on the effort required. Whilst most wouldn't recoup the cost of an inductive proof; I think property-based testing is very often worth it, and even symbolic execution could be practical in some everyday business situations.
That isn't quite right, though I know it's become a meme of sorts. An example, such as an automated test, can serve as a proof of an existentially quantified proposition but isn't itself one. For example, 3 > 2 is a proof of `∃x∈Int : x > 2` (because P(C) ⊢ ∃xP(x)) but isn't itself an existentially quantified proposition.
We can see that because if unit tests could express existentially quantified propositions then they could also express universally quantified ones by negation, which is available to tests, because existential and universal quantifiers are equally powerful, and can each be expressed in terms of the other as `(∀x : P(X)) ≣ (¬∃x : ¬P(x))` (we don't even need to resort to classical axioms because all domains are typically finite). [1]
Of course, unit tests could in principle express existentially quantified formulas (and consequently also universally quantified ones) through exhaustive loops over the domain, though the proof -- i.e. the execution in this case -- is not tractable in practice.
> Unfortunately we can't prove universally-quantified statements just by executing their function body, but there are many paths open to us, with different costs/benefits
Right, but there's another fully automated method that in some situations can serve as a full proof, though not a deductive one: model checking (indeed, there's a tool that offers it for TLA+). Contrary to some beliefs, model checking can efficiently exhaustively check infinite domains without symbolic techniques in some cases (though certainly not all, as that would violate computational complexity theorems; also, some model checkers use symbolic execution). For example, TLC, a concrete state (i.e. non-symbolic) model checker for TLA+ can fully prove, in under a second, some propositions quantified over an uncountably infinite number of executions, each of infinite length.
Of course, many interesting properties involve nested interleaved quantifiers (i.e. interleaved ∀ and ∃), and such propositions are strictly harder to prove (or be convinced of) than propositions with only a single quantifier.
Also, note that the correctness of a software system, unlike an abstract algorithm, can never be actually proven as it is a physical system for which there is no notion of proof. Even a complete proof (deductive or model-checked) provides confidence up to the belief that the hardware will behave as expected, something that can only be true with high probability. That is, indeed, why balancing cost and confidence is important, since complete confidence is never possible anyway.
[1]: Indeed, in some logics, both existential and universal quantifiers are not primitive and defined in terms of another operator, epsilon (https://en.wikipedia.org/wiki/Epsilon_calculus), which also exists in TLA+ and other formal logic languages.
Yeah, I hand-waved this a bit when I said "if we squint at the test suite for a 'sort' function, we'll see propositions like...". I don't think it's too far off though, since we can (and, as I argued, should) tease apart the data from the assertion, to get something like `let x = 3; assert(x > 2)`; and that's equivalent to `(lambda x: assert(x > 2))(3)` (which old-school Javascript devs may remember as "IIFE", before JS had 'let').
> We can see that because if unit tests could express existentially quantified propositions then they could also express universally quantified ones by negation, which is available to tests, because existential and universal quantifiers are equally powerful, and can each be expressed in terms of the other as `(∀x : P(X)) ≣ (¬∃x : ¬P(x))` (we don't even need to resort to classical axioms because all domains are typically finite).
Hmm, I hadn't thought about that. If I try stretching my constructive-logic argument a bit, we could implement negation `¬A` as a function `A -> Void`. Rewriting your `¬P(x)` would give `¬∃x : (P(x) -> Void)`, and rewriting the remaining negation gives `(∃x : (P(x) -> Void)) -> Void`. In a constructive setting we can take `∃x : (P(x) -> Void)` to be a pair containing `x` and a function of `f: P(x) -> Void`, and hence to implement this in general we need to call that function `f` (since it's the only way to construct `Void`), so we need to construct an argument for it, which requires constructing `P(x)` for general `x`, and hence requires a proof of `P(x)` that holds for all `x` (i.e. universally quantified).
I'm not sure of my intuitions around this. I got the above by "following the types", but test suites rely on runtime execution, which is inherently untyped (e.g. every program in a typed lambda calculus also exists in untyped lambda calculus if we erase the types (and translate type-adjacent features appropriately, like dictionary-passing for type classes)). From an untyped perspective, the idea of `Void` becomes dubious; it feels like `P(x) -> Void` throws a `Success` exception, and that's the only way for a test to pass (returning normally is assumed to be failure, and `Success` is encapsulated so we can't throw it directly). That feels very strange to me, but worth thinking about some more...
Right, but even that is an unquantified proposition.
> If I try stretching my constructive-logic argument a bit, we could implement negation `¬A` as a function `A -> Void`
There's no need for constructive logic specifically (a test presumes that it runs in some bounded time, which means we're operating on finite domains). Tests can invert both their assertions and their result, so if they were existentially quantified they could also have been universally quantified.
Indeed, tests can express (though not effectively execute) an existential quantifier (∃xP(x)) over a finite domain (though not infinite ones):
for x in Domain {
if P(x) return true // test success
}
return false // test failure
And that can easily be turned into a universal quantifier through negation: for x in Domain {
if !P(x) then return false
}
return true
This is the exact transformation as in (∀x : P(X)) ≣ (¬∃x : ¬P(x)); the internal predicate is negated and then the entire thing.> but test suites rely on runtime execution, which is inherently untyped
Types are a syntactic concept and they are, therefore, a property of a language; a language can be typed or untyped (types serve as part of the rules for determining whether a string is in the language or not). There are several ways to describe execution, but usually it isn't described as a language, so it is no more typed or untyped as it is yellow or not yellow.
This state machine is usually embedded in the code, but code has a lot of noise around the state machine that makes it harder to see the underlying state machine.
Formal methods outside of that are just silly. You can prove your software meets some spec, but the spec will always be wrong. Either it’s wrong today because it’s incomplete or you misunderstood something, or it’s wrong tomorrow because systems constantly evolve at high speed.
I feel like the silly absolutism of formal methods folks held back software development as a field, what we needed was practical methodologies to reduce bugs and accelerate development not philosophical waxing about “correctness”, a concept which does not exist for most software.
Fortunately I think we’ve mostly moved on now, and there has been great work on testing, language design, methodology development, that has had real impact. The sheer scale of available software and how incredibly reliable it is, has made formal methods irrelevant to mainstream software development.
While many requirements existed, the design assurance required specifying what problem was being solved, how it’s being solved, and the safety/security properties. Then, proving that the security properties were embodied in the artifacts in all states. That problem is difficult enough that most people who succeeded were also geniuses good at programming and math. Geniuses are in short supply.
The lessons learned papers also pointed out a recurring problem. Each requirement, security policy, algorithm, etc needed a mathematical specification. Then, proof techniques were necessary that could handle each. Instead of developing software features, you are now both developing software and doing R&D on applied mathematics.
Steve Lipner’s The Ethics of Perfection pointed out it took several quarters to make a major change on a product. The markets value development velocity over quality, even demand side. So, that’s already a no go for most businesses. Likewise, the motivations of OSS developers work against high assurance practices as well.
If you want formal verification, then you are forced to make certain decisions. For one, you need to build simple systems that are easy to verify, from design to language. Then, you use the easiest, most-automated tools for the job. Most drop full verification to do automated, partial checking: TLA+, Alloy, SPARK Ada, Meta’s Infer, etc.. If doing full verification, it’s better to make or build on reusable components like we see done with CompCert and seL4. GEMSOS and VAX VMM advocated that back in the day, too.
So, most software development isn’t fit for use of formal verification. I think we can default on a combo of contracts (specs), automated testing, static analysis, and safe languages. If distributed, stronger consistency when possible. Then, use CI to run the verification on each commit. Some tools can generate patches, too.
[1] You can see Lakatos in Dijstrak's library for example https://dijkstrascry.com/inventory#box4
I said that certain philosophies of software design are uniquely unsuited for formal verification.
Besides, tests and formal verification are different. A test is essentially a formal specification for a single point or, depending on how you test, multiple, potentially random, points. Writing or changing a test is less time intensive than writing or changing a full formal specification for an entire subsystem, therefore tests are more suited to volatile software development practices.
And having no tests is even more suited...
Not true!
Well, it is true of TLA+, but that is why TLA+ is not the future. The future is dependent types.
It is true we'll always be programming and specifying as we go to some extent. That's why formal methods have to be part of the software itself, not some extra burden off to the side.
Dependent types is basically unique in doing that. You can right some code, then reason about that code. Or you can write a program where the propositions/proofs and executable program proper are so intertwined the separation stops making sense in theory too.
This, and this alone, supports the dialectic of "doing" and "planning" that is hardly unique to software development, but rather is emblematic of most sufficiently complex human activities.
----
A really nice example to make this concrete is Lean's standard libraries's `SatisfyingM` and `MonadSatisfying`
(See https://leanprover-community.github.io/mathlib4_docs/Batteri...)
`SatisfiesM` is an attempt to support "external" reasoning: one already wrote a non-trivial monadic action, and now one wants to do Hoare-like reasoning about preconditions, posts conditions, the returned value, etc.
But a different monadic action that uses the first one might need to then internalize this reasoning. For example, it need to turn a pair of (program, proof about program's return value) into a new program that returns the thing and the proof.
This is `MonadSatisfying` is for. It is a constraint on the monad saying we now how to internalize such a proof.
The "external" proofs are the only ones TLA+ and similar tools support, and the honest truth is that you might want to make them, you might be forced for compliance/legal/insurance reasons to make them, but you are never intrinsically forced by the software itself to make them.
The "internal proofs", in contrast, are so required. The represent different abstractions which state their interface such that it is impossible to use them incorrectly --- providing a proof is a non-negotiable obligation on user of the abstraction just as providing a "regular" argument is a non-negotiable obligation to making a regular function call.
In addition to supporting to modular reasoning --- write a very fancy thing and then make sure your coworkers don't use it incorrectly --- this fixes the economic problem --- gotta do the proofs or the code won't compile!
----
If you have no idea how to start programming more formally, here are some ways to start:
external proofs:
you probably have tests. Think, how did you know those test cases are correct? Can you rewrite individual unit tests as property tests, to "formalize" that reasoning (yes, it is still formal, even though property testing is old and simple. "formal" means "explicit" not "fancy"!). If you can write a property test to by checked by probabilistic methods, then you implicitly have a property you could proove instead. Problem solved!
internal proofs:
Does you program have any `panic("this should never happen")` or similar? Well, why shouldn't it? Types, even non-dependent ones, allow for https://en.wikipedia.org/wiki/Principle_of_explosion, which is nothing more than showing some case-alternative has been given an impossible value, and so must be dead code. Every bit of code you think is unreachable, and your intuition is not correct, has a proof saying it is in fact so, and that proof is needed internally do "discharge" the branch.
Every assert? remember "assert c = branch c -> ok; not c -> panic("this should never happen")" so all your assertions are not things you might prove!
Even people that never have written down specs in their life probably have some assertions or panics lying about. So they too have something to formalize.
If we can't provide a pathway from where we are to a more formal world that is understandable to garden variety developers, it will remain an ivory tower occupation.
(And, in fact, things will get worse. Because in a world of LLM churnage, provable correctness becomes even more important)
A quick google didn't turn anything up, but gosh there must be. Property tests have been around for ages.
> Are there examples for proving unreachability?
Yeah I do this all the time even in Haskell with just GADTs. You can do it in Rust with ! Too.
> Has anybody successfully managed to explain how this is applicable if you don't have a deep background in formal methods?
Amateurs played around with Haskell and now Lean all the time. But I guess that is a "certain type of person". We'll see how this stuff goes mainstream over time. E.g. maybe someday Rust will get dependent types.
> If we can't provide a pathway from where we are to a more formal world that is understandable to garden variety developers, it will remain an ivory tower occupation. > > (And, in fact, things will get worse. Because in a world of LLM churnage, provable correctness becomes even more important)
Part B helps with A a lot. If people writing fancier types are better able to leverage LLMs, and get more productivity boost accordingly, the basic economics will push it mainstream.
After all, there's more demand for "equally bad software quicker", than "better software at the same speed". :)
But making it more accessible might just trigger enough people's "I don't want to push out absolute garbage if I can avoid it" buttons.
That's why I was wondering if there are already mainstream-accessible examples/explanations. It seems your search yielded the same results mine did :)
> But making it more accessible might just trigger enough people's "I don't want to push out absolute garbage if I can avoid it" buttons.
Yes, there is that too. Simple love of the craft.
> That's why I was wondering if there are already mainstream-accessible examples/explanations. It seems your search yielded the same results mine did :)
Just keep watching Lean. Now it is dominated by pure math, but people will be writing more programs over time also.
Formal verification is just not as common as you might think in these highly regulated industries; the expertise isn't as prevalent as your average HN reader would think.
On the one hand, I _want_ this to be true both for selfish and practical reasons: selfishly because I'm very very good at learning things that require an academic learning approach (read a book, work some exercises, practice) and if something I'm good at is important, that means more money for me. Practically because they're right, software _is_ big and complicated and hard to get right and as a practitioner, it's really frustrating when it does fail and I'm scrambling to figure out why.
On the other hand, though, nobody ever seems to make a compelling case for how formal methods proposes to solve that problem. This author actually does better than most by pointing out how most modern "design" is a waste of time but doesn't really go into why TLA, say, is better than (demonstrably mostly useless) UML. There's sort of an implied suggestion that if you dedicate a few months (or years?) to learning TLA, you'll reach enlightenment and understand how it's helpful in a way that's impossible to articulate to the unenlightened. And that's not impossible to imagine! Calculus and Bayesian statistics are kind of like that; you need to really understand them before you can really see why they're useful.
I always find myself left applying what I call "project manager" logic - I need to make a snap decision right now about whether or not I should invest time in this "formal method" stuff or not so I have to be observational: if it was really that helpful, more people would be applying it and the benefits would speak for themselves. They've been around a long, long time, though, and never caught on - it's hard not to conclude that there's probably a reason.
I find myself using a "method" of some kind when faced with a problem that's hard to think about. A communications protocol - nice to have a state machine to describe it for example. TLA obviously fits that niche even better. I've been lucky enough not to have too many problems recently that felt like they justified that effort but it's of incredible value when one does. Domain Specific Languages - so much better to use a parser framework than hand-code if you want to avoid all sorts of undesirable outcomes.
Currently most of my rework comes from changes to the requirements and our "customers" not really knowing what they want till we give them something and they say "not that."
This is partly because the people asking for things don't fully think out all the implications of what they're asking. It's mostly about not having enough knowledge in one place to make good decisions on.
My impression is that there are actually not that many business domains where a large investment in time and money to get domain logic correctness from 98% to 99.99% correct is actually called for.
Formal methods are a large investment, too. No two ways around it.
Also, while they havent really caught on in general, some of their ideas have made it into modern type systems.
My issue with the entire discussion, and lots of the community is that formal methods are not all the same.
Some are expensive, yes. If you insist on doing them all, you'll never finish anything. But that's not a reason to dump the entire set.
Im sure if somebody wrote a blog post that provided a sensemaking framework for WHEN formal methods are appropriate vs waste your time it would be popular.
I see hints of this all over the thread - e.g. some people seem to believe it works well for state machines (though I suspect they were nontrivial ones), but as a topic it rarely ever seems to be addressed head on. OP certainly doesnt.
I suspect this trope will reiterate again next year with "formal methods! useful! sometimes!". The community has been like this for decades.
But, the impression I got was that the rigors of the formal verifier would sort of impose a limit on the complexity of the design just based on… completing in a reasonable timeframe and in a reasonable amount of memory space. Maybe the true victory of demanding formal verification would be fixing:
> software is big and complicated and hard to get right
By making big complex programs a pain to work with, haha.
I think I’d like to see a descendent of Property Based Testing that uses SAT or TLA techniques to rapidly condense the input space in a repeatable fashion. We should be able to deduce through parsing and code coverage that passing 12 to a function can never follow different branches than 11, but that -1 or 2^17 < n < 2^32 might.
Well... that's not a good logic on any area, and on software development it's twice as bad.
Most software projects still fail. And that's not "market failure", it's just "didn't manage to build it".
Starting by at least with UML everyone, regardless of their role, can have some kind of understanding of what those diagrams are about, and UML covers most scenarios from software delivery product lifecycle.
Whereas TLA+ is mostly abstract code that after proving a specific algorithm is right, has to be manually ported to the actual software development stack being used, with the caveats a manual translation implies, and only applies to algorithm logic, nothing else.
UML has a place at my work, TLA+ will never get one.
With the large footprint of Rust, I've seen questions raised about how this will work. But haven't seen good answers.
Would love to read more.
The thing that's never made any sense to me about using a marble checker for anything but concurrency issues (which are hard enough to warrant it) is that once you've validated your model you have to go actually implement it, and that's usually the most error prone part of the process.
If the correctness spec has to be written manually but prevents you from diverging from the spec in your implementation, that's a huge step up from extrinsic model checkers.
Pre-design makes sense to me in certain limited circumstances. A limited amount of architecture planning can be valuable (though in most cases formal methods aren't useful for that), and for certain kinds of concurrent algorithms it could even be worth it to validate the design in a different language. But most of the time it's not worth doing the design twice when you can get pretty good guarantees from static analysis on the design (the code) itself.
I sometimes work in areas where the error budget is essentially zero, with an element of concurrency, and for those there is a design stage before the build stage. I could see the value of formal methods there. At least I could execute a model with a model checker, which makes it one step closer to the code than a design doc or RFC.
Full disclosure: I haven't actually used formal methods myself, I've just been interested in the idea for a while and have done some reading on it.
Some of them (not all, I know a few good ones) are, actually one trick pony. For example, ask them about other formal methods systems they learnt or tried in the recent years, they would claim they are "too busy" to learn anything new.
Recent easier to use formal methods: 1. FizzBee (Uses Python dialect, almost reads like pseudo code) 2. Quint: Another formal language with easier to use syntax 3. P: Use syntax similar to C# users.
The author of the posted article himself posted another article that Formal methods solves only half his problems (https://brooker.co.za/blog/2022/06/02/formal.html) But the problem he mentioned is actually solved by PRISM (It is not new). But Brooker just won'd bother to look around or learn.
I actually generate a unit tests with LLMs a lot lately. They do a decent job and you can just ask it to be a little more exhaustive, test any edge cases it can think of, or instruct it to deal with specific ones. I know a fair bit about how to write good tests and the effort you can put in that. But LLMs can generate better tests than me way faster.
If anything, they are less likely to do a half-assed job of it than me because I tend to run out of patience doing repetitive tedious shit. This is a healthy trait to have for a software engineer: we are supposed to make stuff easier by automating it. If it feels repetitive, your reflex should be to write code to make it less repetitive. Documentation is the same. I generate that too these days and since it is so easy, I do it more often and sooner.
LLMs might trigger a minor revolution in the adoption of formal method verification. Generating a correct specification is a combination of tedious and probably relatively easy for an LLM given enough context like working code, documentation, and other hints as to what the thing should do.
I'd be a lot more likely to bother with that stuff if I can just let it generate specification and then briefly glance through them than if I have to spell everything out manually.
I think using Rust kind of signals that you care about correctness. It's compiler is probably the closest to proving your system is probably correct that you can get without resorting to formal methods. And probably a lot easier than trying to bolt on formal methods to languages that don't even use a compiler or explicitly specified types.
Once you have formal properties, you can write property-based tests using them, and I wonder how much of the benefit of formal methods could be obtained just by doing this. It's another example of using increased computing power (testing) to substitute for expensive hand labor (proving theorems).
I'll also observe that even theorem proving systems benefit from a kind of property based testing. If there's a goal to prove the existence of a value satisfying some property, this is essentially a property based testing problem. Similarly, find a counterexample to a universally quantified formula (also an existential problem) can be used to prune off unproductive branches of a search tree.
EDIT: ah I see where there might be confusion--obviously a library for generating random test data and making assertions about code under test itself doesn't constitute anything like "formal methods" but the idea of using that library in the way described in the paper linked from the article does. But that's kinda always the thing about software libraries..
Of course, if you are interesting in proving statement about abstractly defined constructs: infinity is no issue as your domain is uniform. In real-life, this is not the case.
I keep trying to learn it, but you need to be a real expert. Like "I wrote the compiler" level expert for most systems.
For example I tried to prove a varint encoder/decoder. It worked for one or two bytes, but not more. Asking for help the reason was that the compiler internally only unrolls loops 5 times, or some random internal detail like that that you could never really hope to know.
I've been learning Lean recently, and ... I mean I like it, but if you learn it you're going to encounter documentation like this:
> Definitional equality includes η-equivalence of functions and single-constructor inductive types. That is, fun x => f x is definitionally equal to f, and S.mk x.f1 x.f2 is definitionally equal to x, if S is a structure with fields f1 and f2. It also features proof irrelevance, so any two proofs of the same proposition are definitionally equal. It is reflexive, symmetric, and a congruence.
And that's not really a knock on Lean - it seems to have some of the better documentation out of the alternatives.
Formal methods don't have to be complex. The issue is, most formal methods are designed as an academic exercise to demonstrate a specific topic the professor was interested in. (Or TLA+, that was designed specifically for writing papers)
It looks scary but actually all those concepts are really simple and probably things you are already familiar with.
Basically you just need to log events - which you can even have for free in event-driven architectures etc - and run some predicates on execution traces, e.g. "Always(Locked, Implies(Eventually(Unlocked)))", etc.
One can run it on historical traces, but also couple it to stress tests, fuzzing etc to explore the state space.
Quite simple, powerful and widely applicable, and doesn't require a model, just predicates.
Formal methods connotes comprehensive evidence about system behavior. In the case of TLA and similar system that’s a state machine and not the real system but the output for the state machine is a proof that your LTL/CTL/TLA properties hold for all behaviors of the system (traces or trees of traces).
I was mentioning it in the same context as e.g. the Amazon paper on lightweight formal methods [0] where they use property-based testing to test that the implementation conforms to the model specification.
In a similar spirit, linearizability checkers like Porcupine [1] are a nice mix of formalism and usability.
I really like those because they are incredibly powerful, don't require a model and verify the actual implementation - obviously as you mention they are not exhaustive.
[0] https://assets.amazon.science/77/5e/4a7c238f4ce890efdc325df8...
Formal methods: Just good engineering practice? - https://news.ycombinator.com/item?id=40753989 - June 2024 (149 comments)
15 Years of Formal Methods at AWS: Just Good Engineering Practice? - https://news.ycombinator.com/item?id=40283052 - May 2024 (1 comment)
Reproducible? Sure, that's what unit tests are for. Make a change, prove that you didn't break any behavior that anybody relied on.
But if you have to do a three-month-long formal proof run because the specification had a one-line change, then you're not agile, under any meaningful definition.
(Where did three months come from? Thin air. I don't know how long a true formal proof would take. Depends on how many things you're proving, how long your spec is, and how much CPU power you have. I would think, though, to formally prove significant properties of a large code base would take a significant time.)
Get back to me when the formal methods actually generate high quality code that cannot deviate from the spec.
This summary page links to the full PDF: https://www.adacore.com/tokeneer
That would be useful but there is a fundamental issue: code is too specific. İn formal specification you usually don't specify that detailed but you specify the general behaviour of the system. So usually the specification corresponds to a lot of programs with subtle differences. That's why code a documentation falls short: you don't know what is intentional and what is just a random choice. It's simply too specific to describe high level requirements.
The other way around (verification of program with respect to the specification) can be more feasible to implement.
Or as a baby step, does TLA+ emit comprehensive test cases? Then we as programmers still have to do the hand transpilation step, but at least we can be assured our implementation is correct (if not efficient).
While I do think TLA+ is relatively easy to pick up (especially compared to Isabelle or Coq), and I think it's pretty awesome, I'm hesitant to say it's as "easy to pick up as Python". You need a basic understanding of set theory and state machines to even get started, and more advanced concepts like algorithm refinement to get into the really useful juicy stuff.
When I've tried to push TLA+ at work, the mathey syntax is usually a non-starter for most of my engineering colleagues. I don't think my coworkers are stupid by any stretch, but they are decidedly uninterested in re-learning any discrete math (if they ever learned it in the first place, which isn't a guarantee). For an engineer LARPing as an academic like me, TLA+'s notation isn't really hard at all, but I will often forget that most engineers only think in code.
Would they be interested in Quint?[0] If I understand correctly, Quint is very similar to TLA+, but with a more "code-looking" notation. And it has some nice dev tooling, like an LSP, which also lowers the barrier to entry.
It’s still an uphill battle, because the comparison to unit tests always comes up, and I have to explain how they’re not equivalent.
My impression is that it’s far from a magic bullet. Writing formal specs is basically like writing the code/tests just differently. And the more it covers the more it becomes the same thing. And it suffers from the same problems.
My conclusion every time was that the code itself is the formal spec and the formal spec is the code.
By analogy with construction, the code is both the building and the blueprint.
Yes you can end up with tautological specs, where it's more or less a copy of the code. E.g. you aren't going to benefit much from formally verifying `max()`.
But there are many many cases where the formal specification is much much simpler and more obviously correct than the actual code. The classic examples are:
* Any kind of encoding / decoding transformation has the property `decode(encode(x)) == x`.
* Sorting: any sorting algorithm should result in a sorted array (forall i, j: i < j --> array[i] <= array[j])
FORALL paths P from A to B:
|shortestPath(A,B)| <= |P|
And with software you can just do an update if something is found later. Not a big deal usually. There are exceptions of course. With hardware things get more expensive. But still, judging from the state of e.g. most bluetooth and other hardware I've ever owned, the barrier of good enough is pretty low there too. Mostly things work and you can usually work around minor issues when they don't.
Some, software justifies/requires going above and beyond doing testing. Especially if it controls critical hardware. I've never worked on such stuff. And even there the notion of releasing often and breaking stuff by testing it seems to be catching on. For example SpaceX is doing agile rocket development. They launch starship every few months until they get it reliable enough to launch things into orbit.
Some problems specify well. A database is an example. A spec for a database can be written in terms of exhaustive search, running the query against everything. Now show that an efficient database yields the same output.