To test it you need to go your coding agent and say "Use https://github.com/kurrent-io/poes/ to build and verify {prompt or spec} and then provide me with a proof-of-work (this is deterministic)"
This is specially useful in long horizon tasks because it helps the coding agent to anchor around important rules and it knows whether it has broken any previously written rules through verification.
The idea sits between formal verification and coding. We propose that the nature of event sourcing allows you to formulate code as pure functions e.g. the same input to the same function over the same state should produce the same output. It is important to have immutability and versioning of the state, else the above properties wouldn't hold.
When you extend that you can force, using the framework, to prove that every valid input should produce a valid output under certain rules and further to validate state transitions as well (bounded verification and weak-proof for unbounded) by exploring all of them.
The limitations:
- Exploring all the states is hard, so this can only prove smaller domains or you have to tell the agent to either break state in small parts or sample the state exploration. We have taken inspiration from the TLA+ is verified by TLC.
- It doesn't test what you don't specify, but you can keep on iterating, the coding agents knows how to keep on adding/remove things using the framework.
It provides bounded guarantees rather than complete formal verification. It confirms no rule violations occur within the tested domain, but cannot prove exhaustive correctness. Additionally, this approach only covers closed environments where inputs are fully controlled. But for open environments (production) you can persist to KurrentDB using the framework.
If you are interested you can read more about it at: https://www.kurrent.io/blog/proof-oriented-event-sourcing/
A few points not mentioned in the blog post which might be interesting to the technical audience: We tried multiple versions internally using Liquid Haskell + state transitions proofs, lean4, transpilation to TLA+ and an FSharp version but we found out that at the moment LLMs just find it hard to write proofs at scale and also just knows how to write Python better.