Smart-proven contracts
In the decentralized-storage ecosystem, a specific type of program is used for storage access: smart contracts. They are used for anything to read or write over a public decentralized database (aka blockchain). They are deployed over multiple nodes of a network and will be executed when needed (for record, trade, election, any kind of process...). Smart contracts are intended to be in charge of the integrity of the data and transaction, that makes them particularly critical, like a 'real contract'.
New kinds of tools are slowly making their way to ensure smart-contract reliability. Let's see how.
Criticality of smart contracts
One fundamental leitmotiv of a blockchain is trustlessness: you don't need to trust someone in particular to interact with (indeed, the trust is shared because decisions are decentralized, data are immutable, there is voting by consensus...).
However, smart contracts are still basic programs. They can be buggy or malicious. They easily broke this trustless principle. One way to be sure that a smart contract is really doing well could be resolved by introducing a lot of tests, but corner cases are hard to mitigate.
Classical testing limits
By writing unit/behaviour testing and assertion checks you will need a huge amount of tests (and time) to guarantee that by calling function 'A' with argument 'x', you will get 'y' and avoid any side effects into the rest of your program. Furthermore, assertions and tests are runtime-based, so you need to simulate a small blockchain and i/o.
A solution to this labour could be to write our program as a way to be sure-by-design regarding its behaviour (integrity & reliability). This assertion is equivalent to building a mathematical proof that our program satisfies a property at a given time. Building such a proof is equivalent to make the formal verification of our program.
Formal verification of programs
We do formal verification by checking -in a determinist way- that the program behaviour satisfies some requirements, basically verify that it does what we want. However, these properties are described using a formal modeling, a kind of mathematical description of computation, axioms and states. Model-checking and theorem-proving are the two main prominent methods in that field.
This is very different from classic testing or assertion methods! Indeed the verification is done upstream while unit test/assert is done at runtime.
Formal verification allows us to prove that some behaviours will never occur because it was written that way.
Formal methods are still in research and development for decades by logicians and computer scientists. Because potential pay-offs of formal verification are high, there is a big shift between true-formal partisans and pragmatic people, who favor more direct and inquiry-relative ways to test and evaluate programs.
Smart contracts are good candidates to reunify both sides. They are small, very domain specific, public and they must be rock-solid on what they are supposed to do: critical operations on sensitive stuff. They are immutable, so any mistake or vulnerability can lead to a disaster.
Nevertheless, verifying a program from scratch is a huge undertaking that requires time and specialists skilled into formal methods. We need tooling to bring this valuable paradigm into more common programming.
Here an overview of the current state and efforts to understand the usage of formal verification in smart contracts.
Verify existing languages
Smart contracts are actually written using domain specific programming languages. There are relatively few programming languages that are well-suited to work with, and some of the available languages can be difficult to learn or use.
We can enumerate: Clarity for Bitcoin, Solidity or the less-buggy Vyper for Ethereum-based blockchains, Liquidity or any Michelson transcript tool for Tezos (because nobody will force you to write anything in Michelson).
There are efforts to bring formal method verification into contracts, one way is to verify the language it-self. It means: avoid any by-design mistake of a language (overflow, underflow, mistyping) that can be error prone.
On Ethereum VM
Working on Ethereum-based smart contracts (mostly) imply Solidity: it is a powerful and flexible language that is widely adopted. However, it can be difficult to write secure and reliable programs, as it is easy to make mistakes that could compromise the security or functionality of a contract. This is a well-known issue.
Indeed, Solidity comes with its inherent problems:
- Too permissive / on the very top of the Ethereum Virtual Machine (EVM)
- Procedural-coding is encouraged, not good to apply constraints over property
- Language compiler is changing too quickly for verification tools to keep up
- The only way to know the meaning of a Solidity contract is to compile it into bytecodes
- How to ensure that the bytecode behaviour is the same from the high level language?
Efforts made over the Ethereum blockchain are present as well as academic research engaged to modelize the EVM. However, concrete and production-ready tooling are absent. Indeed, it is highly non trivial to evaluate a language's robustness from a language that was not designed for. It requires to be translated into a provable language. However, original semantics can be lost in the process...
There is, like this Ethereum initiative to provide an EVM Isabelle implementation, or to formally verify the Vyper language. It leds to a fine bug tracking which highlight this kind of funny issues like this one or this one, where compiler initialization or call incoherences were found.
Since the announcement in 2016 and some projects or proof-of-concept articles, nothing more has been done over EVM. The consequence of a lack of profile working on these subjects.
Current efforts are focus on security analysis tools like Sourcify or Mythril; which uses symbolic execution to analyze Solidity codebase and generated bytecode to identify potential vulnerabilities.
A common workaround to track contract vulnerabilities remains bug bounty. There are many bounty platforms like Hacken or Immnuefi...
On Tezos
On the other hand Tezos is leading formal verification applied to smart contracts by proposing proof-tooling to interact with its blockchain. They intend to fully certify their whole codebase and contracts.
A common approach to write smart contracts over Tezos is to use the low-level Michelson language. This language has been entirely formally verified, which means: they don’t have known bugs and vulnerabilities. It has been done with Mi-Cho-Coq, introduced by Nomadic Labs. Mi-Cho-Coq is a verification tool for Tezos smart contracts relying over the Coq proof-assistant to apply the Curry Howard correspondence (TLDR; establish mapping between mathematical proofs and program computations). With Mi-Cho-Coq, developers can now write smart contracts with a robust proof-assistant, verify their correctness, translate it to Michelson language, then deploy it to the blockchain. If any errors or vulnerabilities were found, a report details identified issues and suggests potential fixes.
Mi-Cho-Coq acts like a specification library for smart contracts and can be especially useful for developers working on complex or critical contracts, but still, it is requiring to code in Coq; it can be very costly to define every piece of a contract.
Dedicated languages
Another alternative is to write smart contracts with a programming language that embarrasses a provable approach. It includes: define every asset of the program, clear values, guarantees of termination and outcomes, and of course, the program won't compile in case of semantic incoherence.
Ethereum's challenger: Act
Despite Solidity's widespread usage over Ethereum blockchain, there are common efforts to develop EVM compatible formal-based smart contract languages: Act. Act allows specification of storage updates, pre/post conditions and contract invariants.
Act is a statically typed, functional programming language written and inspired by Haskell. It includes:
- Strong type system that ensures that contracts are correct and error-free
- Low overhead for better perf than Solidity
- Type-safe and automatic memory management to prevent vulnerabilities
- A built-in support for formal verification to enhance the correctness of their contracts
- Numerous proof backends like Coq, SMT solvers or HEVM.
To sum up, everything that Solidity cannot employ.
Act is still in development and not production-ready.
Tezos' challenger: Archetype
In 2020, another domain specific language comes out to specify smart-proven contracts over Tezos: the Archetype language.
Archetype is developed in OCaml and maintained by the Edukera team, which has already developed a mathematical proof-assistant for education based by using Coq. So, they are not at their first try. Archetype can be seen as a high level language like Liquidity or Solidity, but the comparison stops there.
Archetype offers nice features like built-in state-machine mechanism, asset definition and integrated property specifications. These in-code specifications allow us to specify what we want to expect from the written contract.
Let's make a scholar example of a voting process and find a way to secure our program.
Firstly, define assets that will be manipulated within the contract: voter, ballot and a winner when the vote will be buried:
(* Dates to delimit our vote in time *)variable startDate : date = 2023-10-12T00:00:00variable deadline : date = 2024-10-12T00:00:00asset voter identified by addr { addr : address; hasVoted : bool}asset ballot identified by value { value : string; nbvotes : int;}asset winner { value : string}
Then we define a vote function where preconditions and postconditions are clearly defined. Archetype requires you to define one postcondition (section 'effet') per incoming argument of one entrypoint. We can also clearly state that an argument doesn't change by specifying 'A = before.A'.
entry vote (val : pkey<ballot>) { require { c1 : voter.contains(caller); c2 : state = Voting; c3 : voter[caller].hasVoted = false; } effect { voter.update (caller, { hasVoted = true }); if ballot.contains(val) then ballot.update(val,{ nbvotes += 1}) else ballot.add({ value = val; nbvotes = 1}) }}
When the vote goes end, we bury the vote and transit to a new state:
transition bury () { require { c4 : now > deadline; } from Voting to Buried with effect { var nbvotesMax : int = 0; for b in ballot do nbvotesMax := max(nbvotesMax, ballot[b].nbvotes) done; for b in ballot do if (ballot[b].nbvotes = nbvotesMax) then winner.add({ value = ballot[b].value }) done }}
We can add some specifications that should remain true at any moment during the life of the contract. This is called invariant. As you can imagine, the sky is the limit; the more there are (well) defined properties, the more the contract will be clear, relevant and robust.
There is no systemic method to establish contract invariants. You need to figure them out by asking "What my entrypoints should preserve?" or "Which relations hold between my variables?".
specification { (* End date cannot be before or equal to the starting date *) contract invariant s3 { startDate < deadline } (* Ensure there is no duplicate or missing vote *) contract invariant s4 { (voter.select(the.hasVoted = true)).count() = ballot.sum(the.nbvotes) } (* Check that the voting count of winning ballot is greater or equal to all other ballot *) contract invariant s5 { forall w in winner, forall b in ballot, let some wb = ballot[w.value] in b.nbvotes <= wb.nbvotes otherwise false }}
Postconditions and invariants are made to describe the correctness of independent entrypoints, but they cannot capture the effect of a sequence of calls on the contract. This is still an active research field to provide powerful formal method approaches capable of assessing properties during a sequence.
Archetype automatically transcripts provided specifications to be analyzed by Why3 platform. Why3 is surrounding the Coq proof-assistant to verify software and guarantee their trustability. It is literally a program that verifies other program's semantics. Find here the official manual.
Archetype provides a builtin simulation system to test a contract (like it is published). Simulation is still relevant to figure out if you miss to define a property within the contract. It is a cool feature to iterate progressively to a fully-proven smart contract.
Archetype is a relatively new programming language which is focused on making easier and more secure the way to write smart contracts. It is still in the early stages so it is not yet widely used, but -because of its readability and strong type/formal verification support- it has the potential to become a popular choice over Tezos development in the near future.
Conclusion
Formal verification paradigm and smart contract are meant to have a love story. Indeed, smart contracts are complex things and everybody is looking for a better regulator/standardization system:
- Formal verifications allows theoretical long term time-saving, less debug and less maintenance
- Tend to domain specific, strongly typed, close to our domain logic, less overhead to manage
- Formal verification can help ensure that the contract is resistant to attack or manipulation, and that it will not be vulnerable to exploitation by malicious actors
- By providing a higher level of assurance that the contract will work as intended, formal verification increases trust and confidence in the contract, leading to higher-value transactions
- It introduces a faster development cycle. Indeed, formal verification relies on a symbolic approach, compared to classical testing which only tries 'real' values. It induces better code coverage. Furthermore, formal verification will avoid many 0-day-design-errors that will be a disaster to fix once in production
But it comes with drawbacks:
- There are currently relatively few programming languages that are well-suited for formal verification, and some of the available languages can be difficult to learn or use
- New programming languages imply time to learn them; are you so sure to deal with a language which involves few developers? It is risky, it could imply to reach only junior profiles and a lot of Copy/Paste of rare code snippets that could be poorly written
- It's a complex activity to write a full program specification. We need to be skilled to handle generic proof-assistant, such tools like Act and Archetype tend to offer an opportunity to reduce the gap
- You are limited by the specifications you identify. If specifications are poorly written, violations of properties (which lead to vulnerable executions) cannot be detected by any formal verification
- Furthermore, any proven-code that runs over a virtual machine is obviously subject to the virtual machine implementation.. Is this VM fully-proven?...
The state of formal verification on smart contracts is still in early stages, there is a lot of work to be done to make it more accessible and widely adopted by developers. The smart contract field continues to mature and evolve: security, trustworthiness, and reliability are the greatest challenges to overcome. As we saw, formal verification is a powerful tool to improve these aspects of smart contracts, by reducing the risk of errors or vulnerabilities that could compromise their integrity.