Enterprise DLT: The Leaky Trust Machine
By Dr. Elias Strehle.
Distributed Ledger Technology (DLT) comes with the promise of creating trust by technological means. While non-DLT systems often have a central operator who enjoys complete authority and little oversight, DLT systems use mechanisms like data replication and consensus algorithms to distribute control among a group of (hopefully) independent validators. Even if some of the validators are faulty or malicious, the system continues to work as expected. It is tempting to conclude that DLT systems are simply more trustworthy than non-DLT systems.
In the context of DLT, trust is often used as a catch-all term for a wide range of desirable properties such as security, predictability, reliability and truthfulness. This tends to result in an unfortunate halo effect: One shows that DLT has certain desirable properties (like censorship resistance and tamper resistance) which “create trust” and then concludes that DLT must also have other desirable properties (like confidentiality) one happens to associate with the term “trust.” The following example illustrates that DLT does not provide “all kinds of trust.” Like every other technology, DLT comes with tradeoffs.
The three core protection goals of information security, also known as the CIA Triad, are confidentiality, integrity and availability. A trustworthy information system should provide all three to a sufficient degree. In particular, the following properties should be satisfied:
- Trust in availability: Validators do not refuse to share data with authorized parties
- Trust in integrity: Validators do not modify data without consent
- Trust in confidentiality: Validators do not share data with unauthorized parties
Each kind of trust can be breached. In the context of DLT, a (deliberate) breach of availability is often referred to as censorship, a breach of integrity as tampering.
With regard to these kinds of trust, how do DLT systems fare in comparison to a “centralized” system with a single validator? To get a feeling for this, let us make an (extremely) simplifying assumption: Each validator is faulty with 5% probability. If a validator is faulty, it will breach trust whenever possible. If it is not faulty, it will never breach trust. The probabilities are independent; one validator being faulty does not make it more or less likely that another validator is faulty.
Under this assumption, in a “centralized” system with one validator, each kind of trust will be breached with 5% probability. What about DLT systems with two and more validators?
- A breach of availability occurs if and only if all validators are faulty. Recall that every validator has a full copy of the ledger and is therefore able to share data on its own, without having to rely on other validators. The probability that all validators are faulty is (0.05)^n, where n is the number of nodes, which gets very small very quickly. For four nodes, the probability is already as low as 0.000625%.
- A breach of integrity occurs when the number of faulty validators is high enough to break the system’s tamper-resistance. The best Byzantine fault-tolerant consensus algorithms can tolerate fewer than one third of the validators being faulty. That is, a breach of integrity requires 1 faulty validator in a system with 1, 2 or 3 validators, 2 faulty validators in a system with 4, 5 or 6 validators, etc. Given our assumption, the corresponding probability can be computed using the Binomial distribution. The result is a see-saw pattern: the probability creeps up as nodes are added and drops sharply each time the required number of faulty validators goes up by one, which happens at 4, 7, 10, ... nodes. For four nodes, a breach of integrity occurs with 1.4% probability, considerably lower than the baseline 5% of a single-validator system.
- A breach of confidentiality occurs if at least one validator is faulty. Again, every validator has a full copy of the ledger and can share data with any party – authorized or not. The probability that at least one validator is faulty is 1 – (1 – 0.05)^n, where n is the number of nodes. For four nodes, this computes as 18.5%. But more importantly, the likelihood of a breach of confidentiality increases as the number of validators increases.
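The three probabilities above can be reproduced with a few lines of Python. This is only a sketch of the simplified model: the function names are made up for illustration, and the integrity threshold `(n - 1) // 3 + 1` is one way to encode the pattern described above (1 faulty validator for 1 to 3 nodes, 2 for 4 to 6 nodes, and so on).

```python
from math import comb

P_FAULTY = 0.05  # simplifying assumption: each validator is independently faulty with 5% probability


def p_at_least(k, n, p=P_FAULTY):
    """Probability that at least k out of n independent validators are faulty (Binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def breach_probabilities(n, p=P_FAULTY):
    """Breach probabilities for a system with n validators under the simplified model."""
    # Assumed encoding of the threshold: 1 for n = 1..3, 2 for n = 4..6, 3 for n = 7..9, ...
    integrity_threshold = (n - 1) // 3 + 1
    return {
        "availability": p_at_least(n, n, p),                  # all validators faulty
        "integrity": p_at_least(integrity_threshold, n, p),   # enough validators faulty
        "confidentiality": p_at_least(1, n, p),               # at least one validator faulty
    }


# Print one row per system size to make the trends visible.
for n in range(1, 11):
    row = breach_probabilities(n)
    print(f"{n:>2}  " + "  ".join(f"{name}: {value:.4%}" for name, value in row.items()))
```

For a single validator, all three values equal 5%. For four validators, the function returns roughly 0.000625% for availability, 1.4% for integrity and 18.5% for confidentiality, so only the confidentiality risk is higher than in the single-validator baseline.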
This deserves three separate conclusions:
- A positive side effect of data replication is that DLT systems are highly resistant to censorship (or loss of data). Distributed databases, however, generally offer more flexibility in fine-tuning data replication to balance availability against efficiency.
- DLT systems shine when it comes to integrity. This is unsurprising, since tamper-resistance is one of DLT’s core tenets. Highly sophisticated consensus algorithms ensure that the belief in the immutability of the ledger is justified as long as there are enough independent validators.
- DLT systems are terrible when it comes to confidentiality. Every validator stores all the data and could potentially leak it to unauthorized parties – be it on purpose, in the wake of an attack or by accident. Even worse, once a data leak has been detected, there is no technical way within the DLT system to find out which validator leaked it.
Obviously, the example oversimplifies the issue. But it illustrates a fundamental problem: Breaches of availability and integrity require coordinated misbehavior of a fixed share of all validators (100% for a breach of availability, at least one third for a breach of integrity). This becomes more difficult and thus less likely as the number of independent validators increases. A breach of confidentiality requires individual misbehavior of a single validator. This becomes more likely as the number of independent validators increases.
Note that this example only considers faulty behavior of validators, not other users. Some enterprise DLTs offer “channels” in which data can be made available to selected users only. But a channel needs validators, too. The tradeoff remains: More validators in the channel lead to higher availability and integrity but lower confidentiality. Blockchain channels do not “solve” the tradeoff of availability and integrity versus confidentiality. For enterprises storing data in an enterprise DLT system it is important to understand who can see their data and to assess the risk of a breach of confidentiality.
Public blockchains take themselves off the hook by defining all data as non-confidential. On a public blockchain, it is expected (and actively welcomed) that all data can be seen by everyone. By contrast, the business problems addressed by enterprise DLT systems are frequently associated with highly confidential data such as prices, supplier networks or material compositions. One possible remedy for enterprise DLT systems is to reduce the likelihood and negative consequences of data leaks by keeping the information value of data on the ledger to an absolute minimum. Encryption and hashing can help – but both are difficult to get right and do not protect against metadata analyses and massive brute-force attacks. Instead, enterprise DLT systems should first and foremost strive for data minimization. Only the data that truly benefits from DLT’s unique combination of high availability and integrity and low confidentiality should be stored on the ledger. Everything else belongs somewhere else.
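As one possible illustration of keeping the information value on the ledger low, the following sketch stores only a salted hash of a record on the ledger, while the record and the salt stay off-ledger. The functions `commit` and `verify` are hypothetical names and not part of any DLT framework. The random salt matters because low-entropy data such as prices could otherwise be guessed by hashing candidate values. This is not a vetted scheme, and it does nothing against metadata analysis (who wrote what, and when).

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit(record: bytes) -> Tuple[bytes, str]:
    """Return (salt, digest). Only the digest would be written to the ledger."""
    salt = secrets.token_bytes(32)  # fixed-length random salt, kept off-ledger with the record
    digest = hashlib.sha256(salt + record).hexdigest()
    return salt, digest


def verify(record: bytes, salt: bytes, digest: str) -> bool:
    """Check that an off-ledger record and salt match the digest stored on the ledger."""
    expected = hashlib.sha256(salt + record).hexdigest()
    return hmac.compare_digest(expected, digest)


# Hypothetical example record; in practice this would be the confidential business data.
salt, on_ledger = commit(b"price=12.50 EUR")
print(on_ledger)                                     # 64 hex characters, reveals nothing readable
print(verify(b"price=12.50 EUR", salt, on_ledger))   # True
print(verify(b"price=13.00 EUR", salt, on_ledger))   # False
```

In this arrangement the ledger still provides integrity (nobody can later swap the record without the mismatch being detectable), while a leaking validator only exposes the digest.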