On June 22, in “Ten Questions with Vitalik Buterin and Wang Feng,” Vitalik discussed Casper and Sharding, ranging from Ethereum’s future to real-world industry applications and the evolution of programming languages, and shared his thoughts on common problems facing the entire blockchain industry.
What interests us most are Vitalik Buterin’s views and answers on the challenges of blockchain data availability, which read as follows:
Fred: We are familiar with the early POW and POS mechanisms, but could you please explain the working principle of the Casper mechanism in a simple way once again?
Vitalik: The challenge is that it is not enough to just verify that the blockchain is valid; one must also verify that all of the data in the blockchain is available in the p2p network, and that anyone can download any piece of the data if they want to. Otherwise, even if the blockchain is valid, publishing blocks whose data is unavailable can still be used as an attack that prevents other users from taking money out of their accounts, by denying them the ability to update their cryptographic witnesses.
We have solutions, though they are somewhat complex; they essentially involve encoding the data redundantly and allowing users to randomly sample to check that most of it is online; if you can verify that most of it is online, you can use the redundancy to recover the rest of the data.
The redundant pieces are randomly distributed throughout the p2p network. The basic idea behind the current Casper implementation is that users can send 32 ETH into a smart contract; once they are included in the blockchain, they are added to the current validator set. Every block is created by a random member of the current validator set, and every 100 blocks the entire validator set needs to send a message “finalizing” some checkpoint.
In Ethereum’s case, there is this requirement that the blockchain must ensure that absolutely 100% of the data is valid and available; in Filecoin’s case it’s ok if one or two files drop off.
Many readers may find the context of the exchange above unclear, even dizzying. Why did Wang Feng ask Vitalik Buterin to introduce Casper, and why did Vitalik focus on data availability challenges and solutions? What is the logical connection between them? To figure that out, let’s start with the Bitcoin ledger structure.
We say that the essence of blockchain technology is to solve the problem of establishing trust between the nodes of a decentralized system through the verification and consensus mechanisms of distributed nodes, thereby realizing a decentralized, distributed trust-establishment mechanism. At present, verification and consensus mainly concern two aspects: computing and storage. Of course, what we see is the result after the data is stored, but logically speaking, storage and computation can be validated and agreed upon separately.
The ledger structure of Bitcoin is shown in the figure above. As everyone knows, a Bitcoin full node must store all of the block records. Any node can verify whether it holds the correct records by verifying the Merkle Tree of each block it downloads. This process can be simply understood as comparing two Merkle Trees:
So, in a blockchain, by simply validating the Merkle Tree we get both the validity and the consistency of the data. Because the data on all nodes is the same, a few nodes going offline does not make the data unavailable.
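As an illustration, the pairwise hashing behind such a comparison can be sketched in a few lines of Python (a simplified sketch; real Bitcoin hashes transactions with double SHA-256 and has its own serialization rules):

```python
import hashlib


def merkle_root(leaves):
    """Compute a Merkle root: hash each data block, then hash pairs
    upward level by level until a single root remains."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:           # duplicate the last hash if the level is odd
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Two full nodes holding the same blocks compute the same root, so comparing one 32-byte hash is enough to confirm that the entire data set matches.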
But we also know that this design brings many limitations, so a new technical solution, Sharding, was put forward. Sharding is a long-established technique in the conventional database domain that is now being applied to the blockchain field.
Leaving aside the pros and cons of Sharding, one particular problem it introduces is data availability. Under Sharding, things are not the same as before: data is stored according to the Sharding rules, so the data within one shard is identical across its nodes, but the data differs between shards. This is also why smart contracts are largely incompatible with Sharding: smart contracts require every node to see the same data.
Imagine a P2P network. Previously, a node could always verify data validity with the Merkle Tree against the nodes around it. But under Sharding, the nodes we are connected to that hold the right data may go offline for some reason, and we do not hold the right data ourselves, which can lead to bigger problems. This is why Vitalik Buterin mentioned data recoverability at the beginning.
So, moving from Bitcoin’s full-node ledger to Sharding is somewhat similar to moving from Sia to IPFS; the evolution of the technology is quite alike. Sia was an early storage project that offered a service similar to a network drive. Sia’s currency is called SC. This project is different from what most people think: it is actually a Bitcoin-like project whose POW mining mechanism computes hashes. Sia’s hash algorithm is called Blake2b, which differs slightly from Bitcoin’s. Therefore, the so-called hard-disk IPFS mining machines cannot mine Sia.
Sia uses a special Bitcoin parameter called Timelock. In other words, it delays a final settlement transaction and sets conditional parameters within the delay window, and a so-called file contract is completed using the script. Under the contract, the storage provider periodically sends the value of a specific segment of a Merkle Tree to the client, and both parties use the Merkle Tree to verify that the data is being held in storage.
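The segment check described above can be sketched as a standard Merkle authentication path (a generic illustration of the technique, not Sia’s actual wire format):

```python
import hashlib


def H(b):
    return hashlib.sha256(b).digest()


def merkle_proof(leaves, index):
    """Build the authentication path for leaves[index]: the sibling
    hash at every level, bottom to top, plus the resulting root."""
    level = [H(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:               # duplicate last hash if odd
            level.append(level[-1])
        sib = index ^ 1                  # sibling position at this level
        path.append((level[sib], sib < index))
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path, level[0]


def verify_segment(leaf, path, root):
    """Recompute the root from the claimed segment and its path."""
    h = H(leaf)
    for sib, sib_is_left in path:
        h = H(sib + h) if sib_is_left else H(h + sib)
    return h == root
```

With this, the client only needs to keep the root: the storage side can prove it still holds any single segment by returning the segment plus a logarithmic-size path.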
The Storj project’s implementation, including its proof of storage, is essentially similar to Sia’s: both rely on the client to do its own data-holding validation. Of the two projects, I personally feel Sia has the “aura of light” while Storj is the “engineering.” The Sia team ran into many problems with this implementation at the time, but managed to dress up some concepts and ship the project anyway. Sia’s problems include confidentiality, integrity, availability, and privacy.
Although Sia claims to implement public validation, this is essentially self-deception, because only the Root Hash of the Merkle Tree is on the chain. The Merkle Tree’s Root Hash only guarantees that the data has not been tampered with; it cannot guarantee that the data is held and available.
The integrity-verification algorithm depends on the client and the server exchanging block data and Merkle Tree leaf nodes for verification, and the communication cost is high.
Sia’s mining is pure POW, meaning the miners simply compute hash values. Logically, there is no relationship between the POW hash value and storage, and there is no block reward for storage nodes. Miners wasting massive amounts of hash computation does nothing to improve the security of storage.
For availability, Sia requires the client itself to encode the file redundantly.
Sia does not deal with privacy and confidentiality.
Sia is the first project that combines distributed validation and consensus with storage systems, and that’s the whole point of Sia. Sia clearly tells us that distributed storage is achievable, although there are many issues that need to be addressed.
This later developed into IPFS and Filecoin. IPFS is a very good project; it can be simply summarized as using DHT addressing and BT-style transfer of DAG Objects (Blob, List, Tree, Commit). IPFS is a skilful amalgamation of existing technologies, but it does not address the issue of data integrity. Data integrity is supposed to be solved by Filecoin, but Filecoin has a big problem. Many people mistakenly believe that because IPFS hashes the data, it can guarantee the data cannot be tampered with; this is a misconception. Simply adding digital monetary incentives to a storage system does not solve the data-holding problem (in the non-academic sense). On the other hand, the proof-of-holding process claimed in Filecoin’s paper is logically unreasonable. This is not a technical issue; it is a logical issue.
Logically speaking, a proof of data holding is a game between two roles in four steps. The first role is the Challenger, and the second is the Prover who completes the Proof. Step one: the Challenger preprocesses the file, builds some puzzles, and puts the file on the server. Step two: the Challenger generates a challenge (Chal), which requires some of the data kept from step one. Step three: the storage node completes the Proof and sends it back to the Challenger. Step four: the Challenger uses the information it kept to verify the Proof. That completes the validation with two roles and four steps.
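A minimal sketch of this two-role, four-step game, using precomputed hash puzzles as the “puzzles” (a toy construction for illustration, not any project’s actual protocol):

```python
import hashlib
import os


class Challenger:
    def setup(self, data, num_challenges=10):
        # Step 1: build puzzles BEFORE handing the file to the server.
        # Each puzzle is a random nonce plus the precomputed answer.
        self.pairs = []
        for _ in range(num_challenges):
            nonce = os.urandom(16)
            expected = hashlib.sha256(nonce + data).digest()
            self.pairs.append((nonce, expected))
        return data  # the file goes to the server; we keep only the pairs

    def challenge(self):
        # Step 2: issue a fresh, unpredictable challenge.
        return self.pairs[-1][0]

    def verify(self, proof):
        # Step 4: compare against the precomputed answer, then discard it.
        _, expected = self.pairs.pop()
        return proof == expected


class Server:
    def __init__(self, data):
        self.data = data

    def prove(self, nonce):
        # Step 3: a valid proof requires possessing the full file.
        return hashlib.sha256(nonce + self.data).digest()
```

The key point is step two: only the party who ran the setup holds the secrets needed to issue unpredictable challenges and check the answers.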
Filecoin strangely compresses the four steps into three, eliminating the generation of the Chal, because the author of the Filecoin white paper knew very well that no role in his system can logically perform the task of generating the Chal. So from the beginning, Filecoin and its PoSt were an impossible task. In the Filecoin system, the Chal role cannot be the Client, and it cannot be a Chain Node either, because there was only one consensus algorithm, POW, and all of a Chain Node’s computing power can only be used to compute hash values.
Recall that Bitcoin consists of equivalent ledgers: the Bitcoin ledgers on any two machines must be identical, so two nodes only need to construct a Merkle Tree to verify each other. Sia and Storj are the same in this respect: the client has the files, and the cloud also has the files. When both parties hold the original files, we can easily construct a proof. But with IPFS, Lambda, or Vitalik’s Sharding, how can one complete a proof when one party has the data and the other does not? This is the more general question: how do we verify the integrity of data on untrusted storage without holding the original file? This is in fact an academic computer-science problem, and as it happens, this field has known academic solutions.
Since 2007, two families of algorithms have been proposed. One is Proofs of Retrievability (POR), and the other is Provable Data Possession (PDP). Vitalik’s data-recoverability proof is actually what academia calls POR. The principle is simple: first apply (t, n) threshold secret sharing to the data. A so-called (t, n) threshold means a secret is shared among n individuals, and any t of them together can restore it; through such secret sharing we can realize data recovery. Because threshold secret sharing is an encryption mechanism, a secret message can be embedded in the data, and a Challenger’s request for that message reveals whether the data is still there. So it achieves recoverability on the one hand and a possession test on the other; this is POR.
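The (t, n) threshold sharing at the heart of this idea can be sketched with textbook Shamir secret sharing (illustrative only; a real POR additionally embeds sentinel values and erasure-codes the file):

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is in the field GF(P)


def split(secret, t, n):
    """Shamir (t, n) sharing: a random degree-(t-1) polynomial
    with the secret as its constant term, evaluated at n points."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        shares.append((x, y))
    return shares


def recover(shares):
    """Lagrange interpolation at x = 0 recovers the secret
    from any t of the n shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

Any t shares reconstruct the data; fewer than t reveal nothing, which is exactly the redundancy-plus-sampling property the interview describes.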
PDP does not need to achieve recoverability; what it needs to do is very simple: show that the data is still available. The scheme uses tags. After the file is split, each block gets a tag, and the tags are stored on the server side. The tags support mathematical operations between them, so when the Challenger issues a challenge over some blocks, the server performs the operations over the corresponding blocks and tags and sends the mathematical result back to the Challenger for checking.
Where is the biggest problem? Who plays the Challenger. This is a question IPFS has not thought through: in this scenario, the Client cannot be the Challenger, because it does not hold the data; and the blockchain’s Node cannot be the Challenger either, because the Node itself sits on a low-speed chain whose consensus algorithm is POW. So IPFS is in a bind: “proving yourself and testing yourself” is a process on which it is inherently difficult to reach consensus.
I believe that with the continuous development of blockchain, PDP and POR have evolved toward so-called trusted third-party verifier roles. Since third-party trusted verification is academically possible, we can replace the trusted third party with on-chain consensus: the consensus of the chain and of a set of semi-trusted nodes performs the verification, and the results are put on the chain. This solves the problem the original trusted third party could not: guaranteeing that the verification results themselves are not tampered with. That is roughly the mechanism.
Afterword
In fact, the challenge for IPFS as storage is that it is similar to AWS S3: the client cannot be expected to hold the original data for verification. Therefore, Filecoin cannot use the same data-integrity approach as Sia and Storj. The biggest problem with Filecoin is that its PoSt proof is impossible to implement. In essence, the problem is equivalent to Sharding’s proof of data availability.
In this conversation, Vitalik actually gave a general idea of how he thinks about the problem. Its essence is similar to Lambda’s idea, but Vitalik is solving the recoverability of account data, while we are solving recoverability plus holding of data. In other words, Vitalik’s solution is similar to POR in computer science; it is worth noting that he does not mention data possession (PDP), and from this point of view, Lambda is one step ahead.
In fact, the POR algorithm Vitalik describes is not something he invented himself; there is existing research in this field, and the current state of that research happens to suit blockchain implementations, which is the origin of the Casper consensus algorithm. Lambda’s idea is that, outside the blockchain setting, a verification algorithm different from POR, namely a PDP algorithm, can be constructed: assume a trusted third party that verifies data possession with a certain probability and stores the verification results in an open, tamper-proof manner. This trusted third-party auditor, known as a TPA, must then make its verification results tamper-proof via the chain, and the single-point trusted verification process must itself be completed through the consensus of a set of semi-trusted Validator nodes.
So if you look at Vitalik Buterin’s answer, there are several key terms: Validator, Random, and Sampling. Through this method, proofs of data holding and recoverability can be achieved. In general, Vitalik Buterin discussed only the POR level, while Lambda is thinking deeply along both the POR and PDP dimensions.
✅ Official Website and Social links.
🔹 Official website: http://www.lambda.im
🔹 Telegram Community: https://t.me/HelloLambda
🔹 White Paper: http://www.lambda.im/doc/Lambda-WhitePaper-en.pdf