Are your files really stored safely on-chain? Lambda intends to provide a trusted storage solution based on validator-node consensus

Editor’s note: this article is reposted from Odaily (official account ID: o-daily), a blockchain media outlet in strategic cooperation with 36Kr

Although on-chain data cannot be tampered with or backdated, in a decentralized storage environment users must entrust their files to unknown, untrusted storage terminals. How, then, can the malicious behavior of untrusted nodes be prevented? In other words, how can we be sure that distributed storage terminals have actually accomplished their storage tasks?

In this regard, a Provable Data Possession (PDP) scheme is particularly important for storage projects. Under such a scheme, users send data to miners for storage, and the miners prove that they have stored it; users can later verify that the miners are still storing their data.

Of course, there are various decentralized storage schemes on the market at present. In terms of PDP, IPFS, Sia, Storj and other decentralized storage projects all aim to ensure the reliability of storage terminals by verifying data possession in an untrusted environment.

He Xiaoyang, founder of Lambda, a decentralized storage project, told Odaily that both existing kinds of storage schemes have their own disadvantages:

  1. IPFS/Filecoin, for example, provides only static file storage and cannot look up a file’s address by its content. As for PDP verification of storage terminals, these projects expect the public to initiate verification, but specify neither who the initiator should be nor how verification is initiated; in other words, it is unclear who the data validator is.
  2. The storage schemes represented by Sia and Storj challenge storage terminals regularly through smart contracts, requiring them to return verification information such as file fragments and hash values. However, because of the limit on chain ledger size, only the root hash of the Merkle tree is stored on-chain, which ensures that the data cannot be tampered with but cannot guarantee the possession and availability of the data.
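
The Merkle-root check described above can be illustrated with a small sketch (the shard contents and function names here are illustrative, not any project's actual implementation). Only the 32-byte root needs to live on-chain, and any change to a shard changes the root:

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Compute a Merkle root over a list of data shards."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

shards = [b"shard-0", b"shard-1", b"shard-2", b"shard-3"]
root = merkle_root(shards)

# Tamper-evidence: changing any shard changes the root.
tampered = [b"shard-0", b"shard-X", b"shard-2", b"shard-3"]
assert merkle_root(tampered) != root
```

Note that matching the root proves the data was not altered, but, as the article argues, a stored root alone does not prove the node still possesses the data.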

Lambda plans to build a decentralized storage platform that enhances trust between users and storage terminals through the verification and consensus of distributed nodes. Like IPFS/Filecoin, the Lambda platform sets up two network systems: a repository system and a blockchain. The former is responsible for storage, while the latter is responsible for access to and control of the repository system. The two systems are connected by subchains to realize interaction.

Unlike other decentralized storage projects, Lambda intends to verify storage terminals' data possession through the consensus generated by validator nodes, ensuring the integrity and recoverability of data stored on untrusted storage terminals.

Lambda's approach, in the simplest scenario, is to prove a storage terminal's possession of a file F in a permissionless store. The first step is to shard the original file the user needs to store into a collection of messages (m1, m2, …, mn); a number of tags are then computed cryptographically from these shards together with the security parameter λ, producing the metadata.

The validator nodes store the metadata for the subsequent generation of challenges (puzzles); the storage terminals store the original file shards and the corresponding tags, which allow them to extract the right data from the shards and solve the puzzles.

The validator generates challenges for the storage terminals, which compute over the file shards and tags they hold, generate a corresponding set of digital vectors, and return them to the validator. Finally, using the metadata and the vectors, the validator checks whether the storage terminal passes verification. If it does, the transaction is packed and written to the blockchain; if not, the storage node is punished.

Once single-point possession of a file can be verified, file features such as dynamic updates, multiple copies, erasure codes and deduplication can be supported with corresponding adjustments.

Under this approach, the validator must save the metadata and the storage terminals must additionally store the tags, increasing storage volume by only 1% to 3%. When initiating verification against a storage terminal, the validator's challenge request is made remotely because the two sides sit in different network architectures. If a storage terminal refuses the challenge or cannot be reached due to network problems, the validator directly concludes that it has failed to store the data.

Within the system, the validator role is Lambda's key innovation. Lambda also designed the roles of nominator and fisherman:

  1. Validator: Validators pack transactions and generate blocks across the entire Lambda network, and therefore need to pledge some tokens. Rather than a single validator node, a random set of validator nodes is selected from hundreds of thousands of candidates to jointly verify a storage terminal, and the validator set is rotated every 1024 blocks.
  2. Nominator: Nominators stake capital to back one or more validators, who make decisions on their behalf. They have no function other than capital investment.
  3. Fisherman: Fishermen are not involved in block packing; their role is similar to that of "bounty hunters" in the real world. Attracted by a rich one-time reward, they report the malicious behavior of validator nodes by laying traps for them.

If data are maliciously deleted, it is especially necessary to be able to restore the data file. For recoverability, Lambda's plan is to divide a data file into A parts and distribute them to B persons (B < A); as long as the number of missing parts does not exceed a preset threshold, the file can be recovered.

In preparation since 2017, the Lambda project has completed an angel round of financing of tens of millions of RMB, led by ZhenFund, Metropolis VC and DFund. According to He Xiaoyang, Lambda's test network will go online early next year. Of the 15 team members, the core founding team comes mainly from OneAPM, an APM SaaS company, and most members are front-line programmers from basic-software R&D and open-source communities.
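The recoverability idea described above (split a file into parts and tolerate the loss of some of them) is erasure coding. A minimal sketch, assuming the simplest possible code, a single XOR parity shard that survives the loss of any one shard; real systems, presumably including Lambda's, would use codes such as Reed-Solomon to tolerate multiple losses:

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(parts):
    """Append one XOR parity shard over equal-length data parts."""
    assert all(len(p) == len(parts[0]) for p in parts)
    return parts + [reduce(xor_bytes, parts)]

def recover(shards, missing):
    """Rebuild the shard at index `missing` by XOR-ing the survivors."""
    survivors = [s for i, s in enumerate(shards)
                 if i != missing and s is not None]
    return reduce(xor_bytes, survivors)

data = [b"aaaa", b"bbbb", b"cccc"]       # A = 3 data parts
shards = encode(data)                    # 4 shards distributed to nodes
shards[1] = None                         # one node disappears
assert recover(shards, 1) == b"bbbb"     # the lost part comes back
```

The storage overhead here is one extra shard; tolerating k losses out of n shards costs proportionally more redundancy.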

Official website:
🔹Telegram Community :
🔹White Paper :

Oracle Clusters, Distributed Databases and Blockchain

Blockchain has become more popular, is being accepted by more and more people across society, and is gradually becoming a new trend that is shaping the future. In this process, blockchain technology has also developed rapidly, including smart contracts, consensus mechanisms, sharding, cross-chain interoperability and more to come.

As a veteran who has worked in the database field for more than 10 years, I see the essence of blockchain as a multi-active distributed database. Many technologies in blockchain have already been used in many applications during the long evolution of the database. To better understand blockchain, I have outlined the evolution of mainstream system architecture since 2000, in three stages:

The first stage: the era of the Oracle Cluster.

If, in that era, you wanted a mature database solution that could support tens of thousands of transactions per second, engineers would definitely have recommended Oracle. The main system architecture of this period was as follows:

The year 2001 was a special one for Oracle Corporation: it released Oracle 9i, a classic version of its database product. In this release Oracle greatly enhanced Real Application Clusters (RAC), enabling Oracle databases to move from a single host node to multiple nodes in production environments.

RAC effectively removed the bottleneck of single-node database operation, improved the system's concurrent load capacity and high availability, and met the urgent need to manage exponential data growth as enterprises developed. The solution was therefore widely applied in fields such as telecom operators, banks and securities firms, as well as by Internet giants including Alibaba and JD.

With subsequent versions, the theoretical maximum number of Oracle 10g cluster nodes was expanded to more than 100. But due to the high cost of the required hardware (minicomputers and storage), enterprises could not deploy at that scale; from published references, some domestic enterprises have used clusters of 6–8 nodes in actual business.

Despite this wide application, as the domestic Internet business developed further, problems with the Oracle cluster gradually emerged, prompting the rise of a vigorous "de-IOE" movement in China.

The first problem: the hardware and software (IOE) required by the architecture are expensive.

As the schema diagram in Figure 1 shows, Oracle shares storage among multiple nodes in order to ensure transaction consistency. That is to say, the final data is stored on disk only once, yet it can be read and written by multiple nodes. Oracle uses row locks and related techniques to resolve read-write contention and to guarantee that each transaction request commits or rolls back consistently.

Application-layer hosts are routed to different database nodes through Oracle Net Services. After a database transaction request is submitted, the database node periodically writes the transaction data to the shared disk array.

We know that IO requests are very slow relative to other computer operations, and this architectural design accentuates the IO read/write bottleneck. Enterprises responded by buying better, more expensive disk arrays to improve IO performance, which kept the IOE architecture dominant in domestic IT procurement for a long time.

The second problem: limited scalability; adding nodes yields less-than-linear performance gains.

As mentioned above, the cluster is limited by the IO bottleneck: each node writes its transaction data to disk at fixed intervals, and in between, real-time transaction information is synchronized between the nodes' memory over an IPC protocol. This requires a high-speed data-transmission network, usually provided by Gigabit NICs.

However, as the number of nodes increases, more transaction data must be synchronized, and network contention and row-lock contention become more serious, so the performance gain from each additional node grows less than linearly.

The second stage: the era of distributed databases.

With the rapid development of its business, Taobao had become one of Oracle's top clients in China by 2010. To reduce IT costs while meeting the business's further growth, and drawing on its long accumulation of open-source technology, the Taobao technical team launched a vigorous "de-IOE" campaign in 2010, with the goal of replacing the traditional Oracle architecture with a distributed open-source database architecture. Its success effectively verified the load capacity of distributed databases in large-scale production, and in turn transformed domestic IT architecture selection and drove the evolution of distributed database architectures.

There are many kinds of distributed database schemes in common use at Internet companies. Their essence is to split data across different nodes, easing the read/write pressure on any single node, for example by splitting rows by key parity. A simple distributed system of this kind is shown in the following schematic:
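The splitting idea can be sketched as a routing function that maps each row key to a node (node names here are hypothetical). Hashing the key spreads rows evenly; the parity split mentioned above is simply the two-node case of `key % 2`:

```python
import hashlib

NODES = ["db-node-0", "db-node-1", "db-node-2", "db-node-3"]

def route(key: str) -> str:
    """Pick the database node responsible for a row key.
    A stable hash ensures the same key always lands on the same node."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# All reads and writes for one entity hit the same node,
# so each node only carries a fraction of the total load.
placement = {}
for key in ("user:1", "user:2", "order:1001", "order:1002"):
    placement.setdefault(route(key), []).append(key)
```

The trade-off is that cross-node queries and transactions become harder, which is exactly where the CAP considerations discussed next come in.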

When we split data across a distributed database, we must consider the trade-offs among the three properties described by the CAP theorem: consistency, availability, and partition tolerance.

The system architect weighs these three CAP properties according to the business, so that the system's behavior matches the expectations of the business requirements.

The mainstream distributed schemes seen so far make varying trade-offs among these three properties. In our last startup project, OneAPM, we used Kafka, ZooKeeper, ClickHouse and other components to handle the huge volume of data generated by user access trajectories. The architecture diagram is as follows:

The third stage: the blockchain era.

From the perspective of database technology, the essence of blockchain is a multi-active distributed database with a specific architecture. Distributed databases use Paxos, Raft and other algorithms to guarantee strong consistency across multiple replicas; blockchain uses consensus algorithms to solve the problem of data consistency among its nodes.

Consensus algorithms such as PoW, PoS and DPoS are essentially competitions over which node gets the right to write the next block (of account data). In the design of blockchain economic models, the act of writing blocks is economically rewarded, which greatly encourages the development of the blockchain ecosystem.
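The PoW version of this competition can be sketched in a few lines: whoever first finds a nonce whose block hash meets a difficulty target wins the right to write the block. This is a toy with a trivially low difficulty, not a real mining implementation:

```python
import hashlib

def mine(block_data: bytes, difficulty: int = 2):
    """Search for a nonce whose SHA-256 hash starts with
    `difficulty` zero bytes; finding it is costly, checking it is cheap."""
    target = b"\x00" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

nonce, digest = mine(b"block: alice pays bob 1 coin")
assert digest.startswith(b"\x00\x00")   # any node can verify in one hash
```

The asymmetry between finding and verifying the nonce is what lets every node cheaply audit the winner, and the hash rate spent searching is the economic cost that the block reward compensates.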

But precisely because every node may win the right to write a block, a trade-off must be made with the system's response time in order to guarantee data consistency. Thus the TPS of Bitcoin is about 7 transactions per second, and Ethereum handles only a few dozen. The CAP problem of distributed databases remains a technical challenge for blockchain. Of course, with the emergence of new technologies such as sharding and the Lightning Network, the transaction-performance bottleneck is being continuously optimized.

In addition, from a product perspective, current blockchain technology primarily provides transaction accounting and cannot support the standard capabilities of a database; functionally, today's blockchain is essentially a special subset of the database technology stack. Extending blockchain with the ability to add, delete and modify files and data is a key capability for the ecological development of DApps.

Looking back at the three eras of database system evolution, the wave of technological development cannot be stopped; all we can do is learn and face it. Just as distributed databases replaced the traditional Oracle cluster, the decentralized model of blockchain technology will iterate on today's centralized business systems. With the emergence of new technologies such as sharding and the Lightning Network, the transaction-performance bottleneck is continuously being optimized, and blockchain technology is gradually evolving toward mainstream IT architecture.
