IceGiant Whitepaper
 

IceGiant: The Decentralized NoSQL Protocol


Abstract

Decentralized data storage networks such as Filecoin, Arweave and Storj have created a new generation of decentralized data infrastructure by providing tamper-proof, secure storage. IceGiant extends these storage networks with a NoSQL database synchronization protocol. The IceGiant protocol supports rapid decentralized development and enriches the data expression layer of applications with richer NoSQL data structures (such as hashes, lists and sets) and data instructions, improving the usability of decentralized networks and data for both web3 and web2 systems.

1. Introduction

1.1 Motivation

In web3 there is currently no way to write a complete client application as easily as in web2 while keeping it fully decentralized. In web2, you spin up a database on AWS and let your client application read from and write to it; nothing comparable exists in web3. Writing data directly to Ethereum is too expensive for most users, and storage protocols such as Filecoin and Arweave are mainly designed for archiving data rather than providing enterprise-level read/write performance.

In this paper we introduce IceGiant, a novel solution for a highly scalable, decentralized NoSQL database. Using RESP as its main interaction protocol, IceGiant supports rapid, complex and decentralized development without requiring developers to learn new skills or languages.

The advantages of IceGiant fall into four categories:

  1. Greatly reduced development time.
  2. Enhanced functionality and customization for web3/web2 applications.
  3. Fast data query speeds and loading times.
  4. Simpler data composition and support for complex data usage scenarios.

First, a distributed NoSQL database lets almost every developer build on a decentralized architecture. RESP, the communication protocol of Redis, has become the de facto standard protocol for NoSQL databases, and Redis itself is the NoSQL database with the largest market share in web2, used by developers all over the world. By being compatible with the RESP protocol, IceGiant greatly reduces development time and the technical barrier to entry, promoting the development of decentralized applications.

IceGiant also enables functionality that was previously impossible in web3. By providing a full-featured NoSQL engine, it allows developers to perform complex queries on large data sets within milliseconds. This level of performance can push web3 applications toward being fully data-driven without sacrificing their decentralized nature.

The key to large-scale adoption of web3 technology is giving end users the experience they have become accustomed to in web2. Page loading time directly affects business conversion rates: Walmart found that conversion rates increase by 2% for every one-second reduction in page load time. Gateway loading times for web3 distributed storage still need significant optimization, and a core goal of IceGiant is to provide users with a fast NoSQL data set system with rich data structures.

Finally, IceGiant provides richer data expression and composition. Where web3 today is largely limited to key-value reads and writes, IceGiant provides strings, hashes, lists, sorted sets and sets, enabling unprecedented interoperability at the application data level over the standard RESP communication protocol. Whether developers want to work with whole application data sets or with specific slices of data, IceGiant provides a simple and efficient way to use it.
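Because IceGiant speaks RESP, an ordinary Redis client should be able to talk to a node directly. The following is a minimal sketch using the go-redis client; the node address is a placeholder, and the commands simply illustrate the string, hash and sorted-set structures mentioned above.

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Connect to an IceGiant node exactly as if it were a Redis server.
	// The address below is a hypothetical example endpoint.
	rdb := redis.NewClient(&redis.Options{Addr: "icegiant-node.example.com:6379"})

	// String, hash and sorted-set writes use the familiar RESP commands.
	rdb.Set(ctx, "app:title", "IceGiant demo", 0)
	rdb.HSet(ctx, "user:1", "name", "alice", "role", "admin")
	rdb.ZAdd(ctx, "leaderboard", redis.Z{Score: 42, Member: "alice"})

	// Reads work the same way.
	name, err := rdb.HGet(ctx, "user:1", "name").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("name:", name)
}
```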

1.2 Glossary of Terms

| Item | Description |
| --- | --- |
| NoSQL | Non-relational databases, which store data in formats other than relational tables. |
| Data set | A uniquely identifiable data set within the database, including the location and status of all past data. |
| Smart contract | A computer protocol designed to digitally propagate, verify or execute contracts. |
| FVM | The smart contract virtual machine of Filecoin. |
| IPFS-log | An immutable, operation-based, conflict-free replicated data structure (CRDT) for distributed systems. It is an append-only log that can be used to model mutable shared state between peers in p2p applications. |
| Validator | A node that verifies on-chain data and maintains a complete database state. |

2. System composition

2.1 Network structure

Although all IceGiant nodes join the same P2P network, the network can be decomposed by data set. A data set is a tenant isolation area: the database region used by a specific application built on top of the IceGiant protocol. Unlike a traditional blockchain network, an IceGiant node only interacts with peers operating the same data set and is not responsible for the data of any other data set. A set of peer IceGiant nodes forms a highly available data storage area.


2.2 Storage mechanism

The NoSQL storage layer of each IceGiant node consists of a codec layer and an underlying KV storage layer. The KV engine currently supports LevelDB, BadgerDB, IPFS and OSS, and data is stored in one of two ways:

  1. An instruction broadcast model based on IPFS-log
  2. A native data storage model based on IPFS

The storage model design is described further in Section 3.3 (NoSQL storage engine). IceGiant nodes are divided into groups by data set, and each group forms a highly available storage area.
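To make the layering concrete, the sketch below shows one way a NoSQL codec layer can sit on top of a pluggable KV engine. The interface and key-encoding scheme are illustrative only, not the actual IceFireDB interfaces: a hash write such as `HSET user:1 name alice` is encoded into a flat composite key so that any KV backend (LevelDB, BadgerDB, IPFS, OSS) can store it.

```go
package main

import "fmt"

// KVEngine is the minimal contract every storage backend must satisfy
// (hypothetical interface for this sketch).
type KVEngine interface {
	Put(key, value []byte) error
	Get(key []byte) ([]byte, error)
}

// memEngine is an in-memory stand-in backend used only for this sketch.
type memEngine struct{ m map[string][]byte }

func (e *memEngine) Put(k, v []byte) error { e.m[string(k)] = v; return nil }
func (e *memEngine) Get(k []byte) ([]byte, error) {
	v, ok := e.m[string(k)]
	if !ok {
		return nil, fmt.Errorf("key not found")
	}
	return v, nil
}

// hashKey encodes a hash field into a flat KV key: "h:<key>:<field>".
func hashKey(key, field string) []byte {
	return []byte("h:" + key + ":" + field)
}

func main() {
	var engine KVEngine = &memEngine{m: map[string][]byte{}}

	// HSET user:1 name alice, encoded for the KV layer.
	engine.Put(hashKey("user:1", "name"), []byte("alice"))

	// HGET user:1 name, decoded back.
	v, _ := engine.Get(hashKey("user:1", "name"))
	fmt.Println(string(v)) // alice
}
```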

2.3 Smart contracts

IceGiant mainly uses smart contracts to build a decentralized policy center for the storage engine. Our goal is a database platform that runs itself and is governed by the public. The smart contracts currently target EVM-compatible virtual machine environments, and we have chosen FVM as our first decentralized management platform. As the computing layer of the Filecoin storage ecosystem, FVM allows us to perform trusted computation close to the stored data and gives users stronger guarantees about the credibility of data operations.

The smart contract functions mainly include:

  1. Decentralized, trusted management of data sets and multi-tenancy
  2. Management of user accounts and access credentials
  3. Writing the NoSQL instruction set into contract storage for trusted data synchronization and consistency verification

2.4 Query structure

Each query submitted to a node contains the following pieces of data:

  1. query: a standard RESP protocol query
  2. timestamp
  3. query hash
  4. query ID
  5. RSA signature

For every data write, the node verifies that the RSA signature is valid for the key associated with the fund pool, whose public key is stored in the smart contract. After verifying the signature, the node checks the timestamp to ensure that it falls within the valid time range of the current block.

Once the timestamp is accepted, the node checks the uniqueness of the query ID. The node guarantees that a query ID is not duplicated within the batch for the current time range; the ID is a random 64-character string associated with the query. If the node is the leader, it must also maintain the current state of the data set in its database.
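The following is a hedged sketch of this query envelope and the three checks described above (signature, time window, query-ID uniqueness). The struct fields, helper names and the time window are illustrative assumptions, not IceGiant's actual wire format.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
	"time"
)

// Query is one signed RESP query submitted to a node (hypothetical layout).
type Query struct {
	Query     string    // standard RESP protocol query, e.g. "SET k v"
	Timestamp time.Time // client-side timestamp
	Hash      [32]byte  // SHA-256 of the query payload
	ID        string    // random 64-character query identifier
	Signature []byte    // RSA signature over the query hash
}

// verify performs the three checks described above: signature validity,
// timestamp window, and query-ID uniqueness within the current batch.
func verify(q Query, pub *rsa.PublicKey, window time.Duration, seen map[string]bool) bool {
	if err := rsa.VerifyPKCS1v15(pub, crypto.SHA256, q.Hash[:], q.Signature); err != nil {
		return false // signature does not match the registered public key
	}
	if d := time.Since(q.Timestamp); d < 0 || d > window {
		return false // outside the valid time range of the current block
	}
	if seen[q.ID] {
		return false // duplicate query ID within this batch
	}
	seen[q.ID] = true
	return true
}

func main() {
	// Throwaway key pair for the sketch; in IceGiant the public key would be
	// the one stored in the smart contract for the fund pool.
	priv, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}

	q := Query{Query: "SET greeting hello", Timestamp: time.Now(), ID: "q-0001"} // shortened placeholder ID
	q.Hash = sha256.Sum256([]byte(q.Query))
	q.Signature, _ = rsa.SignPKCS1v15(rand.Reader, priv, crypto.SHA256, q.Hash[:])

	seen := map[string]bool{}
	fmt.Println(verify(q, &priv.PublicKey, time.Minute, seen)) // true
	fmt.Println(verify(q, &priv.PublicKey, time.Minute, seen)) // false: duplicate query ID
}
```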

3. Node Architecture

3.1 Reverse proxy

The reverse proxy server is the entry point for all requests sent to a node. On an IceGiant node, the reverse proxy serves four main purposes (see the sketch after this list):

  1. Manage TLS/SSL certificates to ensure encrypted communication with any client or peer node.
  2. Implement the CORS (cross-origin resource sharing) configuration, specifying which origins (clients and peers) may read and write data.
  3. Route incoming requests to the correct gateway or synchronizer, preventing any advanced request gateway from being bypassed.
  4. Serve static resources from the local file system.
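As a minimal sketch of those four roles, the example below uses Go's standard library rather than a dedicated proxy server. The addresses, certificate paths, route prefix and allowed origin are all placeholders.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// withCORS wraps a handler and restricts which origin may call it.
func withCORS(origin string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", origin)
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Route /sync traffic to the local IceGiant synchronizer (placeholder port).
	syncURL, _ := url.Parse("http://127.0.0.1:8080")
	syncProxy := httputil.NewSingleHostReverseProxy(syncURL)

	mux := http.NewServeMux()

	// CORS: only accept reads/writes from an approved client origin.
	mux.Handle("/sync/", withCORS("https://app.example.com", syncProxy))

	// Serve static resources from the local file system.
	mux.Handle("/", http.FileServer(http.Dir("./static")))

	// TLS terminates here so all client/peer communication is encrypted.
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", mux))
}
```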

3.2 IceGiant Synchronizer

The IceGiant Synchronizer is an application that sits directly above the database engine. All incoming database requests pass through it; the synchronizer determines whether a request should be processed, whether data writes should be propagated to the rest of the network, and whether local data should be written and the client request answered.

The synchronizer can also aggregate data writes, merging multiple data requests into a single network storage request. It also allows users to mix data sets across different nodes, encouraging further data decentralization while keeping operating overhead low.
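The write-aggregation idea can be sketched as a simple batching loop: individual writes are buffered and flushed as a single storage request when the buffer is full or on a timer. `flushToStorage` is a hypothetical stand-in for the node's actual network storage call, and the batch size and interval are arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// aggregator buffers incoming writes and flushes them as one storage request.
func aggregator(writes <-chan string, maxBatch int, interval time.Duration) {
	batch := make([]string, 0, maxBatch)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	flush := func() {
		if len(batch) == 0 {
			return
		}
		flushToStorage(batch) // one network storage request for many writes
		batch = batch[:0]
	}

	for {
		select {
		case w, ok := <-writes:
			if !ok { // input closed: flush what is left and stop
				flush()
				return
			}
			batch = append(batch, w)
			if len(batch) >= maxBatch {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

// flushToStorage is a placeholder for the real storage call.
func flushToStorage(batch []string) {
	fmt.Printf("flushing %d writes in one request: %v\n", len(batch), batch)
}

func main() {
	writes := make(chan string)
	done := make(chan struct{})
	go func() { aggregator(writes, 3, 200*time.Millisecond); close(done) }()

	for _, w := range []string{"SET a 1", "HSET h f v", "LPUSH l x", "SET b 2"} {
		writes <- w
	}
	close(writes)
	<-done
}
```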

3.3 NoSQL storage engine

The core of each node is the database engine. By default, an IceGiant node integrates KV storage engines such as LevelDB, BadgerDB, IPFS and OSS, and implements the NoSQL protocol codec layer on top of the KV storage. NoSQL data is currently stored in one of the following two ways:

The first implementation is an instruction broadcast model based on ipfs-log. Using ipfs-log, CRDTs and libp2p (pubsub), we implement an immutable, operation-based, conflict-free replicated data model for distributed systems. On top of ipfs-log, data structures such as event and kv are encapsulated, and multi-node database instruction broadcast is built on this engine. At the bottom of IceFireDB, we abstract a mutable KV engine based on BadgerDB and LevelDB; when any node executes a write instruction, it broadcasts that instruction to the whole network, and the bottom driver of IceFireDB on each node applies the broadcast instruction to ensure eventual consistency of the data. A pubsub broadcast sketch follows the log-join diagram below.

           Log A                Log B
             |                    |
     logA.append("one")   logB.append("hello")
             |                    |
             v                    v
          +-----+             +-------+
          |"one"|             |"hello"|
          +-----+             +-------+
             |                    |
     logA.append("two")   logB.append("world")
             |                    |
             v                    v
       +-----------+       +---------------+
       |"one","two"|       |"hello","world"|
       +-----------+       +---------------+
             |                    |
             |                    |
       logA.join(logB) <----------+
             |
             v
+---------------------------+
|"one","hello","two","world"|
+---------------------------+

The second implementation is a full storage model based on IPFS. In addition to the first mode, we are building a second data structure in which the complete data lives on IPFS. An IPFS driver in the IceFireDB driver layer encodes upper-level commands into a unified KV data structure, stores and updates values, and links each newly generated CID to its key. At present there is no multi-node key broadcast network with data synchronization; once connected to the broadcast network, we can build a data model that lives natively on IPFS. A sketch of the put/get flow follows the diagram below.

+-+---------------+----+---------------+----+---------------+-+
|                           Transport                         |
| +---------------+    +---------------+    +---------------+ |
| |    Cluster    |    |    Cluster    |    |    Cluster    | |
| | communication |    | communication |    | communication | |
| +---------------+    +---------------+    +---------------+ |
+-+---------------+----+-------^-------+----+---------------+-+
                               |
+------------------------------v------------------------------+
|                        Query Processor                      |
|   +-----------------------------------------------------+   |
|   |                     Query Parser                    |   |
|   +-----------------------------------------------------+   |
|   +-----------------------------------------------------+   |
|   |                   Query Optimizer                   |   |
|   +-----------------------------------------------------+   |
+---+--------------------------+--------------------------+---+
                               |
+------------------------------v------------------------------+
|                            Codec                            |
|    +-----------+                           +-----------+    |
|    |   Encode  |                           |   Decode  |    |
|    +-----------+                           +-----------+    |
|                support: kv, list, hash, set                 |
+----------+---------------------------------------^----------+
           |                                       |
+----------+---------------------------------------+----------+
|          |put              KV Engine             |Get       |
|    +-----v----+                            +-----+----+     |
|    | put(a,b) |                            |  Get(a)  |     |
|    +-----+----+                            +-----+----+     |
|          | a:b            +-------+              | a        |
|    +-----v----+    +------> store <----+   +-----v----+     |
|    |  CID(b)  +----+      +-------+    +---+ cat(hash)|     |
|    +-----+----+                            +-----+----+     |
|          | add(b)                                | cat      |
|  --------v---------------------------------------v-----     |
|                          IPFS nodes                         |
+-------------------------------------------------------------+
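A minimal sketch of this put/get flow, using go-ipfs-api against a local IPFS daemon: a value is added to IPFS, the returned CID is linked to the key in a local map (standing in for the KV index layer), and a read resolves the key back to its CID and cats the value. The API endpoint and the in-memory index are assumptions for illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"

	shell "github.com/ipfs/go-ipfs-api"
)

func main() {
	sh := shell.NewShell("localhost:5001") // local IPFS API endpoint (assumption)
	keyToCID := map[string]string{}        // stand-in for the key -> CID index

	// put(a, b): store the value on IPFS and link the resulting CID to the key.
	cid, err := sh.Add(strings.NewReader("b"))
	if err != nil {
		panic(err)
	}
	keyToCID["a"] = cid

	// Get(a): resolve the key to its CID and cat the value from IPFS.
	r, err := sh.Cat(keyToCID["a"])
	if err != nil {
		panic(err)
	}
	defer r.Close()
	val, _ := io.ReadAll(r)
	fmt.Printf("a -> %s -> %q\n", cid, val)
}
```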

Each IceGiant node is a complete NoSQL database engine, and each engine is configured to allow access only through the IceGiant synchronizer running on the same machine. This is done for the following reasons:

  1. Restricting database access to the synchronizer reduces the chance of user error. Non-propagating data writes must be explicitly identified by the IceGiant synchronizer rather than accidentally submitted to the NoSQL storage engine, which keeps peer nodes in the network holding the same data as much as possible.
  2. Restricting access to the IceGiant synchronizer through the reverse proxy or another advanced gateway reduces the attack surface for database users, mitigating network abuse by bots, data/privacy leakage and cross-node state divergence.

3.4 Advanced request handler

An IceGiant node can optionally be equipped with an advanced request processing gateway, whose core is application middleware that can contain complex business logic defining how the database is used. This extends an IceGiant node from a database/data storage system into a fully functional decentralized API system. Examples of advanced request gateways include fine-grained database access control, complex aggregate and concurrent queries, and rules for which data to pass to the IceGiant synchronizer for propagation.

For example, a multi-tenant management page can run on an advanced request gateway that manages application-specific logic. The advantage of such a system is that access rights only need to be verified at a single gateway.
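As an illustrative sketch, the middleware below enforces a fine-grained access rule before a request ever reaches the synchronizer. The tenant header, the key-prefix rule and the ports are hypothetical; a real gateway would encode whatever application-specific logic the project needs.

```go
package main

import (
	"log"
	"net/http"
	"strings"
)

// accessRule decides whether a caller may touch a given key prefix.
// Example rule: a tenant may only read/write keys under its own prefix.
func accessRule(tenant, path string) bool {
	return strings.HasPrefix(path, "/data/"+tenant+"/")
}

// gateway wraps the handler that forwards to the synchronizer.
func gateway(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tenant := r.Header.Get("X-Tenant-ID") // hypothetical tenant header
		if tenant == "" || !accessRule(tenant, r.URL.Path) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r) // allowed: pass the request on to the synchronizer
	})
}

func main() {
	forward := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("forwarded to synchronizer"))
	})
	log.Fatal(http.ListenAndServe(":9090", gateway(forward)))
}
```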

4. Smart contracts

4.1 Transaction Structure

IceGiant blocks are stored in JSON format when submitted to smart contracts. Because each IceGiant node can serve multiple data sets, a single smart contract transaction can contain multiple blocks.

In each JSON document, the top-level key is the name of the data set the data refers to. Beneath that first-level key is the request endpoint that received the data writes. Once the data has been divided by request endpoint, all data writes are sorted in chronological order.
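The sketch below builds and prints a block matching that description: data set name at the top level, request endpoint below it, and writes in chronological order underneath. The field names and values are illustrative, not IceGiant's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Write is a single data write recorded in a block (hypothetical fields).
type Write struct {
	Query     string `json:"query"`
	Timestamp int64  `json:"timestamp"`
}

// Block maps data set name -> request endpoint -> chronologically ordered writes.
type Block map[string]map[string][]Write

func main() {
	block := Block{
		"dataset-demo": {
			"/write": {
				{Query: "SET a 1", Timestamp: 1700000001},
				{Query: "HSET user:1 name alice", Timestamp: 1700000002},
			},
		},
	}
	out, _ := json.MarshalIndent(block, "", "  ")
	fmt.Println(string(out))
}
```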

4.2 Contract interaction

The core functions of data set existence, ownership and verification come from smart contracts, which are mainly used to implement data set isolation, data read/write logs, the fund pool, and data validity and consistency verification.

5. Index: tracking block metadata

5.1 $IFDB token

The native token of the IceGiant and IceFireDB ecosystem is the $IFDB token, which has the following purposes:

  1. The main purpose of the token is to let interested parties verify the incoming blocks of the database. This can take the form of stakeholders running peer validator nodes, or of delegating stake to validators.
  2. The token mechanism prevents network spam. Because data set names are unique, registering a data set consumes a small amount of tokens, creating a small entry barrier that is negligible for well-intentioned users.

5.2 Master validator

The master validator in the IceGiant validator pool is the node with the highest stake/delegated stake. This node can be managed by individuals/groups closely related to the project it serves, but in theory it can be run by any third party.

The master validator is the default node to which queries are sent and is responsible for serving all application data (this is the default, but not mandatory). In addition to tracking the appropriate state changes, the master validator also runs a database that maintains the active state of the data set. The benefit of making this the default is that it significantly reduces the overall cost for nodes of maintaining the data set. Projects can choose to create more “expensive” data sets and thus benefit from further decentralization, but in most use cases further decentralization yields diminishing marginal utility.

Any stake owned by or delegated to the master validator cannot be withdrawn until the validator's most recent block is complete. Once a block has been mined by the new master validator, the previous one can remove its stake from the pool. The master validator can request to unstake when mining its final block, so its tokens are not locked into the role indefinitely.

5.3 Peer validator

Any validator that is not the leader is a peer validator. By default, the role of a peer validator is to track incoming data writes and ensure that those writes are properly stored on IPFS by the master validator.

Peer validators may also choose to maintain an active state of the database, allowing them to be queried just like the master validator. Some network users may want this because it acts as a mechanism for eliminating the risk of network downtime. Additionally, the highest-staked validators have a vested interest in maintaining the active state of the database, since they could become the next master validator at any time.

6. Funding pool: multi-chain support

6.1 EVM multi-chain fund pool

Funding pools enable individuals and communities to pay for data usage on any number of blockchains. A fund pool is simply a smart contract that holds funds for validators and can be used to incentivize them to act correctly; the tokens in the pool pay for the operators' storage, computing and hardware costs.

A data set can have any number of pools. When creating a pool, the creator can specify the chain, token and validator that the pool is bound to. This allows users to pay into the IceGiant network with any ERC-20-compatible token. By allowing payments in any token, IceGiant creates a chain-agnostic NoSQL database that uses smart contracts as a state tracking layer.

7. Data DAO

IceGiant’s core goal is to build a web3 Data DAO. We use database and smart contract technology to bring database retrieval capability into web3. The database platform is not controlled by any central company, so users control their own data permissions and read/write processes. At present IceGiant mainly supports the NoSQL protocol; in the future it will support SQL and time-series database protocols to help web3 build decentralized data DAOs.

8. Related material

  1. Deterministic Databases and the Future of Data Sharing

    https://thenewstack.io/deterministic-databases-and-the-future-of-data-sharing

  2. How website performance affects conversion rates

    https://www.cloudflare.com/learning/performance/more/website-performance-conversion-rates

  3. Relational to NoSQL at Enterprise Scale: Lessons from Amazon

    https://thenewstack.io/relational-to-nosql-at-enterprise-scale-lessons-from-amazon