Database Source Code Too Daunting? CnosDB to the Rescue! Dive into 100,000 Lines of Source Code with Ease!
Many members of our community have been curious about CnosDB and where to begin reading its source code. This question was discussed during a previous CnosDB HiTea livestream, and today we’ll revisit the topic.
The CnosDB source code is divided into two main parts: the Query Engine and the Storage Engine. The Query Engine lives under the query_server directory and contains the code related to query processing, while the Storage Engine lives in the tskv directory and holds the time-series storage code.
Query Engine Source Code
Let’s start with an overview of the source code reading process for the Query Engine (a code sketch of the full pipeline follows this list):
1. Parsing SQL: SQL statements are parsed into an ExtStatement structure, the syntax tree of the SQL; it is defined in query_server/spi/src/query/ast.rs. Statements fall into DQL/DML (SELECT, INSERT) and DDL (CREATE, DROP).
2. Generating logical plans: logical plans are generated from the parsed statements; the transformation code is in query_server/query/src/sql/planner.rs. During logical plan generation, metadata is consulted to verify that the statements are valid.
3. Optimizing logical plans: the entry points are query_server/query/src/execution/query.rs and query_server/query/src/sql/optimizer.rs. Logical plan optimization is rule-based and includes rules such as predicate pushdown and expression simplification. DDL statements skip steps 3, 4, and 5.
4. Converting logical plans to physical plans: during this conversion, each logical operator is mapped to a concrete implementation; for example, a join may become either a sort-merge join or a hash join.
5. Optimizing physical plans: physical plan optimization is cost-based, for example choosing a join strategy based on the data volume of the tables involved. The entry point is also in query_server/query/src/sql/optimizer.rs.
6. Executing plans: DDL execution mainly reads and modifies metadata; see query_server/query/src/execution/ddl. DQL typically involves table scanning, predicate filtering, and operations like joins, projections, and aggregations. The fundamental step is the table scan (TableScan), which produces an iterator over table data (RecordBatch); the subsequent operators (filter, join, aggregate) then run over the RecordBatches, often in a vectorized manner.
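To make the pipeline concrete, here is a minimal toy sketch of the stages above. Every type and function name here is an illustrative stand-in invented for this article, not CnosDB's actual API; the real structures live in ast.rs, planner.rs, and optimizer.rs as noted in the list.

```rust
// Toy model of the six stages above; all names are illustrative stand-ins.

#[derive(Debug)]
enum Statement {
    Select { table: String, filter: Option<String> }, // DQL: full pipeline
    CreateTable { name: String },                     // DDL: skips steps 3-5
}

#[derive(Debug)]
enum LogicalPlan {
    TableScan { table: String },
    Filter { predicate: String, input: Box<LogicalPlan> },
    Ddl { description: String },
}

/// Step 1: parse SQL text into a syntax tree (cf. ExtStatement).
fn parse(sql: &str) -> Statement {
    if sql.trim_start().to_uppercase().starts_with("CREATE") {
        Statement::CreateTable { name: "t".to_string() }
    } else {
        Statement::Select {
            table: "t".to_string(),
            filter: Some("ts > 0".to_string()),
        }
    }
}

/// Step 2: turn the syntax tree into a logical plan (cf. planner.rs);
/// the real code also checks metadata here to validate the statement.
fn to_logical_plan(stmt: Statement) -> LogicalPlan {
    match stmt {
        Statement::Select { table, filter } => {
            let scan = LogicalPlan::TableScan { table };
            match filter {
                Some(predicate) => LogicalPlan::Filter {
                    predicate,
                    input: Box::new(scan),
                },
                None => scan,
            }
        }
        Statement::CreateTable { name } => LogicalPlan::Ddl {
            description: format!("create table {name}"),
        },
    }
}

/// Steps 3-6 collapsed: rule-based logical optimization, physical
/// planning, and cost-based physical optimization would run here for
/// DQL; DDL bypasses them and goes straight to metadata changes.
fn execute(plan: LogicalPlan) {
    match plan {
        LogicalPlan::Ddl { description } => println!("apply metadata change: {description}"),
        other => println!("run optimized physical plan for: {other:?}"),
    }
}

fn main() {
    execute(to_logical_plan(parse("SELECT * FROM t WHERE ts > 0")));
    execute(to_logical_plan(parse("CREATE TABLE t")));
}
```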
Storage Engine Source Code
To understand how TSKV works, start with the interfaces exposed in tskv/src/engine.rs, then examine the write interface and its implementation in tskv/src/kvcore.rs. The write interface accepts a write_batch, a structure generated from FlatBuffers definitions; the original .fbs files, along with some gRPC definitions, can be found in common/protos/proto. From the write_batch you can obtain the target database, table, and the actual data being written. Before the data reaches memory, the write_batch is used to build a write_group (see tskv/src/database.rs), and the write-ahead log (WAL) is written first to ensure the data can be recovered after a crash. The WAL stores compressed points and assigns sequence numbers; its implementation is in tskv/src/wal.rs.
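The key ordering is WAL first, then memory. Below is a hedged sketch of that ordering with invented, simplified types; none of these signatures mirror the real code in kvcore.rs, database.rs, or wal.rs.

```rust
// Invented sketch of the WAL-first write ordering described above.

struct WriteBatch {
    db: String,
    table: String,
    points: Vec<Vec<u8>>, // encoded points, e.g. from FlatBuffers
}

struct Wal {
    next_seq: u64,
    log: Vec<(u64, Vec<u8>)>, // (sequence number, compressed points)
}

impl Wal {
    /// Append (conceptually compressed) points and return the sequence number.
    fn append(&mut self, points: Vec<u8>) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.log.push((seq, points));
        seq
    }
}

struct MemCache {
    rows: Vec<(u64, Vec<u8>)>,
}

/// WAL first, then memory: if the process crashes after the WAL append,
/// the point can still be recovered on restart.
fn write(wal: &mut Wal, cache: &mut MemCache, batch: WriteBatch) {
    println!("writing to {}.{}", batch.db, batch.table);
    for point in batch.points {
        let seq = wal.append(point.clone()); // 1. durable log entry
        cache.rows.push((seq, point));       // 2. in-memory insert
    }
}

fn main() {
    let mut wal = Wal { next_seq: 0, log: Vec::new() };
    let mut cache = MemCache { rows: Vec::new() };
    let batch = WriteBatch {
        db: "public".to_string(),
        table: "cpu".to_string(),
        points: vec![b"point-1".to_vec(), b"point-2".to_vec()],
    };
    write(&mut wal, &mut cache, batch);
    assert_eq!(wal.log.len(), cache.rows.len());
}
```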
After the WAL write completes, the data is written to memory. First, the specific TSFamily (Time Series Family), which serves as the actual storage unit, is located based on the parameters passed to the write interface; the implementation logic is in tskv/src/tseries_family.rs. Once the data is in memory, it is checked against the configured thresholds to determine whether a flush to disk should be triggered. If the flush conditions are met, flushing begins; the implementation is in tskv/src/compaction/flush.rs.
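At its core, the flush decision is a threshold check on the in-memory cache. Here is a minimal sketch assuming a purely size-based trigger; the actual conditions in tskv/src/compaction/flush.rs are driven by configuration settings, and the names below are invented.

```rust
// Invented sketch of a size-based flush trigger.

struct MemCacheStats {
    bytes_in_memory: usize,
}

struct FlushConfig {
    max_cache_bytes: usize, // assumed threshold; name is illustrative
}

/// Decide whether the in-memory data should be flushed to disk.
fn should_flush(stats: &MemCacheStats, cfg: &FlushConfig) -> bool {
    stats.bytes_in_memory >= cfg.max_cache_bytes
}

fn main() {
    let cfg = FlushConfig { max_cache_bytes: 64 * 1024 * 1024 }; // assume 64 MiB
    let stats = MemCacheStats { bytes_in_memory: 80 * 1024 * 1024 };
    if should_flush(&stats, &cfg) {
        println!("flush conditions met: write memory data to disk");
    }
}
```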
After the flush operation, a summary edit request is sent. The summary edit records which data has already been flushed, so that this data can be skipped when replaying the WAL during recovery or restart; the summary logic is in tskv/src/summary.rs. Compaction is responsible for merging data files on disk; the compaction logic is in tskv/src/compaction/compact.rs.
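As a rough illustration of why the summary matters for recovery: once a flush is recorded, WAL replay can skip everything at or below the flushed sequence number. A hedged sketch with invented types, not the actual summary.rs logic:

```rust
// Invented sketch of summary-driven recovery: WAL entries at or below
// the flushed sequence number do not need to be replayed.

struct SummaryEdit {
    max_flushed_seq: u64, // illustrative field name
}

/// Replay only WAL entries newer than what the summary marks as flushed.
fn entries_to_replay(wal: &[(u64, Vec<u8>)], summary: &SummaryEdit) -> Vec<Vec<u8>> {
    wal.iter()
        .filter(|(seq, _)| *seq > summary.max_flushed_seq)
        .map(|(_, data)| data.clone())
        .collect()
}

fn main() {
    let wal = vec![(1, b"a".to_vec()), (2, b"b".to_vec()), (3, b"c".to_vec())];
    let summary = SummaryEdit { max_flushed_seq: 2 };
    // Entries 1 and 2 were flushed; only entry 3 must be replayed.
    assert_eq!(entries_to_replay(&wal, &summary), vec![b"c".to_vec()]);
}
```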
Distributed Functionality
CnosDB’s distributed architecture consists of two types of nodes: Meta nodes and Data nodes. Meta nodes store cluster metadata such as data distribution, Data node information, user permissions, and DB (database) and table definitions. They use the Raft consensus algorithm to implement a CP (consistent and partition-tolerant) store, giving metadata storage strong consistency and high availability. Data nodes are responsible for data storage and querying; today both functions run within the Data node process. In the future, computation and storage may be separated, with storage running in one process and stateless compute nodes in another.
CnosDB shards data according to the characteristics of time-series workloads, using time-range-based sharding. Periodically, a virtual logical unit called a bucket is created. Each bucket is split into multiple Vnodes (virtual nodes) according to the number of Data nodes and the replica count. Each Vnode is an independent LSM (Log-Structured Merge) tree, corresponds to the TSFamily structure, and serves as a separate operational unit for storing data distributed across the Data nodes. To ensure effective data redundancy, the Meta node may consider factors such as racks, power sources, controllers, and physical locations when allocating Vnodes during bucket creation.
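To illustrate time-range sharding, here is a hedged sketch of how a point might be routed to a bucket and then to a Vnode within it. The 1-day bucket duration, the hash-based Vnode choice, and all names below are assumptions made for illustration, not CnosDB's actual routing rules.

```rust
// Illustrative routing of a point to (bucket, vnode); the constants and
// hashing scheme are assumptions, not CnosDB's real sharding code.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BUCKET_DURATION_NS: i64 = 24 * 3600 * 1_000_000_000; // assume 1-day buckets

/// A bucket covers one time range: points map to buckets by timestamp.
fn bucket_id(timestamp_ns: i64) -> i64 {
    timestamp_ns.div_euclid(BUCKET_DURATION_NS)
}

/// Pick a vnode within the bucket by hashing the series key, so all points
/// of one series land in the same vnode (and thus the same TSFamily/LSM tree).
fn vnode_index(series_key: &str, vnodes_per_bucket: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    series_key.hash(&mut hasher);
    hasher.finish() % vnodes_per_bucket
}

fn main() {
    let ts = 1_700_000_000_000_000_000i64; // nanoseconds since epoch
    let series = "cpu,host=node-1";
    println!("bucket {}, vnode {}", bucket_id(ts), vnode_index(series, 4));
}
```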
Summary
This article introduced the code structure of the three major modules in CnosDB: Query, Storage, and Distributed. More details of the code will be explored in future installments, so stay tuned.