By using SQLancer, CnosDB technical team not only finds and fixes logical errors in product queries, and helps upstream projects build more stable query engines. CnosDB now becomes the first cloud-native time-series database that supports SQLancer.
CnosDB Becomes the First Cloud-native TSDB that Supports SQLancer
1.CnosDB has supported SQLancer
Adequate and comprehensive testing is the key to guaranteeing the correctness of a database management system. Finding logical errors is an important part of building a reliable database. CnosDB is concerned with both of the extreme release of performance as well as the correctness and repeatability of results.
SQLancer, full name Synthesized Query Lancer, is an automated security testing tool for database management systems (DBMS). The tool helps a wide range of researchers to easily identify logical vulnerabilities in application implementations. Its job is to highlight inconsistent or illogical results returned by the database system. It is written in Java and already supports several databases, including SQLite, MySQL, PostgreSQL, ClickHouse, etc.
SQLancer introduction, screenshot source SQLancer GitHub (github.com/sqlancer/sqlancer)
Recently CnosDB team and its open source community has implemented two test scenarios for CnosDB: TLP and NoREC, usingSQLancer. They have been also merged into the upstream main branch, becoming the first cloud native time-series database supporting SQLancer.
2.CnosDB implementation of SQLancer
The CnosDB implementation of SQLancer currently uses Restful API and generates SQL expressions based on the data types provided by CnosDB, which can generate SQL for aggregated queries and correlated queries. Expressions include arithmetic operations and most of the supported functions. In the next section, we will explain the execution steps of SQLancer and the specific detection methods of SQLancer.
2.1 Execution Steps of SQLancer
The main execution steps of SQLancer have the following five stages:
- Connect to CnosDB through the connector.
- Create a database in CnosDB.
- Create several tables in CnosDB.
- Insert data into the CnosDB tables. The inserted data are constants, often some boundary data. For example, the largest integers, negative numbers, and various strange UTF8 strings.
- Execute the selected detection method.
2.2 CnosDB Supports SQLancer's Detection Methods
At this stage, CnosDB supports two methods–TLP detection and NoREC detection.
2.2.1 TLP Detection
TLP stands for Ternary Logic Partitioning. It is a technique used in digital circuit design in which signals can be assigned to one of three states: 1, 0, or undefined (X). This technique can be used to represent uncertainty or incompleteness, and can help reduce the complexity of circuit implementation. In the case of database queries, each qurey is divided into three queries that are executed with TRUE, FALSE, and NULL results, and then the results of these three queries are combined and guaranteed to be TRUE. in this way, the results are then compared with the original results to find out if they are inconsistent.
The steps of TLP are as follows.
- Randomly select several tables from the database; in the selected tables, randomly select several columns.
- Generate a basic SQL statement without WHERE clause and generate WHERE clause.
- Add IS TRUE, IS FALSE, IS NULL predicates to the basic SQL statement according to the generated WHERE clause to form three SQL statements.
- Combine the three SQL statements into one by UNION.
- Execute the basic SQL statement and the SQL statement after UNION.
- Compare the result rows. If the number of rows is different, there is an exception.
2.2.2 NoREC Detection
NoREC stands for Non-Optimizing Reference Engine Construction, which is a method of building an engine without optimization. The purpose of this approach is to provide a simple, optimization-free reference implementation for comparison among other optimized engines and algorithms. This approach usually does not consider performance and efficiency, and is just to prove the concept. NoREC detection takes an optimized query, forces it into a non-optimized way, and then compares the query results. If the two executions do not match, there is a bug.
The steps of NoREC are as follows.
- Generate a SQL in the format of SELCT COUNT(*) FROM table WHERE expr.
- Generate a SELECT SUM (cnt) FROM (SELECT CAST (expr AS BIGINT) cnt FROM table)
- After execution, compare the result values. If the values are not the same, there is an exception.
But because CnosDB is a temporal database, the optimization rule for Count (*) will only scan the first Field column. So instead of generating SQL with COUNT (*), we COUNT the first FIELD.
3.Results From SQLancer
By implementing SQLancer, CnosDB technical team and community partners found and dealt with 15 potential logical error bugs, further guaranteeing the correctness of CnosDB execution. Here are the bugs, both from CnosDB itself and from DataFusion, the query engine. the bugs found as of February 1, 2023 are as follows.
- https://github.com/cnosdb/cnosdb/issues/852
- https://github.com/cnosdb/cnosdb/issues/784https://github.com/apache/arrow-datafusion/issues/4401
- https://github.com/cnosdb/cnosdb/issues/830
- https://github.com/apache/arrow-datafusion/issues/4843(This bug has alrealy been fixed by datafusion before the detection)
- https://github.com/apache/arrow-datafusion/issues/4947
- https://github.com/apache/arrow-datafusion/issues/3778
- https://github.com/apache/arrow-datafusion/issues/4075
- https://github.com/cnosdb/cnosdb/issues/782
- https://github.com/cnosdb/cnosdb/issues/807
- https://github.com/apache/arrow-datafusion/issues/4339
- https://github.com/apache/arrow-datafusion/issues/4297
- https://github.com/apache/arrow-datafusion/issues/4080
- https://github.com/apache/arrow-datafusion/issues/3832
- https://github.com/apache/arrow-datafusion/issues/3830
- https://github.com/apache/arrow-datafusion/issues/4452
4.Reasons For not Supporting PQS
In addition to the introduced TPL and NoREC, SQLancer also provides detection methods for PQS (Pivoted Query Synthesis), which generates new queries (usually SQL queries) in a logical way to solve a specific problem. By transforming and synthesizing existing queries, it finds a better query and evaluates its quality using a specific evaluation function.
The main process of PQS is as follows.
Generate a random table and insert data into it.
- Randomly select a row of data from the database.
- Construct a random expression based on that row of data.
- execute the expression and adjust it to TRUE if the result is not TRUE.
- Put the expression into WHERE or JOIN.
- Execute the query statement.
- Check if the latest result still contains the previous row of data. If not, there is a problem.
Currently CnosDB does not support PQS for the following reasons.
- Due to the storage engine characteristics of CnosDB's temporal database, NULL values are not currently stored, and there is no corresponding NULL flag.
- When reading a row of data, if the column does not exist any value, it is supplemented with NULL values to meet the needs of the query engine.
- Tag of CnosDB can be regarded as a special index, unlike Field storage, the same TagSet (i.e. a set of Tag key-value pairs) does not need to be stored in each row.
- For CnosDB, data rows with the same timestamp and TagSet are merged and de-duplicated.
- CnosDB currently does not support cross joins between tables.
Because of the above reasons, supporting PQS becomes insignificant in detecting errors. However, through continuous iteration and upgrade of the system, CnosDB development team and the community will also re-evaluate the possibility of supporting PQS.
5.Conclusion
The above is all the contents of CnosDB supporting SQLancer. If you are passionate about database and database testing, you are more than welcomed to pay continuous attention to our WeChat articles, Bilibili account and join our community, where we constantly updates recent technical news and technical insider. Looking forward to talking to you all in depth there!
Author
Harbour, a passionate entrepreneurial coder who obtained his B.S. degree in Computer Science and Advertising at HIT and Ph.D. in Computer Compilation at CMU, is now diving into the field of time-series database and database education.
References
- Finding Bugs in Database Systems via Query Partitioning, https://www.manuelrigger.at/preprints/TLP.pdf
- Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction, https://www.manuelrigger.at/preprints/NoREC.pdf
- Testing Database Engines via Pivoted Query Synthesis, https://arxiv.org/abs/2001.04174
- SQLancer Github, https://github.com/sqlancer/sqlancer