timestamp_repair
– Effectively repairs the timestamp column, supporting operations such as insertion, deletion, and no change.value_repair
– Intelligently repairs the value column, optimizing based on timestamp intervals and speed changes.value_fill
– Provides various filling methods to quickly fill missing data in the value column.
These functions can effectively enhance the quality of time series data, ensuring the reliability of subsequent analysis results. This article will detail the usage methods of CnosDB's time series data repair functions.
Data Quality Issues
Time series data often faces various data quality issues. During the process of managing time series data, from data collection to final storage in a time series database, any sensor malfunction or network transmission error can lead to data quality problems. Analyzing such “dirty data” can produce misleading results. Therefore, after evaluating data quality, we need to effectively fill and repair the timestamp and value columns.
CnosDB's Time Series Data Repair Methods
In CnosDB, we have designed and implemented various data repair methods, allowing users to evaluate the quality of time series data from multiple dimensions.
timestamp_repair(time, value, 'method=mode&start_mode=linear')
Timestamp Repair (timestamp_repair
)
For the timestamp column, CnosDB uses the timestamp_repair
function for repair. This function first determines the timestamp interval using methods such as Mode, Cluster, and Median. Then, it determines the start value of the repaired timestamp using Linear and Mode methods. Next, it adopts a dynamic programming algorithm to optimize repair costs, supporting three operations: insertion, deletion, and no change, ultimately obtaining the optimal repair sequence.
Example:
timestamp_repair(time, value, 'method=mode&start_mode=linear')
Value Repair (value_repair
)
For the value column, CnosDB provides the value_repair
function for data repair. This function uses the Screen algorithm and LsGreedy algorithm. The former determines the repair window based on the median of the timestamp interval, while the latter calculates speed changes and uses a greedy strategy for repair.
Example:
value_fill(time, value, 'method=linear')
Value Fill (value_fill
)
In addition to value repair, CnosDB also provides the value_fill
function for data filling. This function determines the method of value filling based on the input parameter Method, supporting five filling methods: Mean, Previous, Linear, AR, and MA.
Example:
value_fill(time, value, 'method=linear')
Usage Example
Below is a specific usage example demonstrating how to use the three functions mentioned above to repair time series data:
Example 1: timestamp_repair
- Create dataset:
CREATE TABLE wzz (value DOUBLE);
INSERT INTO wzz VALUES ('2024-01-01T00:00:00.000', 1.0),('2024-01-01T00:00:10.000', 2.0),('2024-01-01T00:00:19.000', 3.0),('2024-01-01T00:00:30.000', 4.0),('2024-01-01T00:00:40.000', 5.0),('2024-01-01T00:00:50.000', 6.0),('2024-01-01T00:01:01.000', 7.0),('2024-01-01T00:01:11.000', 8.0),('2024-01-01T00:01:21.000', 9.0),('2024-01-01T00:01:31.000', 10.0);
-
Before Repair:
SELECT * FROM wzz;
-
After Repair:
SELECT timestamp_repair(time, value, 'method=mode&start_mode=linear') FROM wzz;
Example 2: value_fill
-
Create Table
CREATE table wzz(value double);
INSERT wzz VALUES ('2024-01-01T00:00:02',acos(3)),('2024-01-01T00:00:03',101.0),('2024-01-01T00:00:04',102.0),('2024-01-01T00:00:06',104.0),('2024-01-01T00:00:08',126.0),('2024-01-01T00:00:10',108.0),('2024-01-01T00:00:14',acos(3)),('2024-01-01T00:00:15',113.0),('2024-01-01T00:00:16',114.0),('2024-01-01T00:00:18',116.0),('2024-01-01T00:00:20',acos(3)),('2024-01-01T00:00:22',acos(3)),('2024-01-01T00:00:26',124.0),('2024-01-01T00:00:28',126.0),('2024-01-01T00:00:30',128.0);
-
Before Repair:
SELECT * FROM wzz;
-
After Repair:
SELECT value_fill(time, value, 'method=mean') FROM wzz;
Example 3: value_repair
-
Create Dataset
CREATE table wzz(value double);
INSERT wzz VALUES ('2024-01-01T00:00:02',100.0),('2024-01-01T00:00:03',101.0),('2024-01-01T00:00:04',102.0),('2024-01-01T00:00:06',104.0),('2024-01-01T00:00:08',126.0),('2024-01-01T00:00:10',108.0),('2024-01-01T00:00:14',112.0),('2024-01-01T00:00:15',113.0),('2024-01-01T00:00:16',114.0),('2024-01-01T00:00:18',116.0),('2024-01-01T00:00:20',118.0),('2024-01-01T00:00:22',100.0),('2024-01-01T00:00:26',124.0),('2024-01-01T00:00:28',126.0),('2024-01-01T00:00:30',acos(3));
-
Before Repair:
SELECT * FROM wzz;
-
After Repair:
SELECT value_repair(time, value, 'method=screen') FROM wzz;