Deduplication
Deduplication ensures that only one row exists for a given set of key columns. When a new row matches an existing row's keys, the old row is replaced.
When to use deduplication
Use deduplication when:
- You need idempotent writes (safe to retry or resend data)
- You're reloading data that may have changed (e.g., third-party data feeds)
- You want "last write wins" behavior for a given key
- You're recovering from ingestion errors and need to resend a time range
Skip deduplication when:
- Your timestamps are always unique (no duplicates possible)
- You're doing append-only logging where duplicates are acceptable
- Write performance is critical and you're certain duplicates won't occur
Quick example
CREATE TABLE prices (
ts TIMESTAMP,
ticker SYMBOL,
price DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY WAL
DEDUP UPSERT KEYS(ts, ticker);
With this configuration, each (ts, ticker) combination can have only one row:
INSERT INTO prices VALUES ('2024-01-15T10:00:00', 'AAPL', 185.50);
INSERT INTO prices VALUES ('2024-01-15T10:00:00', 'AAPL', 186.00); -- replaces previous
SELECT * FROM prices;
| ts | ticker | price |
|---|---|---|
| 2024-01-15T10:00:00 | AAPL | 186.00 |
Only the last value is kept.
How it works
When deduplication is enabled, QuestDB:
- Checks if incoming rows match existing rows by UPSERT KEYS
- If keys match, compares the full row content
- If the row is identical, skips the write entirely (no disk I/O)
- If the row differs, replaces the old row with the new one
This full-row comparison significantly reduces write amplification when reloading large datasets where only a small portion has changed — common when consuming third-party data feeds that provide full snapshots.
Performance
Deduplication has minimal overhead when:
- Timestamps are mostly unique across rows
- Data arrives in roughly time-ordered fashion
Deduplication is more expensive when:
- Many rows share the same timestamp
- Deduplication keys have high cardinality
The full-row check optimization means that reloading unchanged data is cheap — QuestDB detects identical rows and skips unnecessary writes.
Configuration
Create table with deduplication
CREATE TABLE prices (
ts TIMESTAMP,
ticker SYMBOL,
price DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY WAL
DEDUP UPSERT KEYS(ts, ticker);
The designated timestamp must always be included in UPSERT KEYS.
Enable on existing table
ALTER TABLE prices DEDUP ENABLE UPSERT KEYS(ts, ticker);
Disable deduplication
ALTER TABLE prices DEDUP DISABLE;
Change UPSERT KEYS
ALTER TABLE prices DEDUP ENABLE UPSERT KEYS(ts, ticker, exchange);
Checking configuration
Check if deduplication is enabled:
SELECT dedup FROM tables() WHERE table_name = 'prices';
Check which columns are UPSERT KEYS:
SELECT "column", upsertKey FROM table_columns('prices');
Requirements
- Deduplication requires WAL tables
- The designated timestamp must be included in UPSERT KEYS
- Enabling deduplication does not deduplicate existing data — only new inserts