How to Deduplicate Rows in SQL: Complete Guide with Examples
Generate a deduplication query for your table.
Open Deduplicate Query Builder ↗Types of Duplicates
Before writing a deduplication query, you need to decide what counts as a duplicate for your use case:
- Exact duplicates — every column, including the primary key, is identical (rare — usually a bulk insert error)
- Business key duplicates — same combination of meaningful columns (email, order number, customer ID + date) but different surrogate keys or timestamps
- Soft duplicates — rows that represent the same real-world entity but differ in minor details (trailing spaces, case differences, slightly different timestamps)
Find Duplicates with ROW_NUMBER()
The most powerful pattern uses ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) to number each row within each group of duplicates. Row number 1 is the one to keep; rows 2+ are duplicates:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY email -- columns that define a duplicate
ORDER BY created_at DESC -- keep the most recent
) AS rn
FROM customers
)
SELECT * FROM ranked WHERE rn = 1; -- keep only first
Find Duplicates with GROUP BY
A simpler approach for just identifying which values have duplicates, without needing to pick a winner:
-- Find emails that appear more than once
SELECT email, COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
Using the Generator
The table you want to deduplicate.
The columns that define what makes two rows duplicates — e.g. email, or customer_id + order_date.
Keep First (oldest row by created_at), Keep Last (most recent), or Keep by ID (highest or lowest primary key).
SELECT to preview duplicates, DELETE to remove them, or CTE to get a deduplicated result set without modifying the table.
The generator produces a ready-to-run query for your chosen dialect.
Deleting Duplicates
Once you've identified duplicates with ROW_NUMBER(), delete the rows where rn > 1:
-- SQL Server: delete duplicates keeping most recent
WITH ranked AS (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
FROM customers
)
DELETE FROM ranked WHERE rn > 1;
Always test with SELECT first. Run the SELECT version of the deduplication query and verify the rows marked for deletion before running DELETE. Back up your table first if the data is critical.
Dialect Differences
SQL Server allows DELETE directly on a CTE, making deduplication very clean. PostgreSQL and MySQL require a subquery approach because you can't delete directly from a CTE in those databases:
-- PostgreSQL: delete via subquery
DELETE FROM customers
WHERE id IN (
SELECT id FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
FROM customers
) t
WHERE rn > 1
);
