Guide

How to Deduplicate Rows in SQL: Complete Guide with Examples

Bill Crawford — Guide — February 2026 — 10 min read · Last updated October 05, 2025

Contents

Types of duplicates
Find duplicates with ROW_NUMBER
Find duplicates with GROUP BY
Using the generator
Deleting duplicates
Dialect differences

Generate a deduplication query for your table.

Connect on LinkedIn →

Open Deduplicate Query Builder ↗

Types of Duplicates

Before writing a deduplication query, you need to decide what counts as a duplicate for your use case:

Exact duplicates — every column, including the primary key, is identical (rare — usually a bulk insert error)
Business key duplicates — same combination of meaningful columns (email, order number, customer ID + date) but different surrogate keys or timestamps
Soft duplicates — rows that represent the same real-world entity but differ in minor details (trailing spaces, case differences, slightly different timestamps)

Find Duplicates with ROW_NUMBER()

The most powerful pattern uses ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) to number each row within each group of duplicates. Row number 1 is the one to keep; rows 2+ are duplicates:

WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY email          -- columns that define a duplicate
      ORDER BY created_at DESC    -- keep the most recent
    ) AS rn
  FROM customers
)
SELECT * FROM ranked WHERE rn = 1;  -- keep only first

Find Duplicates with GROUP BY

A simpler approach for just identifying which values have duplicates, without needing to pick a winner:

-- Find emails that appear more than once
SELECT email, COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;

Using the Generator

Enter your table name

The table you want to deduplicate.

Enter the partition columns

The columns that define what makes two rows duplicates — e.g. email, or customer_id + order_date.

Choose the keep strategy

Keep First (oldest row by created_at), Keep Last (most recent), or Keep by ID (highest or lowest primary key).

Choose your output

SELECT to preview duplicates, DELETE to remove them, or CTE to get a deduplicated result set without modifying the table.

Copy the generated SQL

The generator produces a ready-to-run query for your chosen dialect.

Deleting Duplicates

Once you've identified duplicates with ROW_NUMBER(), delete the rows where rn > 1:

-- SQL Server: delete duplicates keeping most recent
WITH ranked AS (
  SELECT id,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
  FROM customers
)
DELETE FROM ranked WHERE rn > 1;

Always test with SELECT first. Run the SELECT version of the deduplication query and verify the rows marked for deletion before running DELETE. Back up your table first if the data is critical.

Dialect Differences

SQL Server allows DELETE directly on a CTE, making deduplication very clean. PostgreSQL and MySQL require a subquery approach because you can't delete directly from a CTE in those databases:

-- PostgreSQL: delete via subquery
DELETE FROM customers
WHERE id IN (
  SELECT id FROM (
    SELECT id,
      ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
    FROM customers
  ) t
  WHERE rn > 1
);

Frequently Asked Questions

What's the safest way to deduplicate a production table?▼

Always: (1) back up the table, (2) run the SELECT version to verify which rows will be deleted, (3) run DELETE inside a transaction so you can rollback if needed, (4) commit only after verifying the row count is as expected.

How do I keep the row with the lowest ID?▼

In the ROW_NUMBER() OVER clause, use ORDER BY id ASC. The first-numbered row (rn = 1) will be the one with the lowest ID value in each duplicate group.

What if I don't have a created_at or ID column?▼

If all columns including any ID are identical, use ROWID (Oracle), CTID (PostgreSQL), or %%physloc%% (SQL Server) as the tiebreaker in the ORDER BY clause to get a stable row selection.

Can ROW_NUMBER handle NULL values in partition columns?▼

Yes — NULL values are treated as equal to each other in PARTITION BY, so rows with NULL in all partition columns are grouped together. Include IS NULL checks in your WHERE clause if you want to handle NULLs separately.

How to Deduplicate Rows in SQL: Complete Guide with Examples

Types of Duplicates

Find Duplicates with ROW_NUMBER()

Find Duplicates with GROUP BY

Using the Generator

Deleting Duplicates

Dialect Differences

Frequently Asked Questions

Related Tools

Related Articles