Skip to content
← Blog
🔒 All in your browser 🚫 No uploads stored
Guide

How to Deduplicate Rows in SQL: Complete Guide with Examples

Bill Crawford — Guide — February 2026 — 10 min read  ·  Last updated October 05, 2025
Contents
  1. Types of duplicates
  2. Find duplicates with ROW_NUMBER
  3. Find duplicates with GROUP BY
  4. Using the generator
  5. Deleting duplicates
  6. Dialect differences

Generate a deduplication query for your table.

Connect on LinkedIn →

Open Deduplicate Query Builder ↗

Types of Duplicates

Before writing a deduplication query, you need to decide what counts as a duplicate for your use case:

Find Duplicates with ROW_NUMBER()

The most powerful pattern uses ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) to number each row within each group of duplicates. Row number 1 is the one to keep; rows 2+ are duplicates:

WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY email          -- columns that define a duplicate
      ORDER BY created_at DESC    -- keep the most recent
    ) AS rn
  FROM customers
)
SELECT * FROM ranked WHERE rn = 1;  -- keep only first

Find Duplicates with GROUP BY

A simpler approach for just identifying which values have duplicates, without needing to pick a winner:

-- Find emails that appear more than once
SELECT email, COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;

Using the Generator

1
Enter your table name

The table you want to deduplicate.

2
Enter the partition columns

The columns that define what makes two rows duplicates — e.g. email, or customer_id + order_date.

3
Choose the keep strategy

Keep First (oldest row by created_at), Keep Last (most recent), or Keep by ID (highest or lowest primary key).

4
Choose your output

SELECT to preview duplicates, DELETE to remove them, or CTE to get a deduplicated result set without modifying the table.

5
Copy the generated SQL

The generator produces a ready-to-run query for your chosen dialect.

Deleting Duplicates

Once you've identified duplicates with ROW_NUMBER(), delete the rows where rn > 1:

-- SQL Server: delete duplicates keeping most recent
WITH ranked AS (
  SELECT id,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
  FROM customers
)
DELETE FROM ranked WHERE rn > 1;

Always test with SELECT first. Run the SELECT version of the deduplication query and verify the rows marked for deletion before running DELETE. Back up your table first if the data is critical.

Dialect Differences

SQL Server allows DELETE directly on a CTE, making deduplication very clean. PostgreSQL and MySQL require a subquery approach because you can't delete directly from a CTE in those databases:

-- PostgreSQL: delete via subquery
DELETE FROM customers
WHERE id IN (
  SELECT id FROM (
    SELECT id,
      ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
    FROM customers
  ) t
  WHERE rn > 1
);

Frequently Asked Questions

What's the safest way to deduplicate a production table?
Always: (1) back up the table, (2) run the SELECT version to verify which rows will be deleted, (3) run DELETE inside a transaction so you can rollback if needed, (4) commit only after verifying the row count is as expected.
How do I keep the row with the lowest ID?
In the ROW_NUMBER() OVER clause, use ORDER BY id ASC. The first-numbered row (rn = 1) will be the one with the lowest ID value in each duplicate group.
What if I don't have a created_at or ID column?
If all columns including any ID are identical, use ROWID (Oracle), CTID (PostgreSQL), or %%physloc%% (SQL Server) as the tiebreaker in the ORDER BY clause to get a stable row selection.
Can ROW_NUMBER handle NULL values in partition columns?
Yes — NULL values are treated as equal to each other in PARTITION BY, so rows with NULL in all partition columns are grouped together. Include IS NULL checks in your WHERE clause if you want to handle NULLs separately.

Related Tools

Further reading: Microsoft — T-SQL Reference

BC
Bill Crawford
Founder, Data Conversion Center

Bill Crawford is a data systems developer and technical founder with over 30 years of professional experience in accounting, finance, and business operations.

He holds a Bachelor's degree in Accounting and has spent more than three decades working within financial and operational environments. Over the past 10 years, he has been heavily involved in the development, implementation, and refinement of financial and enterprise data systems for both Fortune 500 companies and smaller organizations.

His work bridges finance and technology — combining deep domain knowledge in structured reporting and accounting workflows with hands-on SQL development and database architecture experience.

Bill founded DataConversionCenter.com to build practical, browser-based tools that simplify complex data challenges, including:

Rather than focusing on theoretical examples, his tools and articles are informed by real-world challenges encountered in enterprise reporting systems, financial databases, and operational data environments.

Professional Background

Bill's mission is to reduce friction in data workflows — particularly for professionals working with structured financial, operational, and reporting data.