5 Ways to Remove Duplicate Lines from a List or File

Duplicate lines are everywhere: exported CSVs with repeated rows, copy-pasted lists with accidental duplicates, log files with repeated entries, dependency lists with redundant entries. Removing them manually from anything bigger than 20 lines is tedious. Here are five fast methods, ranked by the effort required.

Method 1: Browser Tool (Fastest for Pasted Text)

Open the remove duplicate lines tool, paste your list, and copy the result. Duplicates are removed instantly.

Time to complete: 5 seconds.

What it does: Keeps the first occurrence of each line and removes later duplicates. Matching is case-sensitive, so "Alice" and "alice" are kept as separate entries.
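The behavior described above can be sketched in a few lines of Python. This is only an illustration of first-occurrence, case-sensitive deduplication, not the tool's actual source; the function name dedupe_keep_first is made up for this example.

```python
def dedupe_keep_first(text: str) -> str:
    """Return text with repeated lines removed, keeping the first occurrence."""
    lines = text.splitlines()
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops later repeats.
    # Comparison is exact, so "Alice" and "alice" count as different lines.
    return "\n".join(dict.fromkeys(lines))


print(dedupe_keep_first("Alice\nBob\nalice\nAlice\nBob"))
# Alice
# Bob
# alice
```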

When to use:

Working with lists you can paste (≤ a few thousand lines)
You don't have a terminal or code editor open
You want the simplest possible workflow

Limitation: Processes the whole block at once and compares entire lines. For CSV files where rows are partial duplicates (same email, different other fields), you need a tool that can compare individual columns.

Method 2: Command Line (Fastest for Files)

For text files on Linux/macOS:

# Sort then deduplicate (reorders the output)
sort input.txt | uniq > output.txt

# Preserve original order and deduplicate (works in any POSIX awk, not only GNU awk)
awk '!seen[$0]++' input.txt > output.txt

The sort | uniq pipeline is the classic, but it reorders the output, which is not always desirable. (sort -u input.txt does the same thing in one command.)

The awk '!seen[$0]++' pattern preserves order: it tracks each line in an associative array and only prints lines it hasn't seen before.
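The reason sort comes first in the classic pipeline is that uniq only collapses adjacent repeats. The Python sketch below mimics both approaches on a small list to show the difference. The helper names are illustrative, and this is not a reimplementation of either tool.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Mimic uniq: collapse runs of identical neighbouring lines only."""
    return [key for key, _run in groupby(lines)]


def seen_filter(lines):
    """Mimic awk '!seen[$0]++': keep a line only the first time it appears."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


lines = ["pear", "apple", "pear", "pear", "apple"]
print(uniq_adjacent(lines))          # ['pear', 'apple', 'pear', 'apple']
print(uniq_adjacent(sorted(lines)))  # ['apple', 'pear']
print(seen_filter(lines))            # ['pear', 'apple']
```

Without the sort, non-adjacent repeats survive; the seen-set approach removes them while leaving the original order alone.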

Time to complete: 5 seconds once you know the command.

When to use:

Working with files (not pasted text)
Files larger than a few thousand lines
As part of a shell pipeline or script

On Windows (PowerShell). This version sorts the output and ignores case by default:

Get-Content input.txt | Sort-Object -Unique | Set-Content output.txt

Method 3: VS Code (Extension Needed for Deduplication)

VS Code can sort lines out of the box, but this workflow relies on an extension for the deduplication step:

Install the Text Power Tools extension (free), which adds a "Remove Duplicate Lines" command to the palette
Open the file in VS Code
Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) to open the command palette
Optional: type "sort lines" and select Sort Lines Ascending or Sort Lines Descending (both are built in)
Run Remove Duplicate Lines from the command palette

Recent VS Code releases may also list a built-in Delete Duplicate Lines command. If it shows up in your palette, you can skip the extension.

Time to complete: 10–15 seconds after the extension is installed.

When to use:

Already working in VS Code
Want to stay in your editor
Text Power Tools is already installed

Method 4: Python (For Scripting or Large Files)

# Preserve order, remove duplicates
seen = set()
unique_lines = []
with open('input.txt') as f:
    for line in f:
        line = line.rstrip('\n')
        if line not in seen:
            seen.add(line)
            unique_lines.append(line)

with open('output.txt', 'w') as f:
    # End every line, including the last one, with a newline
    f.writelines(line + '\n' for line in unique_lines)

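Case-insensitive deduplication (swap this in for the if block inside the loop):

key = line.lower()  # compare in lowercase, keep the original spelling in the output
if key not in seen:
    seen.add(key)
    unique_lines.append(line)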

Time to complete: 30 seconds to write + run.

When to use:

Files with millions of lines (memory-efficient streaming possible)
Need case-insensitive deduplication
Part of a larger data pipeline
Need to log which lines were removed
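Two of the points above, streaming and logging removed lines, can be sketched together. The version below writes each unique line as soon as it is read instead of building an output list, and collects the dropped lines. The function names are hypothetical, and the set of seen lines still grows with the number of unique lines, so this reduces memory use rather than eliminating it.

```python
import io
from typing import Iterable, Iterator, List, TextIO


def stream_unique(lines: Iterable[str], removed: List[str]) -> Iterator[str]:
    """Yield each line the first time it appears; record repeats in `removed`."""
    seen = set()
    for raw in lines:
        line = raw.rstrip("\n")
        if line in seen:
            removed.append(line)  # keep a log of what was dropped
        else:
            seen.add(line)
            yield line


def dedupe_file(src: TextIO, dst: TextIO) -> List[str]:
    """Copy src to dst without duplicate lines and return the removed lines."""
    removed: List[str] = []
    for line in stream_unique(src, removed):
        dst.write(line + "\n")  # written immediately, no output list in memory
    return removed


# In-memory demo; real use would pass open('input.txt') and open('output.txt', 'w').
src = io.StringIO("a\nb\na\nc\nb\n")
dst = io.StringIO()
removed = dedupe_file(src, dst)
print(dst.getvalue().splitlines())  # ['a', 'b', 'c']
print(removed)                      # ['a', 'b']
```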

Method 5: Excel / Google Sheets

For CSV data already in a spreadsheet:

Excel:

Select the column with duplicates
Data → Remove Duplicates
Select which columns to deduplicate across
Click OK

Google Sheets:

Select the column
Data → Data Cleanup → Remove Duplicates
Select columns and confirm

Time to complete: 15 seconds.

When to use:

Data is already in a spreadsheet
You need to deduplicate based on multiple columns (e.g. same email AND same name)
You want a preview before confirming

Limitation: It works on spreadsheet rows and columns, so plain text has to be imported first. For simple line-by-line deduplication, it's overkill.
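If the data lives in a CSV file rather than an open spreadsheet, the same column-based deduplication can be approximated in Python. The sketch below keeps the first row for each value of the chosen key columns. The column names, the sample data, and the decision to trim and lowercase the key are all assumptions made for illustration.

```python
import csv
import io


def dedupe_rows(src, dst, key_columns):
    """Copy CSV rows from src to dst, keeping the first row for each key."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    seen = set()
    for row in reader:
        # Assumption: keys match regardless of case and surrounding spaces.
        key = tuple(row[col].strip().lower() for col in key_columns)
        if key not in seen:
            seen.add(key)
            writer.writerow(row)


data = "email,name\nann@example.com,Ann\nbob@example.com,Bob\nANN@example.com,Ann B.\n"
out = io.StringIO()
dedupe_rows(io.StringIO(data), out, ["email"])
print(out.getvalue())
# email,name
# ann@example.com,Ann
# bob@example.com,Bob
```

Passing ["email", "name"] instead would treat rows as duplicates only when both columns match.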

Comparison Table

Method	Speed	File size	Order preserved	Case options	Install required
Browser tool	★★★★★	Small-medium	Yes	Case-sensitive only	No
Command line	★★★★★	Any	Optional	Yes	No (built-in)
VS Code	★★★★☆	Any	Yes	Yes (via extension)	Extension
Python	★★★☆☆	Any	Yes	Fully configurable	Python
Excel/Sheets	★★★★☆	Medium	No (by default)	Case-insensitive	No

Choosing the Right Method

Data is in your clipboard → browser tool

Data is in a file and you're in a terminal → command line (awk '!seen[$0]++')

Data is in VS Code → Text Power Tools extension

Data is in a spreadsheet → Excel Remove Duplicates or Google Sheets equivalent

Need case-insensitive or partial-match deduplication → Python

Files over 100 MB → command line or Python (browser tool will slow down)

Edge Cases Worth Knowing

What counts as a "duplicate"?

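The text-based methods above (browser tool, sort | uniq, awk, and the Python script) treat lines as duplicates only if they're exactly identical, including case and whitespace. "Alice " and "Alice" are different lines because of the trailing space. Spreadsheets are the exception: as the comparison table notes, they ignore case. Clean whitespace first with the whitespace remover if trailing spaces are a concern.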
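When you control the script, "what counts as a duplicate" becomes a choice you can make explicit. The sketch below, built around a hypothetical dedupe_by_key helper, compares lines by a derived key while leaving the kept lines untouched.

```python
def dedupe_by_key(lines, key=lambda s: s):
    """Keep the first line for each distinct key(line); lines are not modified."""
    seen = set()
    kept = []
    for line in lines:
        k = key(line)
        if k not in seen:
            seen.add(k)
            kept.append(line)
    return kept


names = ["Alice ", "Alice", "alice", "Bob"]

# Exact match: every entry differs, so nothing is removed.
print(dedupe_by_key(names))
# ['Alice ', 'Alice', 'alice', 'Bob']

# Ignore trailing whitespace only.
print(dedupe_by_key(names, key=str.rstrip))
# ['Alice ', 'alice', 'Bob']

# Ignore surrounding whitespace and case.
print(dedupe_by_key(names, key=lambda s: s.strip().casefold()))
# ['Alice ', 'Bob']
```

Note that the first occurrence is kept as written, trailing space included; strip the lines beforehand if you want clean output as well.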

Blank lines

The browser tool and the awk method treat a blank line like any other line: the first one is kept and later ones are removed. If you want to remove all blank lines too, use the whitespace remover with "remove blank lines" checked, then deduplicate.
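In a script, both steps can happen in one pass. The following is a minimal sketch, assuming that whitespace-only lines should count as blank; the function name is made up for this example.

```python
def dedupe_and_drop_blanks(lines):
    """Remove blank or whitespace-only lines, then keep first occurrences."""
    seen = set()
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip empty and whitespace-only lines entirely
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


print(dedupe_and_drop_blanks(["a", "", "b", "   ", "a", ""]))  # ['a', 'b']
```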

Very large files

The browser tool starts to slow above ~100,000 lines. For anything bigger, use the command line or Python.
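For Python on very large inputs, the set of seen lines is usually what consumes memory. One possible mitigation, sketched below under the assumption that lines are long, is to remember a fixed-size digest of each line rather than the line itself. This is a swapped-in technique, not something the methods above do, and it accepts a theoretical (extremely small) chance that two different lines share a digest.

```python
import hashlib


def unique_by_digest(lines):
    """Yield first occurrences, remembering a 16-byte digest per line."""
    seen = set()
    for line in lines:
        digest = hashlib.blake2b(line.encode("utf-8"), digest_size=16).digest()
        if digest not in seen:
            seen.add(digest)
            yield line


sample = ["x" * 500, "y", "x" * 500, "y", "z"]
print(list(unique_by_digest(sample)) == ["x" * 500, "y", "z"])  # True
```

For short lines the digest is no smaller than the line, so the plain set from Method 4 is the simpler choice there.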

For the vast majority of cases (pasting a list and wanting duplicates gone), the browser tool is the answer. Use the command line or Python when the data is in files or needs scripting.
