5 Ways to Remove Duplicate Lines from a List or File
Browser tool, command line, Excel, Python, VS Code: pick the method that fits where your data already lives.
Duplicate lines are everywhere: exported CSVs with repeated rows, copy-pasted lists with accidental duplicates, log files with repeated entries, dependency lists with redundant entries. Removing them manually from anything bigger than 20 lines is tedious. Here are five fast methods, ranked by the effort required.
Method 1: Browser Tool (Fastest for Pasted Text)
Open the remove duplicate lines tool, paste your list, and copy the result. Duplicates are removed instantly.
Time to complete: 5 seconds.
What it does: Keeps the first occurrence of each line and removes later duplicates. Matching is case-sensitive, so "Alice" and "alice" are treated as different entries and both are kept.
When to use:
- Working with lists you can paste (≤ a few thousand lines)
- You don't have a terminal or code editor open
- You want the simplest possible workflow
Limitation: Processes the whole block at once and compares entire lines. For CSV files where rows are partial duplicates (same email, different other fields), you need a tool that can deduplicate on specific fields.
Method 2: Command Line (Fastest for Files)
For text files on Linux/macOS:
# Sort then deduplicate (reorders the output)
sort input.txt | uniq > output.txt
# Preserve original order and deduplicate (works in any POSIX awk, not only GNU awk)
awk '!seen[$0]++' input.txt > output.txt
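The sort | uniq pipeline is the classic. uniq only collapses adjacent duplicate lines, which is why the sort comes first. The side effect is that the output is reordered, which is not always desirable.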
The awk '!seen[$0]++' pattern preserves order: it tracks each line in an associative array and only prints lines it hasn't seen before.
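To make the difference between the two commands concrete, here is a rough Python analogue of each, run on a made-up list. It is only an illustration of the ordering behavior, not a replacement for the shell commands (for one thing, sort orders by locale rules, while Python compares code points).

```python
# Hypothetical sample data, standing in for the lines of input.txt.
lines = ["pear", "apple", "pear", "banana", "apple"]

# Roughly what `sort | uniq` produces: duplicates gone, output alphabetised.
sorted_unique = sorted(set(lines))

# Roughly what awk '!seen[$0]++' produces: duplicates gone, first-seen order kept.
# dict keys are unique and remember insertion order (Python 3.7+).
ordered_unique = list(dict.fromkeys(lines))

print(sorted_unique)   # ['apple', 'banana', 'pear']
print(ordered_unique)  # ['pear', 'apple', 'banana']
```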
Time to complete: 5 seconds once you know the command.
When to use:
- Working with files (not pasted text)
- Files larger than a few thousand lines
- As part of a shell pipeline or script
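On Windows (PowerShell). Note that this sorts the output, and Sort-Object -Unique compares strings case-insensitively unless you add -CaseSensitive: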
Get-Content input.txt | Sort-Object -Unique | Set-Content output.txt
Method 3: VS Code (Built-in, No Extensions Needed)
VS Code has a built-in command for this:
- Open the file in VS Code
- Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) to open the command palette
- Type "sort lines" and select Sort Lines Ascending or Sort Lines Descending
- Then search the command palette for Delete Duplicate Lines and run it
One caveat: the built-in command is named Delete Duplicate Lines, not "Remove Duplicate Lines", and it only ships with recent VS Code releases. If it doesn't appear in your palette, install the Text Power Tools extension (free), which adds a "Remove Duplicate Lines" command to the palette.
Time to complete: 10–15 seconds after the extension is installed.
When to use:
- Already working in VS Code
- Want to stay in your editor
- Text Power Tools is already installed
Method 4: Python (For Scripting or Large Files)
# Preserve order, remove duplicates
seen = set()
unique_lines = []

with open('input.txt') as f:
    for line in f:
        line = line.rstrip('\n')
        if line not in seen:
            seen.add(line)
            unique_lines.append(line)

with open('output.txt', 'w') as f:
    f.write('\n'.join(unique_lines))
Case-insensitive deduplication (swap the if block inside the loop for this one):
        if line.lower() not in seen:
            seen.add(line.lower())
            unique_lines.append(line)
Time to complete: 30 seconds to write + run.
When to use:
- Files with millions of lines (memory-efficient streaming possible)
- Need case-insensitive deduplication
- Part of a larger data pipeline
- Need to log which lines were removed
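The streaming and logging points in the list above can be sketched as a single function. This is one possible shape under stated assumptions, not the only way to do it: the name dedupe_stream and its key parameter are hypothetical, and the set of seen keys still grows with the number of distinct lines, so memory use is bounded by the unique lines, not by the file size.

```python
import io


def dedupe_stream(src, dst, key=lambda s: s):
    """Copy lines from src to dst, skipping repeats; return the skipped lines.

    `key` decides what counts as "the same line" (hypothetical parameter),
    e.g. str.lower for case-insensitive matching.
    """
    seen = set()
    removed = []
    for raw in src:
        line = raw.rstrip("\n")
        k = key(line)
        if k in seen:
            removed.append(line)   # remember what was dropped, for logging
            continue
        seen.add(k)
        dst.write(line + "\n")     # written immediately, never buffered in a list
    return removed


# In-memory demo; real files opened with open() work the same way.
src = io.StringIO("Alice\nbob\nalice\nBob\ncarol\n")
dst = io.StringIO()
removed = dedupe_stream(src, dst, key=str.lower)

print(dst.getvalue().splitlines())  # ['Alice', 'bob', 'carol']
print(removed)                      # ['alice', 'Bob']
```

With files, the call would look like dedupe_stream(f_in, f_out) inside a with open(...) block for each file, using the same input.txt and output.txt names as the script above.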
Method 5: Excel / Google Sheets
For CSV data already in a spreadsheet:
Excel:
- Select the column with duplicates
- Data → Remove Duplicates
- Select which columns to deduplicate across
- Click OK
Google Sheets:
- Select the column
- Data → Data Cleanup → Remove Duplicates
- Select columns and confirm
Time to complete: 15 seconds.
When to use:
- Data is already in a spreadsheet
- You need to deduplicate based on multiple columns (e.g. same email AND same name)
- You want a preview before confirming
Limitation: Works on rows in the spreadsheet context. For simple text line deduplication, it's overkill.
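If the same multi-column rule is needed outside a spreadsheet, it can be approximated in a few lines of Python. The sketch below is an assumption-laden illustration: the column names (name, email, plan) and the helper dedupe_rows are made up, and the lowercasing is there to mimic case-insensitive matching.

```python
import csv
import io


def dedupe_rows(rows, columns):
    """Yield the first row seen for each combination of values in `columns`."""
    seen = set()
    for row in rows:
        # Build the comparison key from the chosen columns only.
        key = tuple(row[c].strip().lower() for c in columns)
        if key not in seen:
            seen.add(key)
            yield row


# Hypothetical CSV content; a real file opened with open(..., newline="") works too.
sample = io.StringIO(
    "name,email,plan\n"
    "Alice,alice@example.com,free\n"
    "Alice,alice@example.com,pro\n"
    "Alice,alice@work.example,free\n"
)

unique = list(dedupe_rows(csv.DictReader(sample), ["name", "email"]))
print([row["email"] for row in unique])
# ['alice@example.com', 'alice@work.example']
```

The second row is dropped because its name and email match the first, even though the plan differs. The third survives because the email is different.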
Comparison Table
| Method | Speed | File size | Order preserved | Case options | Install required |
|---|---|---|---|---|---|
| Browser tool | ★★★★★ | Small-medium | Yes | Case-sensitive only | No |
| Command line | ★★★★★ | Any | Optional | Yes | No (built-in) |
| VS Code | ★★★★☆ | Any | Yes | Yes (via extension) | Extension |
| Python | ★★★☆☆ | Any | Yes | Fully configurable | Python |
| Excel/Sheets | ★★★★☆ | Medium | Yes (first occurrence kept) | Case-insensitive | No |
Choosing the Right Method
Data is in your clipboard → browser tool
Data is in a file and you're in a terminal → command line (awk '!seen[$0]++')
Data is in VS Code → Text Power Tools extension
Data is in a spreadsheet → Excel Remove Duplicates or Google Sheets equivalent
Need case-insensitive or partial-match deduplication → Python
Files over 100 MB → command line or Python (browser tool will slow down)
Edge Cases Worth Knowing
What counts as a "duplicate"?
By default, the methods above treat lines as duplicates only if they're exactly identical, including whitespace. The exception is letter case, which Excel's Remove Duplicates ignores. "Alice " and "Alice" are different lines because of the trailing space. Clean whitespace first with the whitespace remover if trailing spaces are a concern.
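In a script, the whitespace cleanup and the deduplication can happen in one pass. The following is a minimal sketch, assuming trailing whitespace is never significant in your data; the function name is hypothetical.

```python
def dedupe_ignoring_trailing_space(lines):
    """Deduplicate lines after stripping trailing whitespace from each one."""
    seen = set()
    result = []
    for line in lines:
        cleaned = line.rstrip()    # "Alice " and "Alice" now compare equal
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


print(dedupe_ignoring_trailing_space(["Alice ", "Alice", "Bob\t", "Bob"]))
# ['Alice', 'Bob']
```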
Blank lines
The browser tool and awk method treat a blank line like any other line: the first one is kept and later blank lines are removed as duplicates. If you want to remove all blank lines too, use the whitespace remover with "remove blank lines" checked, then deduplicate.
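The blank-line behavior is easy to see on a tiny made-up list. The sketch below contrasts plain deduplication with a variant that also drops blank and whitespace-only lines; the helper name is hypothetical.

```python
def dedupe_drop_blank(lines):
    """Deduplicate lines and drop blank or whitespace-only lines entirely."""
    seen = set()
    result = []
    for line in lines:
        if not line.strip():       # skip "" and lines made only of spaces/tabs
            continue
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


sample = ["a", "", "b", "", "a"]

# Plain deduplication keeps the first blank line.
print(list(dict.fromkeys(sample)))  # ['a', '', 'b']

# Dropping blanks as well.
print(dedupe_drop_blank(sample))    # ['a', 'b']
```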
Very large files
The browser tool starts to slow above ~100,000 lines. For anything bigger, use the command line or Python.
For the vast majority of cases (pasting a list and wanting duplicates gone), the browser tool is the answer. Use the command line or Python when the data is in files or needs scripting.