How to Remove Duplicate Lines from Any Text File
A practical guide to finding and removing repeated lines from lists, CSVs, and log files, with and without code.
Duplicate lines are one of the most common data quality problems. They sneak into:
- Keyword lists you export from multiple sources
- CSV files merged from different departments
- Log files with repeated error entries
- Email lists collected over time
- Dependency files touched by multiple developers
Removing them by hand is impractical at scale. Here are your options, from the quickest browser tool to terminal commands for power users.
The Fastest Method: A Browser-Based Tool
If you have under a few thousand lines, the zero-friction approach is a duplicate line remover. Paste your text, get the deduplicated result, copy it. No terminal, no scripts, no setup.
This handles most real-world cases. For a keyword list with 500 entries, a mailing list with some copy-paste repeats, or a log snippet you are trying to clean up, a browser tool is the right choice.
What to look for in a good tool:
- Preserves the original order (keeps first occurrence)
- Handles blank lines sensibly
- Works client-side (no file upload)
- Handles large text without freezing
Using Sort and Uniq on macOS / Linux
For larger files or automation, the terminal is your friend:
sort -u input.txt > output.txt
sort -u sorts and removes duplicates in one pass. The downside: it changes the order of your lines.
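To make the order change concrete, here is a small Python sketch comparing a sort-then-dedupe approach (roughly what sort -u does) with an order-preserving one. The sample list is made up for illustration, and Python's default string ordering may differ from your shell's locale-aware sort.

```python
# Hypothetical sample data, not taken from any real file.
lines = ["pear", "apple", "pear", "banana", "apple"]

# Roughly what `sort -u` does: unique lines, in sorted order.
sorted_unique = sorted(set(lines))

# Order-preserving alternative: keep the first occurrence of each line.
ordered_unique = list(dict.fromkeys(lines))

print(sorted_unique)   # ['apple', 'banana', 'pear']
print(ordered_unique)  # ['pear', 'apple', 'banana']
```

Both results contain the same three lines; only the order differs.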
To preserve order while deduplicating:
awk '!seen[$0]++' input.txt > output.txt
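awk '!seen[$0]++' keeps the first occurrence of each unique line and discards the rest, leaving the survivors in their original order. It works because seen is an associative array keyed by the whole line ($0): the counter is 0 the first time a line appears, so the negated test is true and awk prints the line; on every later occurrence the counter is non-zero and the line is skipped. This is the most useful version for most situations.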
PowerShell on Windows
Get-Content input.txt | Sort-Object -Unique | Set-Content output.txt
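Like sort -u, this changes the order. Sort-Object also compares strings case-insensitively by default, so Hello and hello collapse into one line unless you add -CaseSensitive. To preserve order: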
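# HashSet.Add returns $true only the first time a value is added
$seen = [System.Collections.Generic.HashSet[string]]::new()
Get-Content input.txt | Where-Object { $seen.Add($_) } | Set-Content output.txt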
Python Script
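with open("input.txt") as f:
    lines = f.read().splitlines()  # drop line endings so the last line compares equal
unique = list(dict.fromkeys(lines))  # keeps the first occurrence of each line
with open("output.txt", "w") as f:
    f.writelines(line + "\n" for line in unique)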
dict.fromkeys preserves insertion order (guaranteed since Python 3.7) and drops repeated keys automatically. Clean and Pythonic.
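For files too large to read comfortably in one go, the same idea can be written as a generator that processes one line at a time. This is a minimal sketch, not a tuned implementation: the function names unique_lines and dedupe_file are hypothetical, and the set of seen lines still grows with the number of distinct lines.

```python
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time it appears, in original order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def dedupe_file(src: str, dst: str) -> None:
    """Stream src into dst, skipping lines that have already been written."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(unique_lines(fin))


# In-memory demonstration with made-up lines.
sample = ["error: disk full\n", "ok\n", "error: disk full\n"]
print(list(unique_lines(sample)))  # ['error: disk full\n', 'ok\n']
```

Calling dedupe_file("input.txt", "output.txt") would apply the same filter to a file on disk.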
Case Sensitivity
Most tools treat lines as case-sensitive by default: Hello and hello are different lines. If you want case-insensitive deduplication:
awk (case-insensitive):
awk '!seen[tolower($0)]++' input.txt
Python (case-insensitive):
seen = set()
unique = []
with open("input.txt") as f:
    for line in f:
        # Compare on a trimmed, lowercased key; keep the original line
        key = line.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(line)
with open("output.txt", "w") as f:
    f.writelines(unique)
Common Pitfalls
Trailing whitespace: "Apple " and "Apple" look the same but are not. Trim lines before comparing if whitespace is not meaningful.
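Line endings: Windows files use \r\n, Unix files use \n. In a file with mixed endings, the stray \r makes identical-looking lines compare as different. Some tools normalise this automatically, but sort and awk treat the \r as part of the line, so strip the carriage returns first (for example with dos2unix or tr -d '\r').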
Empty lines: Decide whether you want to keep or remove empty lines. They are technically "duplicate" if there are multiple of them.
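The three pitfalls above can be handled in one place by normalising each line before comparing it. The sketch below is one possible approach under stated assumptions: the dedupe function and its keep_blank_lines parameter are hypothetical names, trailing whitespace is assumed not to be meaningful, and the output contains the normalised lines rather than the originals.

```python
def dedupe(lines, keep_blank_lines=False):
    """Deduplicate lines after stripping line endings and trailing whitespace."""
    seen = set()
    result = []
    for line in lines:
        # rstrip() removes trailing spaces, tabs, "\r" and "\n" in one step.
        key = line.rstrip()
        if key == "":
            # Blank lines are a policy decision: keep them all or drop them all.
            if keep_blank_lines:
                result.append(key)
            continue
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Made-up input mixing trailing spaces, \r\n endings, and blank lines.
raw = ["Apple \r\n", "Apple\n", "\n", "Banana\n", "\n", "Banana"]
print(dedupe(raw))                         # ['Apple', 'Banana']
print(dedupe(raw, keep_blank_lines=True))  # ['Apple', '', 'Banana', '']
```

Keeping every blank line is useful when blank lines act as separators between groups; dropping them suits flat lists such as keywords or email addresses.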
For quick browser-based deduplication, the ToolKits Remove Duplicate Lines tool handles all of this with no setup. For ongoing automation, the awk one-liner is hard to beat.