Guides · June 5, 2024

How to Remove Duplicate Lines from Any Text File

A practical guide to finding and removing repeated lines from lists, CSVs, and log files with and without code.

text tools, productivity, data cleaning, duplicates

Duplicate lines are one of the most common data quality problems. They sneak into:

  • Keyword lists you export from multiple sources
  • CSV files merged from different departments
  • Log files with repeated error entries
  • Email lists collected over time
  • Dependency files touched by multiple developers

Removing them by hand is impractical at scale. Here are your options, from the quickest browser tool to terminal commands for power users.

The Fastest Method: A Browser-Based Tool

If you have fewer than a few thousand lines, the lowest-friction approach is a browser-based duplicate line remover. Paste your text, get the deduplicated result, copy it. No terminal, no scripts, no setup.

This handles most real-world cases. For a keyword list with 500 entries, a mailing list with some copy-paste repeats, or a log snippet you are trying to clean up, a browser tool is the right choice.

What to look for in a good tool:

  • Preserves the original order (keeps first occurrence)
  • Handles blank lines sensibly
  • Works client-side (no file upload)
  • Handles large text without freezing

Using Sort and Uniq on macOS / Linux

For larger files or automation, the terminal is your friend:

sort -u input.txt > output.txt

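sort -u sorts the file and removes duplicates in a single command. It is equivalent to sort input.txt | uniq (uniq on its own only collapses adjacent repeats, which is why the sort comes first). The downside: it changes the order of your lines.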
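
To make the trade-off concrete, here is a small Python sketch comparing a sort-based dedup with an order-preserving one. The sample list is made up for illustration, and Python's default sort order may differ from what sort produces under your locale.

```python
# Hypothetical sample data, not taken from a real file.
lines = ["pear", "apple", "pear", "banana", "apple"]

# Sort-based dedup: like sort -u, the result comes back sorted.
sorted_unique = sorted(set(lines))

# Order-preserving dedup: the first occurrence of each line stays in place.
ordered_unique = list(dict.fromkeys(lines))

print(sorted_unique)   # ['apple', 'banana', 'pear']
print(ordered_unique)  # ['pear', 'apple', 'banana']
```

If the original order carries meaning (a ranked keyword list, a chronological log), the second form is usually the one you want.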

To preserve order while deduplicating:

awk '!seen[$0]++' input.txt > output.txt

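awk '!seen[$0]++' keeps the first occurrence of each unique line and discards later repeats, so the surviving lines stay in their original order. It works because seen[$0]++ evaluates to 0 (false) the first time a line appears, and the ! turns that into true, which tells awk to print the line. This is the most useful version for most situations.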
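
If the awk idiom still looks cryptic, the same bookkeeping can be spelled out in Python. This is only an illustrative translation, and the function name dedupe_keep_first is made up for this sketch.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def dedupe_keep_first(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order."""
    seen = defaultdict(int)        # plays the role of awk's seen[] array
    for line in lines:             # line plays the role of $0
        count_before = seen[line]  # value of seen[$0] before the ++
        seen[line] += 1            # the ++ in seen[$0]++
        if count_before == 0:      # the ! keeps only first occurrences
            yield line


print(list(dedupe_keep_first(["b", "a", "b", "c", "a"])))  # ['b', 'a', 'c']
```

Like the awk version, this remembers every distinct line it has seen, so memory use grows with the number of unique lines rather than with the total file size.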

PowerShell on Windows

Get-Content input.txt | Sort-Object -Unique | Set-Content output.txt

Like sort -u, this changes the order. Sort-Object also compares strings case-insensitively unless you add -CaseSensitive. To preserve order:

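# HashSet.Add returns $true only the first time a line is added
$seen = New-Object 'System.Collections.Generic.HashSet[string]'
Get-Content input.txt | Where-Object { $seen.Add($_) } | Set-Content output.txt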

A Short Python Script

with open("input.txt") as f:
    lines = f.readlines()
# dict keys are unique and keep insertion order (Python 3.7+)
unique = list(dict.fromkeys(lines))
with open("output.txt", "w") as f:
    f.writelines(unique)

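dict.fromkeys builds a dictionary keyed by line. Dictionary keys are unique and, from Python 3.7 on, keep their insertion order, so turning the keys back into a list gives the deduplicated lines in their original order. Clean and Pythonic, though readlines() does load the whole file into memory.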
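
For larger files, a streaming variant avoids holding the whole file in memory at once. The sketch below is one possible way to do it: the function name dedupe_file is hypothetical, and UTF-8 is an assumption about your file's encoding.

```python
def dedupe_file(src_path: str, dst_path: str) -> int:
    """Copy src to dst, skipping repeated lines. Returns the number of lines written."""
    seen = set()
    written = 0
    # Assumption: both files are UTF-8 text.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:          # the file object yields one line at a time
            if line not in seen:  # first time we meet this exact line
                seen.add(line)
                dst.write(line)
                written += 1
    return written
```

Calling dedupe_file("input.txt", "output.txt") would write the unique lines and return how many were kept. The set of seen lines still lives in memory, so the saving comes from never loading the duplicates all at once.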

Case Sensitivity

Most tools treat lines as case-sensitive by default: Hello and hello are different lines. If you want case-insensitive deduplication:

awk (case-insensitive):

awk '!seen[tolower($0)]++' input.txt > output.txt

Python (case-insensitive):

seen = set()
unique = []
with open("input.txt") as f:
    for line in f:
        # strip() also ignores leading/trailing whitespace when comparing
        key = line.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(line)
with open("output.txt", "w") as f:
    f.writelines(unique)

Common Pitfalls

Trailing whitespace: "Apple " and "Apple" look the same but are not. Trim lines before comparing if whitespace is not meaningful.
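
As a rough sketch of that trimming step, the following Python function compares lines with trailing whitespace removed. The function name is illustrative, and it assumes you are happy for the output to contain the trimmed versions of the lines.

```python
def dedupe_ignoring_trailing_space(lines):
    """Keep the first occurrence, comparing lines with trailing whitespace removed."""
    seen = set()
    result = []
    for line in lines:
        key = line.rstrip()  # drops trailing spaces, tabs, and newline characters
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


print(dedupe_ignoring_trailing_space(["Apple ", "Apple", "Pear\t", "Pear"]))
# ['Apple', 'Pear']
```

Leading whitespace is left alone here, since indentation is often meaningful; swap rstrip() for strip() if it is not.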

Line endings: Windows files use \r\n, Unix files use \n. A mismatch can cause tools to treat identical-looking lines as different. Most good tools normalise this automatically.
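
If you are handling the text yourself, one way to sidestep the mismatch is to convert Windows endings to Unix ones before comparing. The helper below is a minimal sketch with a made-up name; it returns the result with \n endings throughout.

```python
def dedupe_mixed_line_endings(text: str) -> str:
    """Deduplicate lines, treating Windows (CRLF) and Unix (LF) endings as equal."""
    normalised = text.replace("\r\n", "\n")
    had_final_newline = normalised.endswith("\n")
    lines = normalised.split("\n")
    if had_final_newline:
        lines.pop()  # split() leaves an empty string after the final newline
    unique = dict.fromkeys(lines)  # unique keys, first-seen order
    return "\n".join(unique) + ("\n" if had_final_newline else "")


print(repr(dedupe_mixed_line_endings("apple\r\npear\napple\npear\r\n")))
# 'apple\npear\n'
```

When you read a file with Python's open() in its default text mode, \r\n is already translated to \n on input, so a helper like this matters mostly for text that arrives some other way, such as pasted strings or files read in binary mode.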

Empty lines: Decide whether you want to keep or remove empty lines. Technically, every empty line after the first is a duplicate, so a naive dedup will collapse paragraph breaks and spacing.
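
One way to make that decision explicit is a flag that either passes blank lines through untouched or drops them all. This is a sketch under that assumption; the function and parameter names are hypothetical.

```python
def dedupe_with_blank_policy(lines, keep_blank=True):
    """Deduplicate non-blank lines; blank lines are all kept or all dropped."""
    seen = set()
    result = []
    for line in lines:
        if line.strip() == "":
            if keep_blank:
                result.append(line)  # blank lines are never treated as duplicates
            continue
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


sample = ["a", "", "b", "", "a"]
print(dedupe_with_blank_policy(sample))                    # ['a', '', 'b', '']
print(dedupe_with_blank_policy(sample, keep_blank=False))  # ['a', 'b']
```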


For quick browser-based deduplication, the ToolKits Remove Duplicate Lines tool handles all of this with no setup. For ongoing automation, the awk one-liner is hard to beat.
