Comparisons · October 12, 2025

5 Ways to Remove Duplicate Lines from a List or File

Browser tool, command line, Excel, Python, VS Code: pick the method that fits where your data already lives.

Tags: duplicate lines, text tools, data cleaning, CSV

For related fixes and guides, see our troubleshooting hub.

Duplicate lines are everywhere: exported CSVs with repeated rows, copy-pasted lists with accidental duplicates, log files with repeated entries, dependency lists with redundant entries. Removing them by hand from anything longer than 20 lines is tedious. Here are five fast methods, starting with the lowest-effort option.

Method 1: Browser Tool (Fastest for Pasted Text)

The remove duplicate lines tool: paste your list, duplicates are removed instantly, copy the result.

Time to complete: 5 seconds.

What it does: Keeps the first occurrence of each line and removes later duplicates. Matching is case-sensitive, so "Alice" and "alice" are both kept. Only exact repeats are removed.
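If you want the same behavior in a script, here is a minimal Python sketch of first-occurrence, case-sensitive deduplication. It approximates the tool's behavior and is not its actual code. The function name dedupe_lines is hypothetical. The sketch relies on Python dicts preserving insertion order, which the language guarantees since Python 3.7.

```python
def dedupe_lines(text: str) -> str:
    """Keep the first occurrence of each line; comparison is case-sensitive."""
    lines = text.splitlines()
    # dict.fromkeys keeps one key per distinct line, in first-seen order
    unique = dict.fromkeys(lines)
    return "\n".join(unique)


pasted = "Alice\nalice\nBob\nAlice\nBob"
print(dedupe_lines(pasted))
```

Both "Alice" and "alice" survive because the comparison is case-sensitive. Only the repeated "Alice" and "Bob" are dropped.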

When to use:

  • Working with lists you can paste (≤ a few thousand lines)
  • You don't have a terminal or code editor open
  • You want the simplest possible workflow

Limitation: It compares whole lines only. Some CSV rows are partial duplicates, with the same email but different values in other fields. For those, you need a tool that can deduplicate on specific columns.
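For partial duplicates, one option is to compare only selected columns. The following Python sketch uses the standard csv module and keeps the first row for each key. The function name dedupe_csv and the column names email and name are hypothetical examples. They are not part of any tool described here.

```python
import csv
import io


def dedupe_csv(src, dst, key_columns):
    """Copy CSV rows from src to dst, keeping only the first row per key.

    key_columns lists the (hypothetical) column names to compare on.
    Other columns may differ; they are taken from the first matching row.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    seen = set()
    for row in reader:
        # Build a tuple key from the chosen columns only
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            writer.writerow(row)


data = io.StringIO(
    "email,name\n"
    "a@example.com,Alice\n"
    "b@example.com,Bob\n"
    "a@example.com,Alice Smith\n"
)
out = io.StringIO()
dedupe_csv(data, out, ["email"])
print(out.getvalue())
```

If you pass ["email", "name"] instead, rows count as duplicates only when both columns match. When reading real files, open them with newline="", as the csv module documentation recommends.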

Method 2: Command Line (Fastest for Files)

For text files on Linux/macOS:

# Sort then deduplicate (reorders the output)
sort input.txt | uniq > output.txt

# Preserve original order and deduplicate (any POSIX awk)
awk '!seen[$0]++' input.txt > output.txt

The sort | uniq pipeline is the classic approach. Because uniq only removes adjacent duplicates, the sort is required. The sort also reorders the output, which is not always desirable.
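To see why the sort is needed, here is a small Python model of uniq's behavior. It uses itertools.groupby, which, like uniq, collapses only runs of adjacent identical lines. This illustrates the semantics and is not how uniq is implemented.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of adjacent identical lines, like the uniq command."""
    return [line for line, _run in groupby(lines)]


lines = ["banana", "apple", "banana", "banana"]
print(uniq(lines))          # the non-adjacent "banana" repeat survives
print(uniq(sorted(lines)))  # sorting makes all duplicates adjacent
```

The first call returns ['banana', 'apple', 'banana']. The second returns ['apple', 'banana'], which is what sort | uniq gives you.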

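The awk '!seen[$0]++' pattern preserves order by counting each line in an associative array. Because seen[$0]++ returns the count from before the increment, the expression is true only the first time a line appears. awk prints the line whenever the pattern is true.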

Time to complete: 5 seconds once you know the command.

When to use:

  • Working with files (not pasted text)
  • Files larger than a few thousand lines
  • As part of a shell pipeline or script

On Windows (PowerShell):

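Get-Content input.txt | Sort-Object -Unique | Set-Content output.txt

Like sort | uniq, this reorders the output. Sort-Object -Unique also treats lines that differ only in letter case as duplicates.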

Method 3: VS Code (Built-in Sort Plus a Free Extension)

VS Code can handle this with one built-in command and one free extension:

  1. Open the file in VS Code
  2. Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) to open the Command Palette
  3. Type "sort lines" and select Sort Lines Ascending or Sort Lines Descending
  4. Run Remove Duplicate Lines (search for it in the Command Palette)

Only the sort commands ship with VS Code. "Remove Duplicate Lines" is added by the Text Power Tools extension (free), so install that first.

Time to complete: 10–15 seconds after the extension is installed.

When to use:

  • Already working in VS Code
  • Want to stay in your editor
  • Text Power Tools is already installed

Method 4: Python (For Scripting or Large Files)

# Preserve order, remove duplicates (assumes UTF-8 text)
seen = set()
unique_lines = []
with open('input.txt', encoding='utf-8') as f:
    for line in f:
        line = line.rstrip('\n')
        if line not in seen:
            seen.add(line)
            unique_lines.append(line)

# Write one entry per line, including a trailing newline
with open('output.txt', 'w', encoding='utf-8') as f:
    f.writelines(line + '\n' for line in unique_lines)

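Case-insensitive deduplication (replace the if block inside the loop above):

# Compare a case-folded key but keep the original spelling
key = line.casefold()
if key not in seen:
    seen.add(key)
    unique_lines.append(line)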

Time to complete: 30 seconds to write + run.

When to use:

  • Files with millions of lines (memory-efficient streaming possible)
  • Need case-insensitive deduplication
  • Part of a larger data pipeline
  • Need to log which lines were removed
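The script above collects every unique line in a list before writing anything. A streaming variation writes each new line as soon as it appears, keeps only the set of distinct lines in memory, and can log what was removed, which covers two of the cases listed above. This is a sketch. The name dedupe_stream and its log format are hypothetical.

```python
import io


def dedupe_stream(src, dst, log=None):
    """Stream lines from src to dst, writing each distinct line once.

    Only the set of distinct lines is held in memory. If log is given,
    the line number and text of every removed duplicate is written to it.
    """
    seen = set()
    for lineno, line in enumerate(src, start=1):
        line = line.rstrip("\n")
        if line in seen:
            if log is not None:
                log.write(f"{lineno}: {line}\n")
            continue
        seen.add(line)
        dst.write(line + "\n")


src = io.StringIO("red\ngreen\nred\nblue\ngreen\n")
dst, removed = io.StringIO(), io.StringIO()
dedupe_stream(src, dst, log=removed)
print(dst.getvalue())
print(removed.getvalue())
```

Here dst receives red, green and blue, and the log records lines 3 and 5 as removed. With real files, pass handles from open('input.txt', encoding='utf-8') and open('output.txt', 'w', encoding='utf-8').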

Method 5: Excel / Google Sheets

For CSV data already in a spreadsheet:

Excel:

  1. Select the column with duplicates
  2. Data → Remove Duplicates
  3. Select which columns to deduplicate across
  4. Click OK

Google Sheets:

  1. Select the column
  2. Data → Data Cleanup → Remove Duplicates
  3. Select columns and confirm

Time to complete: 15 seconds.

When to use:

  • Data is already in a spreadsheet
  • You need to deduplicate based on multiple columns (e.g. same email AND same name)
  • You want a preview before confirming

Limitation: Works on rows in the spreadsheet context. For simple text line deduplication, it's overkill.

Comparison Table

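| Method | Speed | File size | Order preserved | Case options | Install required |
|---|---|---|---|---|---|
| Browser tool | ★★★★★ | Small to medium | Yes | Case-sensitive only | No |
| Command line | ★★★★★ | Any | Optional (awk: yes; sort: no) | Yes | No (built in) |
| VS Code | ★★★★☆ | Any | Yes | Yes (via extension) | Extension |
| Python | ★★★☆☆ | Any | Yes | Fully configurable | Python |
| Excel/Sheets | ★★★★☆ | Medium | Yes (first row kept) | Case-insensitive | No |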

Choosing the Right Method

Data is in your clipboard → browser tool

Data is in a file and you're in a terminal → command line (awk '!seen[$0]++')

Data is in VS Code → Text Power Tools extension

Data is in a spreadsheet → Excel Remove Duplicates or Google Sheets equivalent

Need case-insensitive or partial-match deduplication → Python

Files over 100 MB → command line or Python (browser tool will slow down)

Edge Cases Worth Knowing

What counts as a "duplicate"?

Apart from the case handling noted in the comparison table, all five methods treat lines as duplicates only if they're exactly identical, including whitespace. "Alice " and "Alice" are different lines because of the trailing space. If trailing spaces are a concern, clean whitespace first with the whitespace remover.

Blank lines

The browser tool and the awk method keep the first blank line and drop later ones, since every blank line is a duplicate of the first. To remove all blank lines, run the whitespace remover with "remove blank lines" checked, then deduplicate.
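The whitespace and blank-line advice can also be combined into a single pass. Here is a minimal Python sketch. It assumes you want whitespace cleaned in the output, and collapsing internal spaces may not suit every dataset. It normalizes each line, optionally skips blank ones, and keeps the first occurrence of each cleaned line. dedupe_normalized and its parameters are hypothetical, not options of the tools above.

```python
def dedupe_normalized(lines, ignore_case=False, drop_blank=True):
    """Clean whitespace, optionally drop blank lines, then deduplicate.

    Each line is trimmed and internal runs of whitespace are collapsed to a
    single space; the cleaned line is what gets compared and returned.
    """
    seen = set()
    result = []
    for line in lines:
        cleaned = " ".join(line.split())
        if drop_blank and not cleaned:
            continue  # empty or whitespace-only line
        key = cleaned.casefold() if ignore_case else cleaned
        if key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


print(dedupe_normalized(["Alice ", "Alice", "", "  ", "Bob", "bob"]))
print(dedupe_normalized(["Alice ", "Alice", "", "Bob", "bob"], ignore_case=True))
```

The first call returns ['Alice', 'Bob', 'bob']. With ignore_case=True, "bob" is also treated as a duplicate, leaving ['Alice', 'Bob'].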

Very large files

The browser tool starts to slow above ~100,000 lines. For anything bigger, use the command line or Python.
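If even the set of distinct lines becomes too large to hold comfortably in memory, one Python option is to store a fixed-size hash of each line instead of the line itself. This sketch uses hashlib.blake2b with a 16-byte digest, and the function name is hypothetical. With a 128-bit digest, an accidental collision (which would wrongly drop a line) is astronomically unlikely, but not impossible. Lines are read as bytes and written back with \n endings, so CRLF input comes out with LF endings.

```python
import hashlib


def dedupe_large_file(in_path, out_path):
    """Order-preserving dedupe that stores a 16-byte digest per distinct line.

    Memory grows with the number of distinct lines rather than their length.
    Lines are handled as raw bytes, so no text decoding is needed.
    """
    seen = set()
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for line in src:
            # Strip the line ending so a final line without "\n" still matches
            content = line.rstrip(b"\r\n")
            digest = hashlib.blake2b(content, digest_size=16).digest()
            if digest not in seen:
                seen.add(digest)
                dst.write(content + b"\n")
```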


For the vast majority of cases (you paste a list and want the duplicates gone), the browser tool is the answer. Use the command line or Python when the data is in files or needs scripting.
