
Systematic Literature Reviews

What is Deduplication?

Deduplication is the process of removing records that appear more than once in your search results, either because the same study was retrieved from multiple databases or because several of the studies you located used the same data set. If undetected, either kind of duplication can bias the conclusions of your review.

Duplicate records are common because many databases index the same journals, so identifying and removing them is a necessary step. Your method of deduplication may depend on how many records your searches retrieve: manual deduplication is realistic for smaller sets, whereas larger sets may require automatic tools. Automatic tools are not perfect (they can miss duplicates whose details differ slightly), so combining automatic and manual methods gives the most accurate results. Whichever process you follow, document it and report it accurately in your review.

Identifying multiple articles published from the same data set is more complicated. The Cochrane Handbook offers guidance on this.

This part of deduplication requires careful analysis because you don't want to leave out important articles. 

Track the number of duplicate records you remove for either reason, because you will need to report it in your PRISMA flow diagram.

Manual Deduplication

Export your references to a CSV or Excel file. In most cases, you will need to first use conditional formatting in Excel to identify duplicates, then do a final scan manually.

Step 1: Using Conditional Formatting to Identify Duplicates

  • Replace punctuation (dashes, periods, question marks, semicolons, colons) in titles with spaces using the Find and Replace tool.
  • Sort the column alphabetically. (Start with titles, though you can use this same process for any other columns you choose, such as DOI.)
  • On the Home tab, select Conditional Formatting, then Highlight Cells Rules, then Duplicate Values.
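If you prefer a script to a spreadsheet, the Step 1 logic can be sketched in Python. This is a minimal illustration, not part of the adapted method: it assumes your exported records are loaded as a list of dictionaries, and the names `normalize` and `flag_duplicates` are hypothetical. Lowercasing and collapsing repeated spaces are extra assumptions added so that small formatting differences do not hide a match.

```python
import re
from collections import Counter

# Punctuation listed in Step 1: hyphens and dashes, periods, question marks,
# semicolons, colons. Add other characters if your exports need them.
PUNCTUATION = re.compile(r"[-\u2013\u2014.?;:]")


def normalize(value: str) -> str:
    """Replace the listed punctuation with spaces.

    Assumption (not in Step 1): text is also lowercased and runs of
    whitespace are collapsed, so formatting differences do not hide a match.
    """
    spaced = PUNCTUATION.sub(" ", value)
    return " ".join(spaced.lower().split())


def flag_duplicates(records: list[dict], column: str = "title") -> list[dict]:
    """Sort by the normalized column and mark every row whose value repeats.

    Like Excel's Duplicate Values rule, all copies are marked, not just the
    second one; deciding which copy to keep is left to the manual scan.
    """
    rows = [dict(r, key=normalize(r[column])) for r in records]
    rows.sort(key=lambda r: r["key"])
    counts = Counter(r["key"] for r in rows)
    for r in rows:
        r["highlight"] = counts[r["key"]] > 1
    return rows


# Made-up example records.
records = [
    {"title": "Deduplication: a review?"},
    {"title": "Another study."},
    {"title": "Deduplication - A Review"},
]
for row in flag_duplicates(records):
    print(row["highlight"], row["title"])
```

Highlighted rows are only candidates; the manual scan still decides which records are true duplicates. The same function can be pointed at another column, such as a DOI column, by passing `column="doi"`.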

Step 2: Manual Scan

  • Sort by title.
  • Scan through the list, looking for duplicate titles.
  • Before designating a record as a duplicate, check that its additional information (author, journal, volume, page numbers) also matches.

DO NOT delete duplicate records. Instead, move them to a separate sheet to track the number of duplicates you remove from your review.
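The keep-rather-than-delete rule is easy to follow in a script as well. This minimal Python sketch assumes records are dictionaries with identical columns, as in a CSV export; `split_duplicates` and `write_removed` are hypothetical names, and the DOI-based key in the example is only one possible matching rule. It keeps the first copy of each record, sets later copies aside, and writes them to a separate CSV (the script equivalent of a separate sheet), so the count is available for your PRISMA diagram.

```python
import csv
import io


def split_duplicates(records, key):
    """Keep the first record for each key; set later copies aside.

    Returns (kept, removed). Nothing is discarded, so len(removed) is the
    number of duplicates to report.
    """
    seen = set()
    kept, removed = [], []
    for record in records:
        k = key(record)
        if k in seen:
            removed.append(record)
        else:
            seen.add(k)
            kept.append(record)
    return kept, removed


def write_removed(removed, stream):
    """Write the set-aside duplicates as CSV (assumes all rows share columns)."""
    if not removed:
        return
    writer = csv.DictWriter(stream, fieldnames=list(removed[0]))
    writer.writeheader()
    writer.writerows(removed)


# Made-up records; the DOIs use the 10.1000 example prefix.
records = [
    {"title": "Study A", "doi": "10.1000/a", "source": "Database 1"},
    {"title": "Study A", "doi": "10.1000/A", "source": "Database 2"},
    {"title": "Study B", "doi": "10.1000/b", "source": "Database 1"},
]
# DOIs are case-insensitive, so compare them in lowercase.
kept, removed = split_duplicates(records, key=lambda r: r["doi"].lower())
print(f"Records kept: {len(kept)}, duplicates removed: {len(removed)}")

buffer = io.StringIO()  # stands in for a separate file or sheet
write_removed(removed, buffer)
print(buffer.getvalue())
```

The key function must produce a meaningful value for every record. A record without a DOI would raise a `KeyError` here, and mapping missing DOIs to an empty string would wrongly group them all together, so use a fallback such as the normalized title for those records.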

This process was adapted from: 

Kwon, Y., Lemieux, M., McTavish, J., & Wathen, N. (2015). Identifying and removing duplicate records from systematic review searches. Journal of the Medical Library Association, 103(4), 184-188. https://doi.org/10.3163/1536-5050.103.4.004

Deduplication with Software

Most bibliographic management software includes a deduplication option. For example, you might import your references into Zotero, remove the duplicates it detects, and then go through the remaining list manually.

If you use Zotero as your bibliographic management software, this short tutorial will get you started on the deduplication process: Zotero Deduplication