Crossref Similarity Check

Name	Crossref Similarity Check
Type	plagiarism detection service
Owner	Crossref
Launched	2008 (as CrossCheck)
Based in	United Kingdom



Crossref Similarity Check is a plagiarism detection initiative used by scholarly publishers to compare submitted manuscripts against a large corpus of published literature. It helps editors and reviewers evaluate the originality of submissions and draws on content contributed by major publishing houses, academic societies, indexing services, and research institutions. The service is integrated with the editorial workflows of many prominent journals, publishers, and platforms.

Overview

Crossref Similarity Check provides text-matching reports by comparing submissions against a corpus composed of member content and third-party sources. Major participants include Elsevier, Springer Nature, Wiley-Blackwell, the American Chemical Society, and Oxford University Press, and the service's DOI and metadata infrastructure ties into Crossref member records. The text matching itself is performed by iThenticate, a Turnitin product, so the initiative sits alongside the institutional plagiarism-checking solutions used at places like Harvard University, Stanford University, and the University of Oxford. Publishers and editorial offices at organizations like Nature Research, Science, The Lancet, PLOS, and IEEE use it together with peer review systems such as ScholarOne and Editorial Manager.

History and Development

The service emerged from collaborations between scholarly publishers and the DOI registration agency Crossref, in response to concerns about duplicate publication and plagiarism raised by editors at outlets including BMJ Group, Wiley, and Elsevier. Early development drew on existing infrastructure maintained by publishers and by indexing services like Scopus and Web of Science. Over time, Crossref Similarity Check expanded its archive through agreements with societies such as the American Mathematical Society, the American Physical Society, and the Royal Society of Chemistry. Its metadata practices were influenced by standards from DOAJ and by initiatives from COPE and ORCID, the latter aimed at improving author identification.

Functionality and Workflow

Editors submit manuscripts to the service through integration points in manuscript handling platforms, including ScholarOne, Editorial Manager, and bespoke systems used by publishers like Taylor & Francis and SAGE Publications. The engine compares the submitted text to a corpus that includes content from participating publishers, preprint archives like arXiv, and institutional repositories hosted by universities such as the Massachusetts Institute of Technology and the California Institute of Technology. Results are presented as similarity reports that highlight matched passages and cite the source records, often cross-referenced with DOI metadata, publisher information (e.g., Cambridge University Press), and journal titles such as Cell, PNAS, and the Journal of the American Medical Association. Editors then decide how to act on a report with reference to policies from organizations like the Committee on Publication Ethics (COPE).
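The comparison step described above can be illustrated with a minimal Python sketch. It is not the matching algorithm used by the service, which is not described here; it assumes a simple word n-gram ("shingle") overlap measure, and the names shingles, Match, and similarity_report, as well as the DOIs in the example corpus, are hypothetical placeholders introduced for illustration.

```python
import re
from dataclasses import dataclass


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams found in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


@dataclass
class Match:
    doi: str        # identifier of the matched source record (placeholder values below)
    overlap: float  # fraction of the submission's n-grams also found in the source


def similarity_report(submission: str, corpus: dict[str, str], n: int = 5) -> list[Match]:
    """Compare a submission with every corpus text; list sources by overlap, highest first."""
    sub = shingles(submission, n)
    if not sub:
        return []  # submission is shorter than n words, so nothing can be compared
    report = []
    for doi, text in corpus.items():
        shared = sub & shingles(text, n)
        if shared:
            report.append(Match(doi, len(shared) / len(sub)))
    return sorted(report, key=lambda m: m.overlap, reverse=True)


# Toy corpus keyed by made-up DOIs (assumption: real records would carry Crossref metadata).
corpus = {
    "10.5555/example-a": "The quick brown fox jumps over the lazy dog near the river.",
    "10.5555/example-b": "Unrelated text about chemistry and physics experiments conducted today.",
}
for match in similarity_report("The quick brown fox jumps over the lazy dog.", corpus):
    print(f"{match.doi}: {match.overlap:.0%} of submission n-grams matched")
```

In this toy run only the first record is reported, because every five-word sequence in the submission also occurs in it. A production system would additionally record where each match occurs so that the passages can be highlighted for the editor.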

Participation and Access

Access to the service is restricted to institutions and publishers that are members or licensees; notable participants include Elsevier, Springer Nature, Wiley-Blackwell, the American Chemical Society, and societies like IEEE. University presses such as Oxford University Press and Cambridge University Press may participate on behalf of their journals. Research funders and consortia, for example the Wellcome Trust and national consortia in countries such as the United Kingdom and Germany, influence participation through funding and policy guidance. Integration partners include editorial management firms, indexing services such as Clarivate, which operates Web of Science, and discovery platforms like Google Scholar.

Policies, Licensing, and Privacy

Use of the service is governed by licensing agreements and content deposit policies negotiated between Crossref and participating publishers. Compliance expectations are aligned with guidance from COPE and with copyright practices at institutions like Harvard University and Yale University. Privacy and data handling are shaped by data protection regimes, such as the laws of the European Union, and by organizational policies at repositories such as PubMed Central. Publisher agreements set the terms for corpus inclusion, retention, and the use of similarity reports in editorial processes at journals such as Nature, Science, and The Lancet.

Criticisms and Limitations

Critics point to limitations similar to those leveled at other text-matching systems such as Turnitin. These include false positives from boilerplate text in methods sections, which are common in journals like BMJ; coverage gaps for humanities content published by presses such as Princeton University Press and Oxford University Press; and a reliance on publisher participation that can exclude regional publishers and open-access platforms such as SciELO and Redalyc. Legal and ethical debates echo controversies over student plagiarism detection at institutions like Harvard University, and scholarly groups including COPE have discussed proportionality in editorial responses. Technical limits include the handling of non-text content in outlets such as IEEE conference proceedings and in preprints on bioRxiv.
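The boilerplate problem can be made concrete with a short Python sketch. It assumes a naive word n-gram overlap score and one possible mitigation, ignoring phrases that recur across many corpus documents; neither is claimed to be how the service works, and the function names and the sample sentences are invented for illustration.

```python
import re
from collections import Counter


def word_ngrams(text: str, n: int = 4) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams found in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def common_phrases(corpus: list[str], n: int = 4, min_docs: int = 3) -> set[tuple[str, ...]]:
    """N-grams that appear in at least min_docs documents are treated as boilerplate."""
    doc_freq: Counter = Counter()
    for text in corpus:
        doc_freq.update(word_ngrams(text, n))  # each document counts an n-gram once
    return {gram for gram, count in doc_freq.items() if count >= min_docs}


def overlap(submission: str, source: str, ignore=frozenset(), n: int = 4) -> float:
    """Fraction of the submission's non-ignored n-grams that also occur in the source."""
    sub = word_ngrams(submission, n) - ignore
    if not sub:
        return 0.0
    return len(sub & word_ngrams(source, n)) / len(sub)


# Invented methods-section sentences sharing one stock phrase.
corpus = [
    "Samples were centrifuged at room temperature. Yeast cultures grew overnight.",
    "Samples were centrifuged at room temperature. Mouse tissue was fixed in formalin.",
    "Samples were centrifuged at room temperature. Soil cores were dried and sieved.",
]
submission = "Samples were centrifuged at room temperature. Protein extracts were then sequenced."

raw = overlap(submission, corpus[0])
filtered = overlap(submission, corpus[0], ignore=common_phrases(corpus))
print(f"raw overlap: {raw:.3f}, after ignoring common phrases: {filtered:.3f}")
```

Here the raw score is driven entirely by the shared stock sentence and drops to zero once common phrases are ignored. The trade-off is that such a filter could also hide genuine copying of widely reused text, which is one reason similarity scores are usually treated as a prompt for editorial judgment rather than a verdict.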

Impact on Scholarly Publishing

The service has influenced editorial workflows at major journals and publishers, including Nature Research, Cell, Science, and PLOS, by standardizing an initial check for textual overlap before peer review. It also affects author behavior, citation practices in venues like the Journal of the American Chemical Society and other ACS Publications titles, and institutional policies at universities such as the University of Cambridge and the University of Chicago. By connecting DOI metadata managed by Crossref with similarity reporting, it reinforces infrastructure used across the publishing ecosystem and interacts with standards bodies and stakeholders including ORCID, COPE, and major indexing services such as Scopus and Web of Science.
