Data Curation Network

Name	Data Curation Network
Formation	2016
Headquarters	Chicago, Illinois
Region served	United States
Type	Consortium

Contents

Overview
History
Organization and Governance
Services and Activities
Membership and Partnerships
Impact and Case Studies
Challenges and Future Directions

The Data Curation Network (DCN) is a collaborative consortium of institutional repositories, libraries, and research data professionals that provides peer-reviewed data curation and stewardship services. It connects curators across institutions to improve long-term access to research datasets through shared workflows, training, and preservation standards. It operates through distributed partnerships that support repositories, archives, and digital libraries associated with universities, federal agencies, and scholarly publishers.

Overview

The network coordinates expertise from institutions such as the University of Illinois at Urbana–Champaign, the University of Michigan, the University of Minnesota, Cornell University, and Yale University to deliver shared curation for datasets deposited in repositories built on platforms such as Merritt, DSpace, and Figshare. It emphasizes standards and practices drawn from organizations including the International Council on Archives, the Digital Preservation Coalition, the Research Data Alliance, the Data Documentation Initiative, and OpenAIRE. Its model addresses the data lifecycle stages recognized by funders such as the National Institutes of Health, the National Science Foundation, and the European Commission, and it aligns with metadata schemas such as Dublin Core, PREMIS, and Schema.org.

History

The initiative began amid broader movements: growing advocacy for the FAIR principles, data-sharing mandates from funders such as the National Science Foundation and the Wellcome Trust, and the growth of institutional repositories at universities including Harvard University and Stanford University. Early convenings involved stakeholders from libraries at the University of California, Berkeley, Cornell University, and the University of Illinois at Urbana–Champaign, alongside data stewardship efforts at Oak Ridge National Laboratory and Los Alamos National Laboratory. Pilot projects drew on practices from DataONE, ICPSR, and Dryad before the effort formalized into a cooperative model that expanded to include partners such as the University of Arizona and the University of Minnesota.

Organization and Governance

Governance combines advisory input from academic libraries such as those at Columbia University and the University of Washington with operational staffing at member institutions including the University of Michigan and Yale University. Steering committees have involved representatives affiliated with professional bodies such as the Association of Research Libraries and the Society of American Archivists. Funding and oversight have intersected with agencies and programs such as the Institute of Museum and Library Services and the National Endowment for the Humanities, and with grant-makers such as the Alfred P. Sloan Foundation and the Andrew W. Mellon Foundation. Decision-making includes repository managers who run systems such as Fedora and DSpace, as well as services hosted on cloud platforms such as Amazon Web Services and Google Cloud Platform.

Services and Activities

The consortium provides peer-reviewed curation services, metadata enhancement, format validation, and sensitive-data review for datasets generated by researchers affiliated with institutions such as Princeton University, the Massachusetts Institute of Technology, Johns Hopkins University, and the University of California, Los Angeles. Training programs reference curricula from Coursera and professional development offered by the Society of American Archivists and the International Federation of Library Associations and Institutions. Technical activities include checksum validation using hash standards promoted by the National Institute of Standards and Technology, file format identification against registries akin to PRONOM, and persistent identifier practices based on Digital Object Identifiers (DOIs) and ORCID iDs. The network also facilitates policy development for repositories, modeled on efforts by Harvard Dataverse and Zenodo.
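The checksum validation mentioned above can be illustrated with a minimal sketch. It assumes SHA-256 (one of the hash functions in NIST's Secure Hash Standard) and a simple manifest that maps relative file paths to expected digests. The function names `sha256_of` and `verify_manifest` and the manifest layout are hypothetical, chosen for illustration, and are not taken from the network's own tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, base_dir):
    """Compare each file under base_dir against its expected digest.

    manifest: dict of {relative_path: expected_hex_digest} (hypothetical layout).
    Returns a dict of {relative_path: "ok" | "mismatch" | "missing"}.
    """
    results = {}
    for rel_path, expected in manifest.items():
        target = Path(base_dir) / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif sha256_of(target) == expected.lower():
            results[rel_path] = "ok"
        else:
            results[rel_path] = "mismatch"
    return results
```

A curator might run `verify_manifest` once when a dataset is deposited and again at intervals afterwards. Any `"mismatch"` or `"missing"` result signals that a file changed or was lost since the manifest was recorded.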

Membership and Partnerships

Membership spans universities such as Indiana University Bloomington, Ohio State University, the University of Wisconsin–Madison, and the University of North Carolina at Chapel Hill, and the network collaborates with infrastructure providers such as the Internet Archive and scholarly publishers including Springer Nature and Elsevier. Partnerships involve governmental research agencies such as the National Oceanic and Atmospheric Administration and international organizations such as the European Research Council. The network has engaged with disciplinary archives including ICPSR, GenBank, and PANGAEA to coordinate deposition workflows, and with standards bodies such as the W3C and ISO committees.

Impact and Case Studies

Case studies document curation outcomes for research projects at institutions such as the University of Minnesota and the University of Michigan, where curation improved dataset discoverability in services such as Google Scholar and DataCite. Reported impacts include increased citation rates, in line with analyses similar to studies from Leiden University, and better compliance with funder policies such as the National Institutes of Health and Wellcome Trust mandates. Collaborations with disciplinary repositories such as Dryad and infrastructure projects such as DataONE have yielded reusable workflows and metrics comparable to those of preservation efforts at LOCKSS and CLOCKSS.

Challenges and Future Directions

Challenges include sustaining funding models amid shifts in the grant landscape shaped by agencies such as the National Science Foundation and the Institute of Museum and Library Services, integrating with commercial platforms such as those of Elsevier and Figshare, and scaling technical infrastructure hosted with cloud providers such as Amazon Web Services and Google Cloud Platform. Future directions point toward tighter interoperability with initiatives led by the Research Data Alliance, adoption of machine-actionable metadata standards promoted by the Data Documentation Initiative and Schema.org, and expanded capacity to support international consortia involving institutions such as the University of Toronto and University College London. Ongoing work will likely engage with legal frameworks including the General Data Protection Regulation and with ethical standards advocated by organizations such as the Committee on Publication Ethics.
