
RDM
Name	RDM
Type	Practice
Field	Research

Contents

Definition and Scope
Historical Development
Principles and Best Practices
Tools and Technologies
Implementation in Research Workflows
Ethical, Legal, and Security Considerations
Challenges and Future Directions

Research data management (RDM) is a set of practices, standards, and infrastructures for organizing, storing, preserving, and sharing data produced in research. It supports reproducibility, transparency, and reuse by linking datasets to publications, projects, funders, and institutions. RDM practices intersect with funder mandates, publisher policies, and institutional repositories to ensure data integrity and long-term access.

Definition and Scope

RDM encompasses activities from data creation to long-term preservation, including planning, documentation, storage, sharing, metadata creation, and curation. It is shaped by obligations set by funders such as the National Institutes of Health, the European Commission (including its Horizon Europe programme), the Wellcome Trust, and the National Science Foundation, and it integrates with repositories such as Zenodo, Dryad, Figshare, ICPSR, and Dataverse. Core components include data management plans, metadata schemas like Dublin Core, persistent identifiers such as the Digital Object Identifier (DOI) for datasets and ORCID for researchers, and policies from organizations like the Committee on Publication Ethics and OpenAIRE.
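
Persistent identifiers are designed to be checked by machines. As a minimal sketch, the following validates the check character of an ORCID iD, which is computed with the ISO 7064 MOD 11-2 algorithm. The function names are illustrative assumptions and not part of any official library.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if a hyphenated ORCID iD has the right shape and checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```

A check like this catches transcription errors before an identifier is written into a metadata record; it does not confirm that the iD is actually registered.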

Historical Development

RDM emerged from archival traditions and the growth of digital scholarship in the late 20th and early 21st centuries. Early influences include practices at the Library of Congress, data archives like the Protein Data Bank, and major projects such as the Human Genome Project, which catalyzed data-sharing norms. Policy milestones include data-sharing mandates by the National Institutes of Health in the 2000s, the Budapest Open Access Initiative (2002), and the formulation of the FAIR Principles (2016). Infrastructure advances from CERN (the European Organization for Nuclear Research) and national e-infrastructure programs shaped modern RDM workflows, paralleling the role of arXiv and PubMed Central for literature.

Principles and Best Practices

Best practices emphasize planning, metadata, versioning, documentation, and stewardship. The FAIR Principles (Findable, Accessible, Interoperable, Reusable) integrate with practices promoted by the Research Data Alliance and GO FAIR. Effective RDM uses standardized metadata vocabularies, such as the Data Documentation Initiative for social science data and ISO 19115 for geospatial content, together with preservation frameworks such as the OAIS reference model and the ISO 16363 criteria for auditing trustworthy repositories. Curation methods draw on approaches used by national archives, while citation practices follow formats endorsed by the International DOI Foundation and publishers such as Nature Publishing Group and Elsevier.
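
To illustrate how structured metadata supports citation, the sketch below assembles a dataset citation from a small record. The `DatasetRecord` class, its fields, and the element order (creators, year, title, version, publisher, DOI link) are assumptions chosen for illustration, loosely modeled on common data citation guidance; the DOI shown is a placeholder, not a real dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata needed to cite a dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def format_citation(rec: DatasetRecord) -> str:
    """Build a citation string; the version is included only when present."""
    authors = "; ".join(rec.creators)
    parts = [f"{authors} ({rec.year})", rec.title]
    if rec.version:
        parts.append(f"Version {rec.version}")
    parts.append(rec.publisher)
    citation = ". ".join(parts) + "."
    return f"{citation} https://doi.org/{rec.doi}"


if __name__ == "__main__":
    record = DatasetRecord(
        creators=["Doe, J.", "Roe, R."],
        year=2021,
        title="Example survey data",
        publisher="Example Repository",
        doi="10.1234/example",  # placeholder DOI for illustration only
        version="1.0",
    )
    print(format_citation(record))
```

Including the version in the citation is what lets a reader retrieve the exact state of the data that an analysis used.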

Tools and Technologies

RDM relies on a stack of software, infrastructure, and standards. Data repositories include Zenodo, Dryad, Figshare, and Dataverse, while GitHub and GitLab provide version control for code and small datasets. Metadata and interoperability rely on schemas like Dublin Core and the DataCite Metadata Schema, on harvesting protocols such as OAI-PMH, and on RESTful APIs offered by major archives. Storage and preservation use systems like LOCKSS and CERN EOS and cloud services from Amazon Web Services, Google Cloud Platform, and Microsoft Azure, with containerization through Docker and Kubernetes supporting reproducible environments. Authentication and identity leverage ORCID and federated access via eduGAIN and Shibboleth. Workflow automation integrates tools such as Jupyter Notebook, RStudio, Nextflow, and Snakemake.
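
As a rough sketch of what a harvestable metadata record looks like, the following builds a simple Dublin Core record in the `oai_dc` container format that OAI-PMH repositories expose. It uses only the Python standard library; the `build_dc_record` helper and the sample values are assumptions for illustration, and a real repository would add schema locations and further elements.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the oai_dc format and the Dublin Core element set.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields: dict) -> str:
    """Serialize {element name: [values]} as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:  # Dublin Core elements are repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_dc_record({
        "title": ["Example survey data"],
        "creator": ["Doe, J.", "Roe, R."],
        "date": ["2021"],
        "identifier": ["https://doi.org/10.1234/example"],  # placeholder DOI
    }))
```

Because every element is optional and repeatable, simple Dublin Core is easy to produce from almost any internal schema, which is one reason it serves as a common baseline for harvesting.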

Implementation in Research Workflows

Practical implementation starts with a data management plan, which is often required by funders such as the National Science Foundation, the Biotechnology and Biological Sciences Research Council, and the Health Research Council of New Zealand. Acquisition, documentation, and storage are integrated with laboratory information management systems used at institutions like Harvard University and Stanford University and with field programs such as those run by the United States Geological Survey and NASA. Publication workflows link datasets to journals like PLOS ONE, Science, and The Lancet and to indexing services such as Crossref and PubMed. Collaborative projects leverage infrastructures such as the European Open Science Cloud and consortia including ELIXIR and HPC-Europa.
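
A data management plan can be treated as structured data and checked for completeness before submission. The sketch below is one possible approach: the `DataManagementPlan` class and its section names are hypothetical and do not correspond to any specific funder template.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical DMP sections; real funder templates differ."""
    data_description: str = ""
    metadata_standard: str = ""
    storage_and_backup: str = ""
    access_and_sharing: str = ""
    license: str = ""
    preservation_repository: str = ""


def missing_sections(plan: DataManagementPlan) -> List[str]:
    """Return the names of sections that are empty or whitespace only."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


if __name__ == "__main__":
    draft = DataManagementPlan(
        data_description="Interview transcripts and survey responses",
        metadata_standard="Dublin Core",
    )
    for section in missing_sections(draft):
        print(f"Section still to be written: {section}")
```

Keeping the plan in a structured form also makes it easier to update as the project evolves, rather than leaving it as a static document written once at the proposal stage.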

Ethical, Legal, and Security Considerations

RDM must address privacy, consent, intellectual property, and national regulations. Sensitive data handling follows laws and standards such as the General Data Protection Regulation and the Health Insurance Portability and Accountability Act, along with guidance from ethics committees at institutions like Johns Hopkins University. Licensing choices draw on frameworks from Creative Commons and Open Data Commons and must be compatible with the contractual terms of funders like the Bill & Melinda Gates Foundation. Security practices adopt standards such as the NIST Cybersecurity Framework and guidance from the Cybersecurity and Infrastructure Security Agency for risk management and incident response.

Challenges and Future Directions

Challenges include scaling preservation for large, heterogeneous datasets produced by projects like the Large Hadron Collider, ensuring interoperability across standards used by the World Health Organization and biodiversity networks like GBIF, and establishing sustainable funding models comparable to those of national libraries and archives. Emerging directions emphasize machine-actionable metadata, integration with artificial intelligence research platforms at organizations like Google Research and OpenAI, and further alignment with open science movements including Plan S and initiatives from UNESCO. Continued development will require collaboration among funders, publishers, repositories, and research infrastructures, including UK Research and Innovation (which absorbed Research Councils UK) and the European Research Council, to balance openness, privacy, and sustainability.

Category:Research data management