FAIR — LLMpedia

FAIR
Name	FAIR
Abbreviation	FAIR
Established	2016
Focus	Data stewardship, metadata, reuse
Founders	Philip Bourne, Mark Wilkinson, Susanna-Assunta Sansone

Contents

Overview
Principles
Implementation and Tools
Adoption and Impact
Challenges and Criticisms
Case Studies and Examples

FAIR

FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable, a set of guiding principles for making data and other digital resources easier for both machines and people to use. First articulated by a consortium of researchers and technologists, the principles have shaped practices at institutions, repositories, publishers, and funders seeking to improve the discovery, interoperability, and reuse of datasets, software, and workflows. They have influenced policy across research infrastructures and intersect with the work of standards bodies, scholarly publishers, and international projects.

Overview

The FAIR initiative emerged from discussions among scholars associated with organizations such as the European Open Science Cloud, the Research Data Alliance, the Wellcome Trust, and the National Institutes of Health. It set out to address a recurring problem in projects funded by agencies like the European Commission and the United States National Science Foundation: datasets generated by consortia including the Human Genome Project, the Large Hadron Collider collaborations, and the Human Cell Atlas were difficult to find or integrate. FAIR emphasizes machine-actionability, meaning that software can locate, retrieve, and interpret a resource without human intervention, as a way to scale discovery across systems and networks such as the DataCite metadata index, the Global Alliance for Genomics and Health, and the OpenAIRE infrastructure. The principles gained traction among publishers such as Elsevier and Springer Nature and repositories like Zenodo and Figshare.

Principles

FAIR is expressed through a concise set of recommendations addressing identifiers, metadata, and access patterns that align with practices from bodies like the World Wide Web Consortium, the International Organization for Standardization, and the Digital Curation Centre. Findability relies on persistent identifiers from systems such as the Digital Object Identifier (DOI) and the Handle System, together with rich metadata informed by standards like Dublin Core and Schema.org. Interoperability is grounded in shared vocabularies and ontologies such as Gene Ontology, SNOMED CT, and SKOS. Accessibility means retrieval over open, standardized protocols such as HTTP, with authentication and authorization where needed through frameworks such as OAuth, and metadata published using community profiles such as those from Bioschemas. Reusability rests on clear licensing, as promoted by groups like Creative Commons, and on provenance standards such as W3C PROV, with worked examples from initiatives such as ELIXIR and CERN.
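The identifier, metadata, and licensing elements described above can be illustrated with a minimal sketch of a Schema.org `Dataset` record serialized as JSON-LD. This is only one possible shape, not a format mandated by the principles. The helper name `build_dataset_record`, the DOI `10.1234/example-dataset`, and the dataset title are hypothetical; the ORCID iD is the fictitious example used in ORCID's own documentation.

```python
import json


def build_dataset_record(doi, name, description, license_url,
                         creator_orcid, creator_name):
    """Return a Schema.org Dataset description as a JSON-LD-ready dict.

    Hypothetical helper for illustration only.
    """
    doi_url = f"https://doi.org/{doi}"
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": doi_url,             # persistent identifier (findable)
        "identifier": doi_url,
        "name": name,               # descriptive metadata (findable)
        "description": description,
        "url": doi_url,             # resolvable over HTTP (accessible)
        "license": license_url,     # explicit licence (reusable)
        "creator": {
            "@type": "Person",
            "@id": f"https://orcid.org/{creator_orcid}",
            "name": creator_name,
        },
    }


record = build_dataset_record(
    doi="10.1234/example-dataset",            # hypothetical DOI
    name="Example survey measurements",       # hypothetical title
    description="Placeholder description of an example dataset.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator_orcid="0000-0002-1825-0097",      # ORCID's documented example iD
    creator_name="Josiah Carberry",
)

print(json.dumps(record, indent=2))
```

Embedding such a record in a dataset landing page is one common way for indexing services to harvest metadata, though repositories differ in which fields they require.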

Implementation and Tools

Implementations draw on technologies and platforms from diverse projects: repository software such as Invenio and DSpace, metadata managers like Metadatapool and CKAN, and registries such as ORCID for person identifiers and ROR for organizational identifiers. Validation tools include validators developed in the context of Bioschemas and the RDA FAIR Data Maturity Model, while indexing services such as Google Dataset Search and the now-retired Microsoft Academic have surfaced FAIR-aligned resources. Workflow engines such as Galaxy and Nextflow enable reproducible pipelines that produce FAIR outputs, and container technologies like Docker and Singularity package software together with its metadata. Funders and publishers integrate tools from Figshare, Dryad, and institutional infrastructures built on EPrints and HAL to operationalize compliance.
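The idea behind automated validation can be sketched as a simple completeness check over a metadata record. The field groupings below are an illustrative assumption and do not reproduce the indicators of the RDA FAIR Data Maturity Model or any named validator; `EXPECTED_FIELDS`, `check_record`, and `completeness` are hypothetical names.

```python
# Illustrative mapping from FAIR dimensions to metadata fields.
# Assumption for this sketch only; real assessment tools use far
# richer, community-agreed indicators.
EXPECTED_FIELDS = {
    "findable": ["identifier", "name", "description"],
    "accessible": ["url"],
    "interoperable": ["@context", "@type"],
    "reusable": ["license", "creator"],
}


def check_record(record):
    """Return, per FAIR dimension, the expected fields that are missing or empty."""
    return {
        dimension: [field for field in fields if not record.get(field)]
        for dimension, fields in EXPECTED_FIELDS.items()
    }


def completeness(record):
    """Return the fraction of expected fields that are present (0.0 to 1.0)."""
    total = sum(len(fields) for fields in EXPECTED_FIELDS.values())
    missing = sum(len(fields) for fields in check_record(record).values())
    return (total - missing) / total


# Hypothetical record lacking an access URL and a licence.
example = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "identifier": "https://doi.org/10.1234/example-dataset",
    "name": "Example dataset",
    "description": "Placeholder description.",
    "creator": {"name": "Example Author"},
}

report = check_record(example)
for dimension, missing_fields in report.items():
    status = "ok" if not missing_fields else "missing: " + ", ".join(missing_fields)
    print(f"{dimension}: {status}")
print(f"completeness: {completeness(example):.2f}")
```

A check like this only confirms that fields exist. Whether an identifier actually resolves, or whether a licence permits the intended reuse, needs further inspection.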

Adoption and Impact

Adoption spans national initiatives such as the policies of France's National Research Agency, mandates from the European Research Council, and repositories operated by institutions such as the Max Planck Society and the University of Oxford. The push for FAIR influenced initiatives including the FAIRsharing registry and the GO FAIR movement, affected standards uptake in ISO committees, and shaped the data management plan templates used by Horizon 2020 and Horizon Europe. In scientific domains, FAIR practices improved data integration in projects such as the International Cancer Genome Consortium, the Earth System Grid Federation, and the International Neuroinformatics Coordinating Facility. Metrics and indicators inspired by groups like the Global Research Council and the Wellcome Trust are used to evaluate FAIRness in grant reporting and repository assessment.

Challenges and Criticisms

Critiques emerged from stakeholders in academia, industry, and libraries, including concerns raised in forums hosted by the Open Knowledge Foundation and debates at conferences such as SciDataCon. Key challenges include ambiguity in how the principles are interpreted across domains, as seen in discussions within bioinformatics consortia, and the need for capacity-building resources on the scale of those discussed by intergovernmental bodies like the G7. Critics point to the imbalance between tooling provided by commercial entities such as Clarivate or Elsevier and community-run infrastructures, and to the risk of privileging well-resourced projects like CERN or EMBL over smaller labs. They also warn that funders like the National Institutes of Health and the Wellcome Trust may treat FAIR as a compliance checkbox instead of a cultural change. Interoperability barriers persist where vocabularies and identifiers maintained by resources such as NCBI and UniProt do not align, and where privacy regulations such as the General Data Protection Regulation constrain open reuse.

Case Studies and Examples

Practical applications include the integration of FAIR practices in the European Genome-phenome Archive to enhance discoverability of genomic datasets, the adoption of persistent identifiers and metadata standards in the Human Cell Atlas to enable cross-project queries, and the use of FAIR-aligned repositories such as Zenodo to archive software linked to publications in PLOS journals. Other examples feature domain-specific resources like MetaboLights and the ProteomeXchange consortium, which improve reuse in metabolomics and proteomics, and infrastructure projects such as ELIXIR and the National Center for Biotechnology Information, which incorporate persistent identifiers and provenance metadata. Cross-disciplinary efforts include the use of FAIR principles in the Copernicus programme for Earth observation and in clinical data harmonization initiatives coordinated by the Global Alliance for Genomics and Health.

Category:Data management