
EUDAT B2SAFE

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
EUDAT B2SAFE
Name: B2SAFE
Developer: EUDAT
Initial release: 2010s
Platform: Distributed storage, grid, cloud
License: Open source / community

EUDAT B2SAFE is a data management service for reliable data preservation and replication across research organizations and infrastructures such as CERN (the European Organization for Nuclear Research), the Max Planck Society, the European Space Agency, and the European Commission. It provides automated data lifecycle safeguards used by projects including the European Plate Observing System, PRACE, the Copernicus Programme, and Horizon 2020 partners. B2SAFE builds on technologies associated with iRODS (maintained by the iRODS Consortium), GridFTP, Globus, and OpenStack, and interoperates with repositories like Zenodo, Figshare, Dryad, PANGAEA, and Dataverse.

Overview

B2SAFE offers a managed framework to perform secure replication among geographically distributed storage endpoints such as CERN Data Centre, European Southern Observatory, James Webb Space Telescope archives, National Institutes of Health, and national research networks like SURFnet, GARR, DFN, RESTENA, and RedCLARA. It is designed to serve communities represented by ELIXIR, EPOS, CLARIN, EISCAT, EuroHPC, European XFEL, EMBL-EBI, NASA, and NOAA by ensuring persistent copies using checksum verification, metadata synchronization, and policy-driven workflows similar to practices at Library of Congress, British Library, National Library of France, Deutsche Nationalbibliothek, and BIBSYS.
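The idea of a policy-driven workflow can be illustrated with a minimal Python sketch. It is not B2SAFE's actual rule set (which this article does not specify); the `ReplicationPolicy` class, the `plan_replication` function, and the site names are hypothetical and only show how a policy might decide which endpoints still need a verified copy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical policy: how many verified copies a data object needs."""
    min_replicas: int
    allowed_sites: Tuple[str, ...]


def plan_replication(policy, verified_sites):
    """Return the sites that should receive a new copy, in policy order.

    verified_sites is the set of site names that already hold a copy
    whose checksum has been verified.
    """
    have = [s for s in policy.allowed_sites if s in verified_sites]
    missing = max(policy.min_replicas - len(have), 0)
    candidates = [s for s in policy.allowed_sites if s not in verified_sites]
    if missing > len(candidates):
        raise ValueError("policy cannot be satisfied with the allowed sites")
    return candidates[:missing]


# Illustrative site names, not real EUDAT endpoints.
policy = ReplicationPolicy(
    min_replicas=3,
    allowed_sites=("site-a", "site-b", "site-c", "site-d"),
)
targets = plan_replication(policy, verified_sites={"site-a"})
print(targets)  # ['site-b', 'site-c']
```

Keeping the policy as plain data, separate from the transfer mechanism, is one common design choice: the same plan could then be carried out by whatever transfer tool a site uses.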

Architecture and Components

The architecture integrates components such as iRODS rule engines, GridFTP transfer agents, Globus Online connectors, OpenStack Swift or Ceph object stores, and identity layers using eduGAIN, ORCID, LDAP, Shibboleth, and OAuth. Core modules mirror approaches from European Middleware Initiative, EGI, PRACE, EOSC-hub, and SCAPE and interface with metadata systems like Dublin Core, PREMIS, DataCite, EUDAT B2FIND, and catalogues used by World Data System and GBIF. The stack supports deployment on infrastructures managed by CERN IT, INFN, CNRS, CSC — IT Center for Science, SURFsara, and PSNC.

Data Replication and Integrity

B2SAFE implements synchronous and asynchronous replication strategies with provenance tracking that echo methods used by LOCKSS, Portico, CLOCKSS, the Dataverse Project, and NARA workflows. Replication relies on checksums such as MD5 and SHA-256, and replicas are compared against manifests held in registries maintained by agencies like the European Commission Directorate-General for Research and Innovation, INSPIRE Directive registries, and domain repositories used by Eurostat and Copernicus. Its integrity model references verification practices from ISO 16363, the OAIS reference model (ISO 14721), and NIST, as well as preservation strategies adopted by the National Archives (UK) and the Library of Congress.
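Checksum-based verification of a replica against a manifest can be sketched as follows. This is a minimal illustration using only the Python standard library, not B2SAFE's own implementation; the function names and the local directories standing in for a source and a replica endpoint are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source, replica):
    """Return (missing, corrupted) lists of relative paths in the replica."""
    missing = sorted(k for k in source if k not in replica)
    corrupted = sorted(
        k for k in source if k in replica and source[k] != replica[k]
    )
    return missing, corrupted


# Demo: two temporary directories stand in for a source and a replica site.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "source"), Path(tmp, "replica")
    src.mkdir()
    dst.mkdir()
    (src / "a.dat").write_bytes(b"alpha")
    (src / "b.dat").write_bytes(b"beta")
    (src / "c.dat").write_bytes(b"gamma")
    (dst / "a.dat").write_bytes(b"alpha")
    (dst / "b.dat").write_bytes(b"bet4")  # simulated bit rot
    missing, corrupted = compare_manifests(
        build_manifest(src), build_manifest(dst)
    )
    print(missing, corrupted)  # ['c.dat'] ['b.dat']
```

SHA-256 is used here rather than MD5 because MD5 is no longer collision resistant; MD5 can still catch accidental corruption, but a stronger hash is the safer default for new manifests.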

Deployment and Integration

Deployments are performed within data centers operated by CERN, Jülich Research Centre, CNRS-IDRIS, LRZ, BSC, SARA, and CSC. Integration paths include connectors to Globus Transfer, iRODS federation, S3 API endpoints used by Amazon Web Services, Google Cloud Platform, Microsoft Azure, and academic clouds such as OpenStack clouds run by GÉANT. It supports workflow orchestration compatible with Apache Airflow, Nextflow, Galaxy Project, Snakemake, and HPC schedulers like SLURM and PBS Professional used at PRACE centers.

Use Cases and Adoption

Adopters include large-scale science initiatives and national consortia such as ELIXIR, EPOS, CLARIN, CERN Open Data, European Space Agency archives, the European Marine Observation and Data Network, and ICOS ERIC. Use cases span long-term preservation for projects like the Human Cell Atlas, distributed backups for the LIGO Scientific Collaboration, shared replicas for SKA, and multi-site processing pipelines for ARISE and Copernicus. Research infrastructures leveraging B2SAFE practices often coordinate with funders and funding programmes such as the European Commission, ERC, Horizon 2020, and Horizon Europe, and with national funding bodies including DFG, ANR, and UKRI.

Security and Compliance

Security integrates authentication and authorization through eduGAIN, Shibboleth, ORCID, and X.509 certificates, with audit trails aligned with ISO 27001, GDPR, and the NIST Cybersecurity Framework, and with compliance regimes overseen by the European Data Protection Supervisor and national data protection authorities such as CNIL and the ICO. The platform supports encryption in transit via TLS and secure storage practices akin to those at European Central Bank IT operations and enterprise archives like the UN Archives.
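As an illustration of encryption in transit, the sketch below builds a client-side TLS configuration with Python's standard `ssl` module. It is a generic example rather than B2SAFE configuration: the `make_client_context` helper is hypothetical, and the minimum protocol version of TLS 1.2 is an assumed policy, not one stated by the service.

```python
import ssl


def make_client_context(ca_file=None):
    """Build a client TLS context that verifies the server certificate.

    ca_file is an optional path to a CA bundle; when omitted, the
    system's default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Assumed policy for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A context like this can be passed to standard-library clients (for example `http.client.HTTPSConnection(host, context=ctx)`) so that certificate and hostname checks are applied to every connection.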

Performance and Scalability

B2SAFE scales by federating storage nodes across centers operated by CERN IT, SURFsara, CSC, PSNC, LRZ, and BSC and by leveraging high-performance transfer tools such as Globus, GridFTP, and Aspera. Performance tuning borrows techniques from PRACE system administration, Top500 site optimization, HPCx best practices, and parallel I/O strategies used at Oak Ridge National Laboratory, Argonne National Laboratory, Lawrence Berkeley National Laboratory, and NERSC.
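The benefit of moving several files concurrently can be sketched with a thread pool. This is a simplified stand-in, assuming local directories in place of remote storage nodes and `shutil.copyfile` in place of a real transfer tool such as GridFTP; the helper names are hypothetical.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_one(src_file, dest_dir):
    """Copy a single file into dest_dir and return the destination path."""
    dest = Path(dest_dir) / Path(src_file).name
    shutil.copyfile(src_file, dest)
    return dest


def parallel_copy(files, dest_dir, max_workers=4):
    """Copy files concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda f: copy_one(f, dest_dir), files))


# Demo: eight small files copied between two temporary directories.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    src.mkdir()
    dst.mkdir()
    files = []
    for i in range(8):
        part = src / f"part-{i}.bin"
        part.write_bytes(bytes([i]) * 1024)
        files.append(part)
    copied = parallel_copy(files, dst)
    copied_names = [p.name for p in copied]
    sizes_ok = all(p.stat().st_size == 1024 for p in copied)
    print(len(copied_names), sizes_ok)
```

Threads suit this kind of work because file and network transfers are I/O bound; the right number of workers depends on the storage and network in question and would need to be measured rather than assumed.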

Category:Research data management