LLMpedia: The first transparent, open encyclopedia generated by LLMs

BagIt

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
BagIt
Name: BagIt
Developer: Library of Congress; California Digital Library
Released: 2008
Programming language: Platform-agnostic
Genre: Digital preservation, archival packaging
License: Open standard


BagIt is a hierarchical file packaging format and accompanying set of conventions for transmitting and preserving digital content. Designed to enable reliable exchange among archives and cultural heritage institutions, it specifies a simple directory structure and manifest-based checksums to support integrity verification and long-term stewardship. The format has been adopted by libraries, museums, academic repositories, and government memory organizations for ingest, transfer, and storage workflows.

Overview

BagIt defines a "bag": a payload directory that holds the content files, accompanied by metadata ("tag") files, stored outside the payload directory, that describe and protect the payload. The design supports interoperability among institutions such as the Library of Congress, National Archives and Records Administration, California Digital Library, British Library, and Smithsonian Institution. Its manifest-driven integrity checking complements tools and standards like PREMIS, METS, Dublin Core, ISO 16363, and OAIS to form end-to-end preservation workflows. The format is intentionally simple to facilitate implementation by projects ranging from university libraries like Harvard University and Yale University to national initiatives in Canada, Australia, and the European Union.

Specification

The BagIt specification prescribes a directory layout consisting of a bag declaration (bagit.txt), a payload directory (data/), and one or more payload manifests named after their checksum algorithm (for example, manifest-sha256.txt) that list a cryptographic hash for each payload file. A bag-info.txt file carrying descriptive metadata and a tag manifest (for example, tagmanifest-sha256.txt) carrying checksums for the tag files are both optional. Version 1.0 of the specification, published as RFC 8493, favors SHA-256 and SHA-512, while MD5 and SHA-1 remain in use for compatibility with older bags. BagIt Profiles, a separate community specification, let repositories such as Digital Public Library of America and national libraries require particular metadata fields and checksum algorithms in the bags they accept, consistent with their own preservation policies.
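As an illustration of the layout, the following Python sketch builds a minimal bag using only the standard library. The function names `sha256_of` and `make_minimal_bag` are hypothetical and not part of any BagIt tool. The file names (bagit.txt, data/, manifest-sha256.txt, bag-info.txt) and the Payload-Oxum field follow RFC 8493. The sketch assumes payload file names contain no line breaks or percent signs, which the specification requires to be percent-encoded.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the tag files.

    Illustrative only: no percent-encoding of unusual file names,
    no tag manifest, and bag_dir/data must not already exist.
    """
    payload_dir = bag_dir / "data"
    shutil.copytree(source_dir, payload_dir)

    # Bag declaration: the version and the encoding of the tag files.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to bag root>" per file.
    files = sorted(p for p in payload_dir.rglob("*") if p.is_file())
    manifest = "".join(
        f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}\n" for p in files
    )
    (bag_dir / "manifest-sha256.txt").write_text(manifest, encoding="utf-8")

    # Optional bag-info.txt: Payload-Oxum is "<total octets>.<file count>".
    octets = sum(p.stat().st_size for p in files)
    (bag_dir / "bag-info.txt").write_text(
        f"Payload-Oxum: {octets}.{len(files)}\n", encoding="utf-8"
    )
    return bag_dir
```

Calling `make_minimal_bag(Path("photos"), Path("photos-bag"))` on a hypothetical photos directory would leave the originals untouched and produce a sibling directory that other BagIt-aware tools should be able to read. Production workflows would normally rely on an established BagIt library rather than a hand-rolled writer.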

Implementations and Tools

Multiple open-source and commercial tools implement BagIt processing, providing creation, validation, repair, and manipulation functions. Notable implementations include Java-based libraries used by projects at Harvard Library and Stanford University, Python packages utilized by Internet Archive workflows, and command-line utilities integrated into automation systems at British Library and National Library of New Zealand. Software ecosystems around BagIt interoperate with repository platforms such as DSpace, Fedora Commons, Archivematica, and Islandora, and with checksum and packaging utilities from projects like OpenSSL and GNU coreutils. Commercial digital preservation vendors and consortia such as OCLC and Jisc have also integrated BagIt support into their products and services.

Use Cases and Adoption

BagIt is used for ingest pipelines, cross-institutional transfers, dark archive deposits, and dataset distribution by organizations including NOAA, NASA, European Space Agency, and academic data centers at University of California campuses. It supports research data sharing for funders and programs such as the National Science Foundation and the Medical Research Council; data publishers and consortia like DataCite and Figshare employ BagIt for packaging large datasets. Museums and audiovisual archives at institutions like British Film Institute and Library of Congress use BagIt for audiovisual preservation and accessioning. International standards bodies and national libraries reference BagIt in workflows for digital legal deposit and heritage preservation alongside frameworks from ISO and policy guidance from Council on Library and Information Resources.

Security and Integrity

BagIt’s integrity model centers on manifest files containing cryptographic checksums; verifying them detects accidental corruption during transfer or storage. The approach aligns with integrity assurance practices from NIST publications and complements transport-layer protections like TLS when used with networked transfer tools. Because the manifests travel inside the bag, anyone able to alter the payload can also regenerate them, so checksums alone do not provide authenticated integrity or non-repudiation; those require a cryptographic signing layer based on standards such as OpenPGP or XML Signature. Institutions concerned with chain-of-custody and evidentiary requirements, such as national archives or legal deposit offices, commonly combine BagIt manifests with audit logs, fixity services, and timestamps from trusted timestamp authorities.
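A fixity check of this kind can be sketched in a few lines of Python. The function name `verify_bag` and its message strings are hypothetical. The sketch assumes manifests named manifest-ALGORITHM.txt at the bag root with one "checksum path" entry per line, and it treats the union of all manifests as the list of expected files, which is a simplification. It does not guard against manifest paths that point outside the bag, and it detects corruption only, not deliberate tampering.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir: Path) -> list[str]:
    """Return a list of fixity problems; an empty list means all checks passed."""
    manifests = sorted(bag_dir.glob("manifest-*.txt"))
    if not manifests:
        return ["no payload manifest found"]

    problems = []
    listed = set()
    for manifest in manifests:
        # "manifest-sha256.txt" -> "sha256"
        algorithm = manifest.stem.split("-", 1)[1]
        if algorithm not in hashlib.algorithms_available:
            problems.append(f"unsupported algorithm: {algorithm}")
            continue
        for line in manifest.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            expected, rel_path = line.split(None, 1)
            listed.add(rel_path)
            target = bag_dir / rel_path
            if not target.is_file():
                problems.append(f"missing: {rel_path}")
                continue
            # Reads the whole file into memory; fine for a sketch.
            actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
            if actual != expected.lower():
                problems.append(f"checksum mismatch: {rel_path}")

    # Payload files that no manifest mentions are also a problem.
    for path in sorted((bag_dir / "data").rglob("*")):
        rel_path = path.relative_to(bag_dir).as_posix()
        if path.is_file() and rel_path not in listed:
            problems.append(f"not in manifest: {rel_path}")
    return problems
```

Running such a check on receipt and again on a schedule is the usual fixity pattern. Authenticating who produced the bag would need a separate signature over the manifests, as noted above.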

History and Development

BagIt originated from collaborative work between the Library of Congress and the California Digital Library in the late 2000s to meet practical needs for exchanging digital content between repositories and preservation services. Subsequent community development involved institutions and projects such as Internet Archive, Harvard Library, Stanford University Libraries, and the National Library of New Zealand, with stewardship moving through open discussions, working groups, and profile proposals hosted by preservation organizations including the Digital Preservation Coalition and the Society of American Archivists. Over the years, BagIt has evolved through a series of versioned specification drafts, culminating in version 1.0, published as RFC 8493 in 2018, alongside community-contributed implementations and profiles that reflect the requirements of diverse institutions such as national libraries, space agencies, and research funders.

Category:Digital preservation