Archive-It — LLMpedia

Archive-It
Name	Archive-It
Owner	Internet Archive
Launch date	2005
Current status	Active

Contents

Introduction
History
Features and Functionality
Partners and Collaborations
Technical Infrastructure
Usage and Applications

Archive-It is a web archiving service provided by the Internet Archive, a non-profit digital library founded by Brewster Kahle. The service allows institutions to harvest, preserve, and provide access to World Wide Web content, including websites, social media, and other online resources, a purpose comparable to that of the Library of Congress's Web Archiving program. This is particularly important for preserving the online presence of organizations such as UNESCO, the European Union, and the National Archives and Records Administration. By partnering with institutions such as Stanford University, Harvard University, and the University of California, Berkeley, Archive-It helps ensure the long-term accessibility of online content.

Introduction

Archive-It is used by a wide range of institutions, including libraries, museums, and archives, such as the British Library, the National Library of Australia, and Library and Archives Canada. These institutions use Archive-It to preserve their own online presence and to collect online content related to their areas of interest, such as art, history, and science. For example, the Smithsonian Institution uses Archive-It to preserve its online collections, including those of the National Museum of Natural History and the National Air and Space Museum. Similarly, the New York Public Library uses Archive-It to preserve its online resources, including the New York Public Library Digital Collections.

History

The Internet Archive launched the Archive-It service in 2005 to give institutions a platform for preserving web content and making it accessible. Since its launch, Archive-It has been used by over 500 institutions, including Yale University, the University of Oxford, and the Australian National University. The service has also been used to preserve online content related to significant events, such as the September 11 attacks and the 2011 Egyptian revolution, which were covered by outlets such as Al Jazeera and BBC News. Additionally, Archive-It has partnered with organizations such as the Wikimedia Foundation and Creative Commons to promote the preservation and accessibility of online content.

Features and Functionality

Archive-It provides features that support the preservation and accessibility of web content, including data mining and text analysis tools. Institutions can use Archive-It to harvest web content, including websites, blogs, and social media platforms such as Twitter and Facebook. The service also provides tools for creating and managing metadata, as well as search and discovery functionality for archived content. For example, the National Library of Medicine uses Archive-It to preserve its online collections, including the PubMed database and the GenBank database. Similarly, the Library of Congress uses Archive-It to preserve its online resources, including the Chronicling America database and the American Memory collection.

Partners and Collaborations

Archive-It has partnered with a range of institutions and organizations to promote the preservation and accessibility of web content, including the International Internet Preservation Consortium and the Digital Public Library of America. The service has also collaborated with organizations such as the World Wide Web Consortium and the Internet Engineering Task Force to develop standards and best practices for web archiving. Additionally, Archive-It has worked with institutions such as the University of California, Los Angeles and the University of Michigan to develop tools and services for web archiving, such as the Archive-It API and the Web Archiving Toolkit. For example, the National Science Foundation has funded research projects that use Archive-It to preserve and analyze online content related to climate change and public health.

Technical Infrastructure

The Archive-It service is built on Internet Archive software, including the Heritrix web crawler, which harvests content, and the Wayback Machine, which replays archived pages. The service uses cloud computing and distributed storage to preserve and serve large amounts of web content, at a scale comparable to platforms such as Amazon Web Services and Microsoft Azure. Archive-It also provides tools for quality assurance and quality control to check the accuracy and completeness of preserved web content, as required by institutions such as the National Institute of Standards and Technology and the European Organization for Nuclear Research. For example, the Stanford University Libraries use Archive-It to preserve their online collections, including the Stanford Digital Repository and the Stanford University Archives.
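The relationship between a capture and its replay in the Wayback Machine can be illustrated with a short Python sketch. It assumes the commonly seen replay URL pattern of a base address, a 14-digit UTC timestamp (YYYYMMDDhhmmss), and the original URL. The helper names `wayback_timestamp` and `replay_url` are hypothetical and are not part of any Archive-It or Internet Archive library; the base address is likewise an assumption and can be overridden.

```python
from datetime import datetime, timezone

# Assumed base address of a Wayback-style replay endpoint (illustrative only).
WAYBACK_BASE = "https://web.archive.org/web"


def wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as a 14-digit UTC timestamp (YYYYMMDDhhmmss)."""
    if moment.tzinfo is None:
        # Treat naive datetimes as already being in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def replay_url(original_url: str, moment: datetime, base: str = WAYBACK_BASE) -> str:
    """Build a replay URL of the form <base>/<timestamp>/<original URL>."""
    return f"{base.rstrip('/')}/{wayback_timestamp(moment)}/{original_url}"


if __name__ == "__main__":
    # Hypothetical capture time and target page, chosen only for illustration.
    captured = datetime(2011, 3, 11, 5, 46, 0, tzinfo=timezone.utc)
    print(replay_url("http://example.org/", captured))
```

Passing a different `base` lets the same helper target another replay endpoint, such as one scoped to a single collection, without changing the timestamp logic.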

Usage and Applications

Archive-It is used for research, education, and preservation. Institutions can use Archive-It to preserve and provide access to online content related to their areas of interest, such as history, art, and science. Journalists and researchers, including those at The New York Times and The Guardian, also use the service to access and analyze preserved web content. For example, the University of Cambridge uses Archive-It to preserve its online collections, including those of the Cambridge University Library and the Fitzwilliam Museum. Similarly, the National Gallery of Art uses Archive-It to preserve its online resources, including the NGA Online Collection and the NGA Library. Additionally, Archive-It has been used to preserve online content related to significant events, such as the 2010 Haiti earthquake and the 2011 Tōhoku earthquake and tsunami in Japan, which were covered by outlets such as CNN and BBC News.

Category:Web archiving