Distributed Proofreaders

Name	Distributed Proofreaders
Founded	2000
Founders	Charles Franks
Location	United States
Focus	Text digitization

Contents

History
Organization and Operation
Workflow and Technology
Projects and Contributions
Copyright and Licensing
Reception and Impact

Distributed Proofreaders is a volunteer-driven online community that converts public-domain books into digital texts for use by Project Gutenberg, the Internet Archive, Google Books, HathiTrust, and other repositories. The organization collaborates with volunteers from platforms and institutions including the Wikimedia Foundation, the Library of Congress, the British Library, and the New York Public Library, as well as a wide network of independent contributors associated with initiatives such as Project Runeberg, Open Library, and Europeana. Members include amateur editors, academics linked to universities such as Harvard University, the University of Oxford, and Stanford University, and professionals from companies such as Amazon, Microsoft, and Yahoo!.

History

The project emerged in the context of early optical character recognition work by groups influenced by pioneers such as Project Gutenberg founder Michael S. Hart, alongside contemporaneous digitization efforts at the Digital Library of India, Google Books, and Microsoft Research. Its early milestones coincided with major archival and legal events: the era of the United States v. Microsoft Corp. litigation, the expansion of the Internet Archive collections, and partnerships formed after digitization summits attended by representatives of the Library of Congress, the British Library, and cultural organizations such as the Smithsonian Institution. Over time the initiative adapted to changes in scanning technology from vendors such as Canon Inc. and Fujitsu. It also responded to legal frameworks shaped by United States Copyright Office proceedings and by transnational debates, including European Court of Human Rights concerns, about access to cultural heritage.

Organization and Operation

The community is organized around volunteer roles akin to the editorial boards and moderation panels found at The New York Times Company and Encyclopaedia Britannica, Inc., and it follows nonprofit governance models like those used by the Wikimedia Foundation and Creative Commons. Coordination relies on communication tools and workflow systems comparable to those offered by GitHub and SourceForge, along with mailing lists reminiscent of early Listserv communities. The group maintains policies on text fidelity and editorial standards that mirror practices at the Library of Congress and standards bodies such as ISO, as well as the Dublin Core cataloging conventions and the metadata frameworks used by WorldCat and the OCLC network.

Workflow and Technology

Proofreading starts from page images produced by scanning partners, including the Internet Archive and private scanning services associated with universities such as the University of Michigan and corporations such as Google. Volunteers work in browser-based proofreading tools influenced by interface patterns from Mozilla Firefox and Opera and by text-editing paradigms from Emacs, Vim, and Notepad++. The workflow takes optical character recognition output from engines such as Tesseract OCR and passes it through quality-control steps analogous to the copy-editing workflows of publishers such as Penguin Random House, HarperCollins, and Oxford University Press. Server infrastructure and software development follow patterns established by open-source projects hosted on platforms such as GitHub and managed with version-control models similar to Subversion and Mercurial.
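As a rough illustration of one quality-control step described above, the sketch below compares raw OCR output for a page with a volunteer's corrected version. It is not the project's actual tooling: the function names `changed_lines` and `agreement_ratio` and the sample page text are hypothetical, and the comparison uses only Python's standard `difflib` module.

```python
import difflib


def changed_lines(ocr_text: str, proofread_text: str) -> list[str]:
    """Return unified-diff lines showing what a proofreader changed.

    Hypothetical helper for illustration; an empty list means the
    proofread page is identical to the OCR output.
    """
    diff = difflib.unified_diff(
        ocr_text.splitlines(),
        proofread_text.splitlines(),
        fromfile="ocr",
        tofile="proofread",
        lineterm="",
    )
    return list(diff)


def agreement_ratio(ocr_text: str, proofread_text: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for the two versions."""
    return difflib.SequenceMatcher(None, ocr_text, proofread_text).ratio()


if __name__ == "__main__":
    # Invented sample page: the OCR engine misread "best" as "hest".
    ocr_page = "It was the hest of times,\nit was the worst of times,"
    fixed_page = "It was the best of times,\nit was the worst of times,"

    for line in changed_lines(ocr_page, fixed_page):
        print(line)
    print(f"agreement: {agreement_ratio(ocr_page, fixed_page):.3f}")
```

A low agreement score or a long diff could flag a page for an additional review round, although the threshold for doing so would be a policy choice rather than something the comparison itself determines.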

Projects and Contributions

Volunteer contributions have produced texts that appear in digital libraries alongside works preserved by Project Gutenberg, Google Books, HathiTrust, the Internet Archive, and cultural aggregators such as Europeana. The corpus includes literature drawn from the holdings of institutions such as Public Library of Science repositories, historic documents related to events such as the American Civil War, and classical works associated with libraries such as the British Museum and the Bibliothèque nationale de France. Collaborative projects have intersected with crowdsourcing initiatives such as Zooniverse, transcription efforts such as Transcribe Bentham, and scholarly digitization programs at Yale University and Princeton University.

Copyright and Licensing

The group concentrates on texts that are in the public domain under statutes such as the United States Copyright Act and under rulings from bodies such as the United States Supreme Court and the European Court of Justice. Its licensing practices interact with frameworks such as Creative Commons and with institutional open-access policies adopted by entities including Harvard University Press and MIT Press. The project’s approach to rights clearance parallels procedures used by archives such as the National Archives and Records Administration and the rights-assessment norms applied by library consortia such as the HathiTrust Research Center.

Reception and Impact

The initiative has been cited in discussions of mass digitization alongside programs led by Google Books, in debates on forums including creativecommons.org, and in commentary from scholars at institutions such as Oxford University and Columbia University. Its volunteers’ output has supported digital humanities research at Stanford University and the University of California, Berkeley, and has been noted in coverage by media outlets such as The New York Times and The Guardian. The model has influenced crowdsourcing and text-reuse practices at projects such as Wikisource and Project Gutenberg Australia, as well as national library digitization strategies adopted by the National Library of Australia and the Bibliothèque nationale de France.
