Stanford Literary Lab

Name	Stanford Literary Lab
Formation	2010
Founders	Franco Moretti, Matthew Jockers
Type	Research group
Location	Stanford University
Fields	Digital humanities, Literary studies, Computational linguistics

The Stanford Literary Lab is an experimental research group at Stanford University that applies computational methods to the study of literature. Founded by scholars with backgrounds at institutions such as Syracuse University, University of Illinois Urbana–Champaign, and University of Chicago, the Lab brought together expertise linked to projects like Google Books, Project Gutenberg, HathiTrust, JSTOR, and networks such as the Modern Language Association. Its work connects to debates involving figures and texts from William Shakespeare to James Joyce, and engages with archives like the Library of Congress, the British Library, and the Bodleian Library.

History

The Lab emerged in the context of initiatives including Google Books Ngram Viewer, the Humanities Research Institute movement, and early collaborations between scholars associated with Stanford University Libraries and centers such as the Center for History and New Media and the Institute for Advanced Study. Founders had previously worked on datasets tied to collections curated by Project Gutenberg, HathiTrust Digital Library, and bibliographies connected to the Modern Language Association. Early seminars invoked methodological precedents in work by scholars associated with University of Illinois Urbana–Champaign, University of Pennsylvania, Yale University, Harvard University, and Columbia University, and referenced corpora used in projects at the British Library, Library of Congress, and National Library of France. The Lab’s timeline intersects with funding and partnerships involving institutions such as the Andrew W. Mellon Foundation, the National Endowment for the Humanities, and collaborations with platforms like JSTOR.

Methods

The Lab combines techniques from computational linguistics, corpus linguistics, statistical analysis, and network methods developed in work at MIT, Carnegie Mellon University, the University of Cambridge, the University of Oxford, and Princeton University. It employs tools and libraries such as Python, R, the Natural Language Toolkit, and scikit-learn, along with infrastructures that integrate with datasets from Google Books, HathiTrust Digital Library, and Project Gutenberg and with metadata standards used by the Library of Congress and the Europeana initiative. Methods include topic modeling derived from research on Latent Dirichlet Allocation, stylometric techniques inspired by studies of authorship attribution involving texts like The Federalist Papers, and network visualization akin to approaches used for mapping correspondence archives such as the papers of Charles Darwin or Thomas Jefferson. Analyses draw on statistical frameworks developed at centers such as University College London and Stanford University’s own departments.
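The stylometric approach mentioned above can be illustrated with Burrows's Delta, a widely used authorship-attribution measure that compares z-scored frequencies of the most common words. The following is a minimal sketch and not the Lab's own code: the source does not say which stylometric measure the Lab used, and the function names (tokenize, relative_frequencies, burrows_delta), the simple regular-expression tokenizer, and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase a text and split it into word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens, vocabulary):
    """Return each vocabulary word's share of the total token count."""
    if not tokens:
        raise ValueError("cannot compute frequencies for an empty text")
    counts = Counter(tokens)
    return {word: counts[word] / len(tokens) for word in vocabulary}


def burrows_delta(reference_texts, disputed_text, n_words=100):
    """Compute Burrows's Delta between a disputed text and each reference text.

    reference_texts maps a label (for example an author name) to raw text;
    at least two reference texts are needed so that frequencies vary.
    Returns a dict mapping each label to its Delta; lower means more similar.
    """
    ref_tokens = {label: tokenize(t) for label, t in reference_texts.items()}

    # Choose the most frequent words in the pooled reference corpus.
    pooled = Counter()
    for tokens in ref_tokens.values():
        pooled.update(tokens)
    candidates = [word for word, _ in pooled.most_common(n_words)]

    ref_freqs = {
        label: relative_frequencies(tokens, candidates)
        for label, tokens in ref_tokens.items()
    }
    disputed_freqs = relative_frequencies(tokenize(disputed_text), candidates)

    # Per-word mean and standard deviation across the reference texts.
    # Words whose frequency never varies cannot be z-scored and are skipped.
    stats = {}
    for word in candidates:
        values = [freqs[word] for freqs in ref_freqs.values()]
        sigma = pstdev(values)
        if sigma > 0:
            stats[word] = (mean(values), sigma)
    if not stats:
        raise ValueError("no word varies across the reference texts")

    def z_score(freq, word):
        mu, sigma = stats[word]
        return (freq - mu) / sigma

    # Delta is the mean absolute difference of z-scores over the kept words.
    deltas = {}
    for label, freqs in ref_freqs.items():
        deltas[label] = mean(
            abs(z_score(freqs[w], w) - z_score(disputed_freqs[w], w))
            for w in stats
        )
    return deltas
```

Calling burrows_delta with a dictionary of reference texts keyed by author and a text of uncertain authorship returns one score per author, and the smallest score marks the closest stylistic profile under this measure. Real studies typically use much longer texts and a few hundred frequent words, and the result remains sensitive to corpus selection.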

Research projects

The Lab’s projects addressed questions about the rise and fall of genres, the shape of affective vocabulary, forms of authorship, and the dynamics of literary markets. Studies traced genre shifts visible in corpora spanning authors from Jane Austen to Charles Dickens and from Edgar Allan Poe to Mark Twain, comparing patterns with metadata from the British Library and the Library of Congress. Work on sentiment and affect built on methods applied to texts by Emily Dickinson, Walt Whitman, Virginia Woolf, T.S. Eliot, and Langston Hughes, and compared period trends with datasets linked to the Google Books Ngram Viewer and newspaper archives like The Times (London), The New York Times, and the Chicago Tribune. Findings about stylistic change resonated with earlier stylometric studies of disputed attributions, such as those concerning works ascribed to William Shakespeare and pseudonymous political essays like The Federalist Papers. Network analyses of influence and reception employed models similar to those used for the correspondence of Samuel Pepys and the intellectual networks around Immanuel Kant and G. W. F. Hegel.

Publications and influence

The Lab published working papers, pamphlets, and peer-reviewed articles. Its work appeared alongside scholarship from presses such as Oxford University Press, Cambridge University Press, and Princeton University Press, and in periodicals that host methodological debates, such as PMLA, Digital Scholarship in the Humanities, and Computational Linguistics. Its outputs influenced curricular offerings at Stanford University, Columbia University, and University of California, Berkeley, and informed digital projects at institutions like the British Library, HathiTrust, Project Gutenberg, and library initiatives funded by the Andrew W. Mellon Foundation. The Lab’s methods were cited in interdisciplinary work combining analyses of texts by Herman Melville, F. Scott Fitzgerald, Ernest Hemingway, Simone de Beauvoir, and Gabriel García Márquez with datasets curated by archives such as the National Library of Scotland and the Bibliothèque nationale de France.

Criticism and reception

Scholars at institutions including Princeton University, Yale University, University of Chicago, Harvard University, and University of Pennsylvania raised methodological and interpretive critiques about corpus selection, representativeness, and the limits of algorithmic readings. These debates were echoed in forums involving the Modern Language Association and in symposia at the American Comparative Literature Association and the Digital Humanities Conference. Critics compared these concerns to earlier methodological disputes in quantitative history and to controversies over datasets like Google Books and newspaper archives such as The Guardian and Le Monde. Defenders pointed to the Lab’s transparency practices, reproducibility efforts modeled after standards at Stanford University Libraries and the Center for Open Science, and engagements with peer groups at institutions including Carnegie Mellon University and University College London.