Glottolog — LLMpedia

Glottolog
Name	Glottolog
Type	Bibliographic database
Owner	Max Planck Institute for Evolutionary Anthropology
Launched	2012
Current status	Active

Glottolog is a bibliographic database and catalogue of the world's languages, language families, and dialects, with an emphasis on genealogical classification and on references to the descriptive literature. It is maintained at the Max Planck Institute for Evolutionary Anthropology in Leipzig and brings together bibliographic metadata, language family trees, and stable unique identifiers (glottocodes) used by linguists, fieldworkers, and institutions. It is often cited alongside other language catalogues and is integrated into digital infrastructure for language description, archiving, and computational linguistics.

Overview

Glottolog functions as a curated bibliographic repository that links language names to bibliographic entries, family trees, and identifiers. These are used by institutions such as the Max Planck Institute for Evolutionary Anthropology, the Max Planck Institute for Psycholinguistics, the Max Planck Institute for Human Cognitive and Brain Sciences, and projects at the University of Leipzig. It complements resources such as Ethnologue, ISO 639-3, and OLAC, the work of organizations such as SIL International and UNESCO, and archives including the Endangered Languages Archive, the Pacific and Regional Archive for Digital Sources in Endangered Cultures, and the Archive of the Indigenous Languages of Latin America. Practitioners at the School of Oriental and African Studies, the University of California, Berkeley, the University of Oxford, the University of Cambridge, and the Australian National University use it alongside corpora and lexical resources such as Universal Dependencies, the Open Multilingual WordNet, and the Parallel Bible Corpus. Influential linguists associated with the wider ecosystem include R. M. W. Dixon, Kenneth L. Hale, Noam Chomsky, William Labov, and Joseph Greenberg.

History and development

The project originated in efforts by researchers affiliated with the Max Planck Institute for Evolutionary Anthropology and the Max Planck Digital Library, and it evolved through collaborations with institutions such as the University of Leipzig and the University of Copenhagen. Early stages intersected with initiatives at SIL International (formerly the Summer Institute of Linguistics) and the Linguistic Society of America. Development milestones were influenced by the ISO 639-3 standard and by bibliographic practices at the Library of Congress, the British Library, and the Deutsche Nationalbibliothek. Contributions and critiques came from scholars at the School for Advanced Study, the Australian National University, the University of Toronto, and the University of Hawaiʻi at Mānoa. Funding and support involved organizations such as the European Research Council and the National Science Foundation, and national research councils such as the Deutsche Forschungsgemeinschaft.

Classification and database structure

Glottolog’s taxonomy aims to represent genealogical relationships among languages, drawing on classifications proposed by scholars including Joseph Greenberg, Bernard Comrie, Murray Gell-Mann (in typological debates), Edward Sapir, and Leonard Bloomfield. The database uses identifiers and metadata that are interoperable with ISO 639-3, OLAC, and bibliographic databases at institutions such as the British Library and the Library of Congress. Its structure includes entries for families, languages, and dialects, comparable to the taxonomies of the World Atlas of Language Structures and the AUTOTYP database and to family treatments in works published by Cambridge University Press, Oxford University Press, and Routledge. The schema supports citations to journals such as Language, Oceanic Linguistics, and the International Journal of American Linguistics, and to monographs from presses including De Gruyter, John Benjamins, and MIT Press.
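The combination of a genealogical tree with per-node identifiers can be pictured with a small data model. The Python sketch below is an illustration only, not Glottolog's actual schema: the `Languoid` class, its field names, and the `lineage` helper are assumptions introduced here. The tree is heavily simplified (many intermediate nodes are omitted), and the identifier values are shown for illustration and should be checked against a current release.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Languoid:
    """One node in a genealogical tree: a family, a language, or a dialect.

    Hypothetical model for illustration; not the catalogue's real schema.
    """
    identifier: str                 # catalogue identifier (illustrative values below)
    name: str
    level: str                      # "family", "language", or "dialect"
    iso639_3: Optional[str] = None  # not every node has an ISO 639-3 code
    parent: Optional["Languoid"] = field(default=None, repr=False, compare=False)
    children: list["Languoid"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "Languoid") -> "Languoid":
        """Attach a child node and return it, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Return node names from the top-level family down to this node."""
        names = []
        node: Optional["Languoid"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


# A deliberately simplified three-level tree.
indo_european = Languoid("indo1319", "Indo-European", "family")
germanic = indo_european.add_child(Languoid("germ1287", "Germanic", "family"))
english = germanic.add_child(Languoid("stan1293", "English", "language", iso639_3="eng"))

print(" > ".join(english.lineage()))
```

Storing a parent link on each node keeps the lineage lookup simple. A real catalogue would also attach bibliographic references to each node, which this sketch leaves out.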

Coverage and criteria

Coverage emphasizes attested languages and varieties documented in the scholarly literature, with criteria influenced by practices at the International Phonetic Association, by standards such as ISO 639-3, and by catalogues maintained by organizations such as SIL International and the Endangered Languages Project. The database documents sources ranging from field reports by researchers at the School of Oriental and African Studies and the Australian National University, to descriptive grammars published by Cambridge University Press and Oxford University Press, to archival collections at institutions such as the British Library and the American Philosophical Society. It cross-references regional resources such as the Pacific Manuscripts Bureau, the Australian Institute of Aboriginal and Torres Strait Islander Studies, and the Sakipunan ng mga Manunulat (as an example of regional scholarly networks).

Usage and applications

Researchers at the Max Planck Institute for the Science of Human History, computational linguists at the University of Edinburgh, corpus builders at the University of Pennsylvania, and archivists at the School of Oriental and African Studies use the database for language identification, bibliographic linkage, and phylogenetic studies. It supports workflows in projects such as the World Loanword Database, the Glottobank project, PHOIBLE, and the Intercontinental Dictionary Series, and it is integrated into tools and platforms such as the Open Science Framework (run by the Center for Open Science) and the Harvard Dataverse. Language technology developers at companies such as Google, and research groups at Microsoft Research and Facebook AI Research, reference it for dataset documentation and language tagging. Field linguists associated with the University of California, Los Angeles, Leiden University, and McGill University use it to connect field notes and corpora to stable identifiers.
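Linking records to stable identifiers, as described above, usually comes down to a format check and a lookup. The Python sketch below is a minimal illustration under stated assumptions: it takes a glottocode to be four lowercase alphanumeric characters followed by four digits, and the `ISO_TO_GLOTTOCODE` table, the `is_wellformed` and `tag_record` helpers, and the record layout are all hypothetical. The two mappings are shown for illustration; a real workflow would load the crosswalk from a current release rather than hard-code it.

```python
import re
from typing import Optional

# Assumed identifier shape: four lowercase alphanumerics, then four digits.
GLOTTOCODE_RE = re.compile(r"[a-z0-9]{4}[0-9]{4}")

# Hypothetical, hand-written crosswalk from ISO 639-3 codes to glottocodes.
# Values are illustrative; verify them against a current release.
ISO_TO_GLOTTOCODE = {
    "eng": "stan1293",
    "deu": "stan1295",
}


def is_wellformed(code: str) -> bool:
    """Check only the shape of an identifier, not whether it exists."""
    return GLOTTOCODE_RE.fullmatch(code) is not None


def tag_record(record: dict) -> dict:
    """Return a copy of a record with a 'glottocode' key added.

    The value is None when the record's ISO 639-3 code is missing
    or is not in the crosswalk.
    """
    tagged = dict(record)
    glottocode: Optional[str] = ISO_TO_GLOTTOCODE.get(record.get("iso639_3", ""))
    tagged["glottocode"] = glottocode
    return tagged


field_note = {"title": "Session 12 transcript", "iso639_3": "eng"}
print(tag_record(field_note))
```

Returning a copy leaves the original field record untouched, which matters when the same notes are later re-tagged against a newer release.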

Reception and criticism

The resource has been praised by scholars at the Linguistic Society of America and the Royal Society, and by editorial boards of journals such as Language and Diachronica, for its rigorous bibliographic focus. It has also drawn criticism from SIL International, including some staff at its Ethnologue publication, and from scholars at the University of Vienna over classification choices and divergences from ISO 639-3 codes. Debates involve methodological positions associated with scholars such as Joseph Greenberg and Lyle Campbell, and institutions such as the Max Planck Society and the British Academy have taken part in discussions about standards and interoperability. Commentators from the Open Knowledge Foundation and the digital humanities community have highlighted its openness while noting the difficulty of reconciling competing taxonomies from publishers such as Cambridge University Press and Oxford University Press and from databases maintained by SIL International and national libraries.
