LLMpedia: The first transparent, open encyclopedia generated by LLMs

Korean Historical Corpus

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: Korean Historical Corpus
Type: Linguistic corpus
Country: Korea
Established: 20th century
Language: Korean (Middle Korean, Early Modern Korean, Modern Korean)
Formats: Text, annotated corpora, searchable databases

The Korean Historical Corpus is a curated assembly of historical Korean texts used for linguistic, philological, and cultural research. It brings together sources from dynastic archives, literary works, legal codes, and private writings to support studies in historical linguistics, lexicography, and digital humanities. Its materials bear on scholarship about figures such as Sejong the Great, King Sejo of Joseon, Yi Sun-sin, and King Gojong, and on the work of institutions such as the Academy of Korean Studies and the National Institute of Korean Language.

Introduction

The corpus places primary sources from the Goryeo Dynasty, the Joseon Dynasty, and the Korean Empire alongside modern editorial resources such as publications of the Academy of Korean Studies and the National Museum of Korea. It draws on manuscript and print traditions represented by works such as the Jikji, the Hunminjeongeum Haerye, and the Annals of the Joseon Dynasty, as well as on private collections associated with the scholars Yi I and Yi Hwang. Researchers often cross-reference these materials with holdings at the National Library of Korea, the Seoul National University Library, and international repositories including the British Library and the Library of Congress.

Scope and Contents

The corpus spans several genres: royal edicts and verdicts from the Joseon Dynasty and imperial edicts from the Korean Empire; Confucian treatises by Kim Jeong-hui and Jeong Yak-yong; Buddhist texts associated with monks such as Jajang; Neo-Confucian commentaries linked to Song Si-yeol; legal compilations such as the Gyeongguk Daejeon; medical texts such as Heo Jun's Dongui Bogam; literary works by Yun Seon-do and Jeong Yak-yong; and early newspapers such as The Independent (Dongnip Sinmun), founded in 1896. It also catalogs phonological records related to Hunminjeongeum, orthographic variants in texts by Choe Sejin, and transliterations in documents tied to the Treaty of Ganghwa (the Japan–Korea Treaty of 1876) and in records concerning the Treaty of Portsmouth.

Compilation and Methodology

Compilers draw on manuscript collation methods like those used in projects at the Academy of Korean Studies, on the textual criticism practiced on editions of the Annals of the Joseon Dynasty, and on digitization workflows from the National Hangeul Museum. Provenance research follows archival practices at the National Archives of Korea and cataloging standards influenced by the International Council on Archives and the Library of Congress. Editions reconcile variant readings found in sources such as the Seonggyungwan archives, private family genealogies (jokbo), and lithographed periodicals held by the National Library of Korea. Metadata schemas align with initiatives by the Digital Humanities Institute at Yonsei University and with collaborative platforms involving Korea University.
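Reconciling variant readings is, at its core, an alignment problem. The following is a minimal sketch, assuming two witnesses have already been transcribed as plain strings; it uses the standard library's `difflib.SequenceMatcher` at the character level, which suits unspaced Hanja text. The function name `collate` and the sample readings are hypothetical illustrations, not the corpus's actual tooling.

```python
from difflib import SequenceMatcher


def collate(base: str, witness: str) -> list[tuple[str, str, str]]:
    """Align two witnesses character by character and list their variant readings.

    Hypothetical helper: each result is (operation, base_reading, witness_reading),
    where operation is "replace", "delete", or "insert" as reported by difflib.
    """
    # autojunk=False keeps frequent characters from being treated as junk in long texts
    matcher = SequenceMatcher(None, base, witness, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # identical stretches are not variants
            variants.append((tag, base[i1:i2], witness[j1:j2]))
    return variants


if __name__ == "__main__":
    # Invented sample readings: the second witness adds one character.
    print(collate("王曰可", "王曰不可"))
```

For these sample readings the sketch reports a single insertion of 不 in the second witness. Character-level alignment avoids depending on word segmentation, which many historical texts lack; a fuller workflow would likely also normalize variant character forms before aligning.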

Linguistic Features and Annotation

Annotations cover the phonology of Middle Korean sources, morphemic segmentation informed by studies of the Hunminjeongeum Haerye, and syntactic description following frameworks validated by scholars of the Korean Language and Literature Association and the National Institute of Korean Language. Tagsets record honorifics in texts connected to figures such as Sejong the Great and Queen Seondeok, as well as register distinctions in the letters of Heo Gyun and in correspondence involving Kang Youwei. Lexical tagging incorporates entries from historical dictionaries, including those modeled on the work of Choe Sejin, and from modern compilations by the Institute of the National Language. Phonetic reconstructions draw on research associated with Kang Pan-sok and on comparative studies of dialect material from Gyeongsang, Jeolla, and Gangwon provinces.
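The Hunminjeongeum analyzes each syllable into an initial, a medial, and a final, and phonological annotation of digitized text commonly starts from the same three-way split. The sketch below, assuming Unicode input, applies the Hangul syllable decomposition arithmetic from the Unicode Standard to precomposed modern syllables. Characters outside the precomposed block, including Hanja and Old Hangul conjoining jamo such as arae-a (ᆞ, U+119E), pass through unchanged. The function names are hypothetical.

```python
# Constants from the Unicode Standard's Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per initial consonant
S_COUNT = 19 * N_COUNT        # 11,172 precomposed syllables


def decompose_syllable(ch: str) -> list[str]:
    """Split one precomposed Hangul syllable into initial, medial, and optional final jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        # Not a precomposed syllable (Hanja, punctuation, Old Hangul jamo): keep as-is.
        return [ch]
    initial = L_BASE + s_index // N_COUNT
    medial = V_BASE + (s_index % N_COUNT) // T_COUNT
    final = T_BASE + s_index % T_COUNT
    jamo = [chr(initial), chr(medial)]
    if final != T_BASE:  # T_BASE itself means "no final consonant"
        jamo.append(chr(final))
    return jamo


def decompose(text: str) -> list[str]:
    """Decompose a whole string into a flat list of conjoining jamo and other characters."""
    return [j for ch in text for j in decompose_syllable(ch)]
```

For precomposed syllables the result matches `unicodedata.normalize("NFD", text)`; spelling out the arithmetic shows how a single code point encodes the initial, medial, and final positions that annotators tag.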

Access and Formats

Access is provided through searchable databases, XML-encoded corpora, and downloadable datasets maintained by institutions including the Academy of Korean Studies, the National Institute of Korean Language, and university labs at Seoul National University and Yonsei University. Users may encounter TEI-XML editions, relational databases whose schemas follow Text Encoding Initiative practices, and APIs comparable to services offered by the National Library of Korea and by international partners such as the British Library's digitization program. These formats support the concordancing tools used in projects at Korea University and the network visualizations developed by the Digital Humanities Center at Hanyang University.
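Concordancing over a TEI-XML edition can be sketched with Python's standard `xml.etree.ElementTree`. The example assumes a minimal document in the TEI namespace (`http://www.tei-c.org/ns/1.0`) whose paragraphs are `<p>` elements under `<body>`; the sample sentences, the `n` numbering, and the functions `paragraphs` and `kwic` are invented for illustration. Because Korean particles attach directly to nouns, the sketch matches the keyword as a substring of each space-delimited word (eojeol) instead of requiring an exact token match.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample edition; the sentences are invented for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p n="1">왕이 신하에게 하교하였다</p>
    <p n="2">신하가 왕에게 아뢰었다</p>
  </body></text>
</TEI>"""


def paragraphs(tei_xml: str):
    """Yield (paragraph number, plain text) for each <p> in the TEI body."""
    root = ET.fromstring(tei_xml)
    for p in root.iterfind(".//tei:body/tei:p", TEI_NS):
        yield p.get("n"), "".join(p.itertext())


def kwic(tei_xml: str, keyword: str, width: int = 2) -> list[tuple[str, str, str, str]]:
    """Key-word-in-context: (paragraph, left context, matching word, right context)."""
    hits = []
    for n, text in paragraphs(tei_xml):
        words = text.split()  # eojeol are separated by spaces
        for i, word in enumerate(words):
            if keyword in word:  # substring match tolerates attached particles
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                hits.append((n, left, word, right))
    return hits


if __name__ == "__main__":
    for hit in kwic(SAMPLE_TEI, "왕"):
        print(hit)
```

Here `kwic(SAMPLE_TEI, "왕")` returns two hits, 왕이 in paragraph 1 and 왕에게 in paragraph 2, each with up to two words of context on either side. Substring matching also yields false positives for short keywords, so a working concordancer would more likely query morpheme-segmented annotations.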

Research Applications and Impact

Scholars use the corpus for diachronic studies of the honorific systems evident in the writings of Yi Hwang and Yi I, for historical sociolinguistic work tracing language change across the Goryeo and Joseon dynasties, and for computational linguistics experiments at institutions such as KAIST and POSTECH. Literary critics analyze stylistic developments connected to Hwang Jin-i and Kim Man-jung; historians cross-reference the corpus with diplomatic records involving Emperor Meiji and envoys of the Korean Empire; and lexicographers update historical entries in projects associated with the National Institute of Korean Language. The corpus also supports teaching in departments at Sejong University, Ewha Womans University, and Chung-Ang University.
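Diachronic comparisons depend on normalizing raw counts, because subcorpora for different periods vary greatly in size. A minimal sketch, assuming per-period hit counts and token totals are already available, converts them into frequencies per million tokens; the helper names, period labels, and numbers below are placeholders rather than figures from the corpus.

```python
def per_million(hits: int, total_tokens: int) -> float:
    """Normalized frequency: occurrences per million tokens of running text."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    # Multiply before dividing to keep round inputs exact in floating point.
    return hits * 1_000_000 / total_tokens


def diachronic_profile(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each period label to its normalized frequency.

    `counts` maps a (hypothetical) period label to (hits, total_tokens).
    """
    return {period: per_million(hits, total) for period, (hits, total) in counts.items()}


if __name__ == "__main__":
    # Placeholder input for illustration only.
    print(diachronic_profile({"15th c.": (120, 400_000), "19th c.": (300, 1_500_000)}))
```

With this placeholder input the later period has more raw hits but a lower normalized rate (200 versus 300 per million), which is the kind of reversal that raw counts would hide.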

Criticisms and Limitations

Critics point to uneven coverage of regional dialects, such as the varieties of Jeju Island, and to gaps in the representation of minority voices, including documents related to the Koryo-saram and records of early Christianity in Korea, such as those tied to the missionary Robert Jermain Thomas. Scholars also note editorial bias in editions produced under patrons linked to the Joseon court and question whether the metadata is as consistent as the standards advocated by the International Council on Archives and the Text Encoding Initiative require. Technical limitations include OCR errors in Hanja materials, similar to the challenges reported by the British Library, and difficulties in reconciling chronologies across disparate collections such as the Annals of the Joseon Dynasty and private jokbo genealogies.
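OCR quality for Hanja materials is commonly measured as character error rate (CER): the edit distance between a reference transcription and the OCR output, divided by the length of the reference. The sketch below computes it with a standard dynamic-programming Levenshtein distance; it assumes a trusted reference transcription is available, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, misreading 王 as the visually similar 玉 in a two-character string gives `character_error_rate("國王", "國玉") == 0.5`. Normalizing by reference length lets error rates be compared across collections of different sizes.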

Category:Linguistic corpora