CJK Unified Ideographs

Name	CJK Unified Ideographs
Block	Unicode
Range	Various (U+4E00–U+9FFF, extensions)
Script	Han
Languages	Chinese, Japanese, Korean, Vietnamese, Classical Chinese

Contents

Background and Scope
Historical Development and Unification Process
Block Structure and Encoding Details
Character Repertoire and Variants
Implementation and Fonts
Usage and Linguistic Considerations
Reception, Issues, and Ongoing Work

CJK Unified Ideographs are the Han logographic characters that the Unicode Standard encodes as a single unified set, representing shared and historically related graphemes used across China, Japan, Korea, Vietnam, Singapore, Taiwan, Hong Kong and Macau, and in scholarly texts associated with the Ming dynasty, the Qing dynasty, the Heian period, the Joseon dynasty and French colonial rule in Indochina. The repertoire underpins text interchange in work such as Unicode Consortium proposals and the ISO/IEC JTC 1/SC 2 encoding efforts, and it relates to national standards like GB 2312, JIS X 0208 and KS X 1001, as well as VISCII adaptations. Implementations by Microsoft, Apple Inc., Google, IBM, Adobe Systems Incorporated and open-source communities support multinational publishing, scholarly editions, and digital archives for collections from the British Library, the Library of Congress, the National Palace Museum (Taiwan), and university presses.

Background and Scope

The repertoire covers characters used in historical corpora and modern texts, collected from standards including GB 18030, JIS X 0213 and KS X 1002 and from national proposals submitted to the Unicode Technical Committee. It aims to provide a single code point for each grapheme with a shared identity across sources such as inscriptions in the Oracle bone script corpus, rubbings from the Stele of Mount Shizhu, classical works like the Analects of Confucius and editions of the Tripitaka Koreana, as well as modern publications issued by publishers like Shueisha and People's Publishing House. The scope spans the base block in the Basic Multilingual Plane and multiple extension blocks driven by proposals from scholars affiliated with institutions such as Academia Sinica, Kyoto University, Seoul National University, and the Vietnamese Academy of Social Sciences.

Historical Development and Unification Process

The unification process traces to coordination among the International Organization for Standardization (via ISO/IEC 10646), the Unicode Consortium, and national standards committees working from character inventories drawn from Ming dynasty sources, Tang dynasty epigraphic studies, and modern lexicography projects like the Hanyu Da Zidian. Delegates from organizations such as the China Electronics Standardization Institute, the Japanese Standards Association and the Korean Agency for Technology and Standards, together with researchers from Peking University, the University of Tokyo and Seoul National University, contributed glyph evidence, source citations, and unification decisions. Contention over variant selection involved textual authorities like the Kangxi Dictionary, the Zhengzitong and the Shuowen Jiezi, as well as regional typefaces used by foundries such as DynaComware and Monotype Imaging.

Block Structure and Encoding Details

Coded ranges include the base block in the Basic Multilingual Plane (U+4E00–U+9FFF) and extension blocks designated for subsequent additions, most of which lie in supplementary planes. Code points are allocated and annotated in documentation maintained by the Unicode Consortium and ISO/IEC JTC 1/SC 2. Mapping data relates the repertoire to legacy encodings including GBK, Shift JIS, EUC-JP and EUC-KR, and every code point can be serialized in the UTF-8 and UTF-16 encoding forms. Implementation guidance references work by standards bodies like W3C for text rendering and by vendors such as Microsoft for font fallbacks; registry files and source references were submitted by entities including SBCL, Adobe Systems Incorporated, IBM, and national libraries.
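The block layout and encoding forms described above can be sketched in Python. This is a minimal illustration, not a normative table: the block list is deliberately partial (extensions beyond H are omitted), the function names `cjk_block`, `encoded_sizes` and `legacy_bytes` are hypothetical, and the legacy conversions rely on the codec tables bundled with Python, which are that implementation's mappings rather than something defined here.

```python
from typing import Dict, Optional

# (first, last, block name). Partial table: later extensions are omitted.
CJK_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F"),
    (0x30000, 0x3134F, "CJK Unified Ideographs Extension G"),
    (0x31350, 0x323AF, "CJK Unified Ideographs Extension H"),
]

LEGACY_CODECS = ("gbk", "shift_jis", "euc_jp", "euc_kr")


def cjk_block(ch: str) -> Optional[str]:
    """Return the name of the unified-ideograph block containing ch, if any."""
    cp = ord(ch)
    for first, last, name in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def encoded_sizes(ch: str) -> Dict[str, int]:
    """Number of bytes ch occupies in UTF-8 and UTF-16 (no byte order mark)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
    }


def legacy_bytes(ch: str) -> Dict[str, Optional[str]]:
    """Hex bytes of ch in each legacy codec, or None where it is not encodable."""
    result: Dict[str, Optional[str]] = {}
    for codec in LEGACY_CODECS:
        try:
            result[codec] = ch.encode(codec).hex()
        except UnicodeEncodeError:
            result[codec] = None
    return result


for ch in ("\u4e2d", "\u3400", "\U00020000"):
    plane = ord(ch) >> 16  # 0 is the Basic Multilingual Plane
    print(f"U+{ord(ch):04X}", cjk_block(ch), f"plane {plane}", encoded_sizes(ch))
```

Characters in the Basic Multilingual Plane take three bytes in UTF-8 and one 16-bit unit in UTF-16, while characters in supplementary planes take four bytes in either form (a surrogate pair in UTF-16). Many extension characters have no representation at all in the older legacy encodings, which is why `legacy_bytes` returns `None` for them.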

Character Repertoire and Variants

The repertoire aggregates source glyphs and attested variants, with variant mechanisms such as the collections registered in the Ideographic Variation Database, driven by contributors from the National Institute of Information and Communications Technology and research centers at Tsinghua University and Kyoto University. Distinctions among regional forms (as used in texts from Mainland China, Taiwan, Hong Kong, Macau, Japan, South Korea and Vietnam) are documented so that type designers at foundries including DynaComware, Monotype Imaging and Pan-European Type Design, and at university presses, can reference source attribution from the Kangxi Dictionary or modern corpora like the Chinese Text Project. Compatibility ideographs and merged code points were resolved through liaison with authorities such as the Unicode Technical Committee and national standards bodies.
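Two of the mechanisms mentioned above can be illustrated with the Python standard library. The sketch below pairs each base character with a following variation selector from the Variation Selectors Supplement (U+E0100–U+E01EF), which is how an ideographic variation sequence is spelled in plain text, and it shows that canonical normalization replaces a compatibility ideograph with its unified counterpart. The helper names `split_ivs` and `fold_compat` are hypothetical, and the sample sequence is illustrative only: the code does not check whether a sequence is actually registered in the Ideographic Variation Database.

```python
import unicodedata
from typing import List, Optional, Tuple

# Variation Selectors Supplement: the selectors used by ideographic
# variation sequences.
VS_FIRST, VS_LAST = 0xE0100, 0xE01EF


def split_ivs(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split text into (base, selector) pairs; selector is None when absent.

    A selector that has no preceding unpaired base is kept as its own
    item instead of being dropped.
    """
    out: List[Tuple[str, Optional[str]]] = []
    for ch in text:
        is_selector = VS_FIRST <= ord(ch) <= VS_LAST
        if is_selector and out and out[-1][1] is None:
            out[-1] = (out[-1][0], ch)
        else:
            out.append((ch, None))
    return out


def fold_compat(text: str) -> str:
    """Apply NFC, which maps CJK compatibility ideographs to unified ones."""
    return unicodedata.normalize("NFC", text)


sample = "\u845b\U000E0100\u57ce"  # base + selector, then a plain base
for base, selector in split_ivs(sample):
    tag = f"U+{ord(selector):05X}" if selector else "no selector"
    print(f"U+{ord(base):04X}", tag)

# U+F900 is a compatibility ideograph whose canonical equivalent is U+8C48.
print(f"U+{ord(fold_compat(chr(0xF900))):04X}")
```

Because normalization discards the distinction carried by a compatibility ideograph, text that must preserve a specific variant through normalization is better served by a variation sequence, which normalization leaves intact.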

Implementation and Fonts

Rendering depends on font technology and layout engines from corporations and projects, including Microsoft ClearType, Apple Inc.'s Core Text, Google's Noto fonts, Adobe Systems Incorporated's Source Han Serif and Source Han Sans families, open-source projects such as SIL International initiatives, and typeface work by foundries like DynaComware and Monotype Imaging. Input method (IME) support involves systems such as Microsoft IME, Google Pinyin, SKK, and Wubi-based tools; shaping and fallback behavior interact with rendering engines such as HarfBuzz and with the layout systems used by Mozilla Foundation and Chromium-based browsers.
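Because a unified code point does not record which regional glyph form is wanted, software commonly chooses a font from the language tag of the text. The following is a minimal sketch of that idea, assuming the regional Noto Sans CJK family names; the lookup table, the `pick_font` helper and the choice of default are hypothetical and do not describe how any particular browser or layout engine performs fallback.

```python
# Hypothetical mapping from language tag to a regional font family.
FONT_BY_LANG = {
    "zh-Hans": "Noto Sans CJK SC",
    "zh-Hant": "Noto Sans CJK TC",
    "zh-HK": "Noto Sans CJK HK",
    "ja": "Noto Sans CJK JP",
    "ko": "Noto Sans CJK KR",
}


def pick_font(lang_tag: str, default: str = "Noto Sans CJK SC") -> str:
    """Pick a font family by trimming subtags from the right until one matches.

    Simplification: matching is case-sensitive and does no tag
    canonicalization or script inference (plain "zh" falls to the default).
    """
    tag = lang_tag
    while tag:
        if tag in FONT_BY_LANG:
            return FONT_BY_LANG[tag]
        tag = tag.rpartition("-")[0]  # drop the last subtag
    return default


for tag in ("ja-JP", "zh-Hant-TW", "ko", "vi"):
    print(tag, "->", pick_font(tag))
```

The same code point can therefore be drawn with different regional shapes depending on the declared language, which is why accurate language tagging matters for multilingual documents.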

Usage and Linguistic Considerations

Applications span digital editions of canonical texts like the Tao Te Ching, publications from People's Daily, academic corpora maintained by Academia Sinica, and projects at Harvard University and Oxford University Press. Linguistic decisions affect collation and sorting, a concern for standards bodies including ISO/IEC JTC 1/SC 2 and for national libraries such as the National Library of China and the National Diet Library. Scholarly work in paleography and philology at institutions like Peking University, Kyoto University and Seoul National University, and at museums including the National Palace Museum (Taiwan), relies on the unified set for interoperable digital scholarship.
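The effect of encoding order on sorting can be shown with a small sketch. Sorting by raw code point places Extension A characters (U+3400 onward) ahead of the base block, whereas the default ordering of the Unicode Collation Algorithm places the base block before the extensions. The key function below is only a rough approximation of that default behavior for strings made of unified ideographs; `approx_key` is a hypothetical name, and language-specific orders such as pinyin, stroke count or radical order need tailoring data that is not implemented here.

```python
from typing import List, Tuple


def in_base_block(ch: str) -> bool:
    """True if ch lies in the base block U+4E00..U+9FFF."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def approx_key(s: str) -> List[Tuple[int, int]]:
    """Rough sort key: base-block characters first, then code point order."""
    return [(0 if in_base_block(c) else 1, ord(c)) for c in s]


words = ["\u3400", "\u4e2d", "\u4e00"]  # Extension A, base block, base block

by_code_point = sorted(words)
by_approx_key = sorted(words, key=approx_key)

print([f"U+{ord(w):04X}" for w in by_code_point])
print([f"U+{ord(w):04X}" for w in by_approx_key])
```

Neither order matches what readers of a particular language expect from a dictionary or catalogue, which is part of why collation is handled separately from encoding.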

Reception, Issues, and Ongoing Work

Debates persist among researchers from Academia Sinica, University of Tokyo, Seoul National University, Tsinghua University and standards committees over unification choices, rare-character coverage, and ongoing additions processed through the Unicode Consortium and ISO/IEC JTC 1/SC 2. Projects such as extension proposals, the Ideographic Variation Database, and font development by Adobe Systems Incorporated, Google, Microsoft and independent foundries continue to address variant representation, scholarly citation practices, and archival digitization needs for holdings in the British Library, Library of Congress, National Palace Museum (Taiwan) and national repositories.