LLMpedia
The first transparent, open encyclopedia generated by LLMs

ISO/IEC 10646

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Unicode Consortium (Hop 4)
Expansion Funnel: Raw 53 → Dedup 0 → NER 0 → Enqueued 0
1. Extracted: 53
2. After dedup: 0 (None)
3. After NER: 0
4. Enqueued: 0
ISO/IEC 10646
Title: ISO/IEC 10646
Status: Published
Year: 1993
Organization: International Organization for Standardization; International Electrotechnical Commission
Domain: Character encoding

ISO/IEC 10646 is an international standard that specifies a universal coded character set for information interchange, providing a repertoire and code points to represent characters from written languages and symbol systems. It defines a multi-plane encoding space, code charts, and principles for character naming and properties to enable interoperable text representation across computing platforms and communication systems. The standard is maintained jointly by the International Organization for Standardization and the International Electrotechnical Commission and is kept synchronized with the Unicode Standard to support global linguistic diversity.

Overview

ISO/IEC 10646 originally defined a universal coded character set within a 31-bit space (UCS-4); later editions restrict the code space to the range U+0000 to U+10FFFF, organized into planes, within which code points are assigned for scripts, symbols, and control functions. The standard interfaces conceptually with character models used by the International Telecommunication Union, the World Wide Web Consortium, the Unicode Consortium, and national bodies such as the British Standards Institution, the American National Standards Institute, and Deutsches Institut für Normung. It provides normative code charts and annexes that guide implementers in mapping glyphs, handling combining marks, and specifying character properties used by Microsoft Windows, Apple's operating systems, Linux, and major software libraries.
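The plane structure described above can be illustrated with a short sketch (not taken from the standard's text): each plane spans 0x10000 code points, so the plane number is simply the code point shifted right by 16 bits.

```python
# Sketch: deriving the plane of a UCS code point under the
# 17-plane model (U+0000..U+10FFFF).

def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) containing a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the UCS code space")
    return code_point >> 16  # each plane spans 0x10000 code points

print(plane_of(ord("A")))    # 0  (U+0041, Basic Multilingual Plane)
print(plane_of(0x1F600))     # 1  (Supplementary Multilingual Plane)
print(plane_of(0x10FFFF))    # 16 (last plane)
```

The same arithmetic gives the within-plane offset as `code_point & 0xFFFF`.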

History and Development

Work on a universal coded character set began amid the growth of multinational computing and publishing, influenced by International Organization for Standardization committees and experts from IBM, Sun Microsystems, and research institutions including the University of Cambridge and the Massachusetts Institute of Technology. Early milestones included coordination with proposals from Xerox, technical reports from the European Computer Manufacturers Association, and efforts that paralleled character repertoire projects in Japan, China, and South Korea. The multi-plane model responded to limitations of legacy encodings such as ASCII, EBCDIC, and national standards like JIS X 0208 and GB 2312. The first edition, ISO/IEC 10646-1:1993, was published after its repertoire was merged with that of the Unicode Standard, and subsequent revisions have remained synchronized with Unicode releases and stability policies.

Structure and Encoding Forms

The standard specifies a coded repertoire with code points organized into 17 planes, including the Basic Multilingual Plane and supplementary planes for historic scripts and symbols. It defines mapping and serialization mechanisms corresponding to the encoding forms commonly implemented as UTF-8, UTF-16, and UTF-32 in environments maintained by Oracle Corporation, Google, and the Mozilla Foundation. The character model distinguishes spacing characters, combining marks, code points reserved for control functions, and the surrogate mechanism that UTF-16 uses to reach supplementary-plane characters. Code charts include normative code point assignments and character names derived through processes involving experts from institutions such as Academia Sinica and the Russian Academy of Sciences.

Relationship to Unicode

ISO/IEC 10646 is coordinated closely with the Unicode Consortium; the two share an identical repertoire and code point assignments, enabling interchange between the international standard and editions of the Unicode Standard. Collaborative processes involve liaison with bodies such as the World Wide Web Consortium, the European Committee for Standardization, and national committees to harmonize character properties, normalization, and collation behavior used in implementations by Microsoft, Apple Inc., and the Free Software Foundation. While Unicode provides extensive normative algorithms and data files for text processing (normalization, grapheme clusters, collation), the international standard focuses on the coded repertoire and code point semantics, allowing implementers to apply Unicode-defined algorithms when interoperating across platforms such as Android, iOS, and server software from the Apache Software Foundation.
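The division of labor above can be seen in practice: the code points come from the shared repertoire, while normalization is a Unicode-defined algorithm. A brief sketch using Python's standard `unicodedata` module:

```python
# Sketch: two canonically equivalent code point sequences, unified
# by Unicode's NFC normalization algorithm.
import unicodedata

precomposed = "\u00E9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # U+0065 + U+0301 COMBINING ACUTE ACCENT

# Different code point sequences, so raw comparison fails:
print(precomposed == decomposed)                                # False

# Canonically equivalent once NFC-normalized:
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# The character name is part of the shared coded repertoire:
print(unicodedata.name(precomposed))  # LATIN SMALL LETTER E WITH ACUTE
```

Here the names and code points are fixed by the joint repertoire, but the equivalence test depends on data and algorithms published by Unicode.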

Implementation and Adoption

Implementations of the standard appear in operating systems, programming languages, and protocols from vendors and projects including IBM, Oracle Corporation, Microsoft, Google, the Mozilla Foundation, and open-source communities such as Debian and the Fedora Project. Adoption in web technologies can be traced through World Wide Web Consortium standards and deployment in browsers such as Google Chrome, Mozilla Firefox, and Safari. Databases, ISO document formats, and electronic-interchange standards used by European Commission systems rely on the encoded repertoire to represent multilingual content, historic texts, and scientific notation, with fonts and rendering provided by vendors such as Adobe and open projects such as FreeType.

Maintenance and Revision Process

Maintenance is conducted by joint technical committees and working groups under the International Organization for Standardization and the International Electrotechnical Commission, with formal liaison arrangements involving the Unicode Consortium, national standards bodies such as the American National Standards Institute and the British Standards Institution, and expert editorial panels drawn from academia and industry, including the University of California, Berkeley and Google. Proposed additions and amendments pass through submission, technical review, ballot, and publication cycles consistent with international standards procedures, incorporating evidence from script-encoding proposals submitted by scholars associated with institutions such as the School of Oriental and African Studies and the Max Planck Institute for the Science of Human History. Editorial policies govern stable assignments, fallback handling, and deprecation, and are coordinated with Unicode's stability policies to maintain interoperability across platforms and to support long-term digital preservation initiatives led by organizations such as UNESCO and the Library of Congress.

Category:Character encoding standards