LLMpedia: The first transparent, open encyclopedia generated by LLMs

Basic Multilingual Plane

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Basic Multilingual Plane
Name: Basic Multilingual Plane
Abbreviation: BMP
Range: U+0000..U+FFFF
Scripts: Numerous (see article)
First defined: Unicode 1.0 (1991)
Purpose: Primary plane for Unicode characters

The Basic Multilingual Plane (BMP), also called Plane 0, is the first of the 17 planes of the Unicode code space and covers the 65,536 code points from U+0000 to U+FFFF. It is defined jointly by the Unicode Standard, maintained by the Unicode Consortium, and by ISO/IEC 10646, and it is relied on by W3C and Ecma International specifications. It encodes the characters of most modern scripts along with some historic ones, is referenced in implementations from Apple Inc. and Microsoft to Google and Mozilla, and affects technologies such as HTML, XML, Java, Python, Perl and SQL Server.
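Because each plane spans 0x10000 code points, the plane containing a code point can be found by shifting it right by 16 bits. The following is a minimal Python sketch of that arithmetic; `plane_of` and `is_bmp` are hypothetical helper names for illustration, not part of any library.

```python
# Sketch: the plane of a code point is its value shifted right by 16 bits
# (integer division by 0x10000); plane 0 is the Basic Multilingual Plane.
# plane_of and is_bmp are illustrative helper names, not a library API.

MAX_CODE_POINT = 0x10FFFF  # last code point; planes are numbered 0..16


def plane_of(cp: int) -> int:
    """Return the plane number (0..16) that contains code point cp."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return cp >> 16


def is_bmp(cp: int) -> bool:
    """True if cp lies in U+0000..U+FFFF, the Basic Multilingual Plane."""
    return plane_of(cp) == 0


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
        cp = ord(ch)
        print(f"U+{cp:04X}: plane {plane_of(cp)}, BMP: {is_bmp(cp)}")
```

With these inputs, U+0041, U+00E9 and U+4E2D are reported as plane 0, while the emoji U+1F600 is reported as plane 1.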

Overview

The BMP is one of the foundational allocations of the Unicode Standard and ISO/IEC 10646. Because the characters of most modern languages lie within it, it underpins text processing in operating systems from Linux and Windows NT to macOS and Android, in protocols and registries maintained by the IETF, the World Wide Web Consortium and IANA, and on platforms such as GitHub, Stack Overflow, Facebook and Twitter (now X). Toolchains such as GCC and LLVM and libraries such as ICU, libxml2 and glibc depend on BMP semantics when handling encodings for applications ranging from Microsoft Office and LibreOffice to Adobe Photoshop and Adobe InDesign. Older interfaces designed around UCS-2 assume that every character fits in a single 16-bit unit, an assumption that holds only for BMP characters.
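The one-unit-per-character assumption can be checked by comparing a string's length in code points with its length in UTF-16 code units; the two agree only when every character is in the BMP. This is a minimal Python sketch; `utf16_units` and `all_bmp` are hypothetical names, and the input is assumed to be well-formed text without lone surrogates.

```python
# Sketch: compare code-point counts with UTF-16 code-unit counts.
# An API that assumes one 16-bit unit per character (the UCS-2 assumption)
# is only correct when the two counts match. Helper names are illustrative.


def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s in UTF-16."""
    # "utf-16-le" writes no byte-order mark, so each 2 bytes is one code unit.
    # Assumes s is well-formed: Python strings may hold lone surrogates,
    # which this codec rejects with UnicodeEncodeError.
    return len(s.encode("utf-16-le")) // 2


def all_bmp(s: str) -> bool:
    """True if every character of s is a BMP code point (at most U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in s)


if __name__ == "__main__":
    for text in ["Gr\u00fc\u00dfe", "\u6f22\u5b57", "I \U0001F600 Unicode"]:
        print(repr(text), len(text), utf16_units(text), all_bmp(text))
```

For the German and Chinese samples the two counts match; the string containing an emoji has 11 code points but needs 12 UTF-16 code units.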

Range and Code Point Allocation

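The BMP occupies code points U+0000..U+FFFF as defined by the Unicode Standard and coordinated with ISO/IEC 10646. Its first 128 code points, U+0000..U+007F, are identical to ASCII, which keeps ASCII text compatible with UTF-8 and with JSON as specified by RFC 8259, whose \uXXXX escapes can name any BMP code point directly but must use a surrogate pair for characters beyond it. The next 128 code points, U+0080..U+00FF, follow the layout of ISO 8859-1, which eases migration from ISO 8859-1 and, outside the byte range 0x80..0x9F, from Windows-1252. The plane also contains reserved areas: the surrogate code points U+D800..U+DFFF, which are never assigned to characters and are used in pairs by UTF-16 implementations such as those in the Java Virtual Machine and the Microsoft .NET Framework, and the Private Use Area U+E000..U+F8FF, whose characters have meanings set by private agreement rather than by the Unicode Consortium.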
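The ranges above can be turned into a simple classifier. This Python sketch assumes illustrative label strings (not official Unicode property values) and, beyond the ranges named in the text, also marks the noncharacters U+FDD0..U+FDEF, U+FFFE and U+FFFF; everything else falls into a catch-all label.

```python
# Sketch: label a BMP code point with the special range it falls in.
# Labels are illustrative, not official Unicode property values.

ASCII = "ASCII (Basic Latin)"
LATIN1 = "Latin-1 range (ISO 8859-1 layout)"
HIGH_SURROGATE = "high surrogate"
LOW_SURROGATE = "low surrogate"
PRIVATE_USE = "Private Use Area"
NONCHARACTER = "noncharacter"
OTHER = "other"


def bmp_region(cp: int) -> str:
    """Classify a code point in U+0000..U+FFFF by the range it falls in."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError(f"U+{cp:04X} is outside the BMP")
    if cp <= 0x007F:
        return ASCII
    if cp <= 0x00FF:
        return LATIN1
    if 0xD800 <= cp <= 0xDBFF:
        return HIGH_SURROGATE   # first unit of a UTF-16 surrogate pair
    if 0xDC00 <= cp <= 0xDFFF:
        return LOW_SURROGATE    # second unit of a UTF-16 surrogate pair
    if 0xE000 <= cp <= 0xF8FF:
        return PRIVATE_USE      # meaning set by private agreement
    if 0xFDD0 <= cp <= 0xFDEF or cp in (0xFFFE, 0xFFFF):
        return NONCHARACTER     # permanently reserved, never assigned
    return OTHER


if __name__ == "__main__":
    for cp in (0x0041, 0x00E9, 0xD83D, 0xDE00, 0xE000, 0xFFFF, 0x4E2D):
        print(f"U+{cp:04X}: {bmp_region(cp)}")
```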

Character Blocks and Notable Scripts

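The BMP includes many script blocks and orthographies: Latin ranges used for editions of William Shakespeare and for European Union documents; Greek and Coptic (U+0370..U+03FF), used for texts of Homer and Euclid; Cyrillic (U+0400..U+04FF), used for the works of Leo Tolstoy and shaped by the civil-script reform of Peter the Great; Hebrew (U+0590..U+05FF), used in editions of the Dead Sea Scrolls; Arabic (U+0600..U+06FF), used in works by Ibn Sina and Naguib Mahfouz; Devanagari (U+0900..U+097F), linked to Kalidasa and Mahatma Gandhi; and CJK Unified Ideographs (U+4E00..U+9FFF), which covers the common Han characters found in corpora concerning Confucius and Sun Tzu, while many rarer ideographs are encoded in the Supplementary Ideographic Plane. Specialized blocks include Mathematical Operators (U+2200..U+22FF), used when typesetting mathematics such as the work of Isaac Newton and Alan Turing, and symbol blocks such as Miscellaneous Symbols and Dingbats, which contain some early emoji; most emoji adopted by platforms including Apple Inc. and Google, however, lie outside the BMP. Historic scripts in the BMP include Runic (U+16A0..U+16FF) and Ogham (U+1680..U+169F), whereas Linear B and Egyptian hieroglyphs are encoded in the Supplementary Multilingual Plane; together these blocks support digitization projects at institutions such as the British Library and the Library of Congress.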
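Block membership is a range lookup, which a sorted table and binary search handle well. This Python sketch covers only the handful of blocks named above (a hypothetical subset table, not the full Unicode `Blocks.txt` data) and returns `None` for anything not listed, including all supplementary-plane characters such as Linear B.

```python
import bisect
from typing import Optional

# Subset of BMP blocks named in the text: (first, last, name), sorted by first.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x1680, 0x169F, "Ogham"),
    (0x16A0, 0x16FF, "Runic"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x2700, 0x27BF, "Dingbats"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(ch: str) -> Optional[str]:
    """Name of the listed BMP block containing ch, or None if not listed."""
    cp = ord(ch)
    if cp > 0xFFFF:
        return None  # supplementary planes are outside this BMP-only table
    # Find the last block whose first code point is <= cp.
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if first <= cp <= last:
            return name
    return None  # falls in a gap between the listed blocks


if __name__ == "__main__":
    for ch in ["\u03a9", "\u16a0", "\u2211", "\u4e2d", "\U00010000"]:
        print(f"U+{ord(ch):04X}: {block_of(ch)}")
```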

Encoding and UTF-16 Surrogates

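UTF-8 represents BMP code points with one to three bytes and supplementary code points with four; it is the encoding commonly declared for content served by Apache HTTP Server and Nginx. UTF-16 uses a single 16-bit code unit for each BMP code point and a pair of surrogate code units (a high surrogate from U+D800..U+DBFF followed by a low surrogate from U+DC00..U+DFFF) for each supplementary code point, a behavior central to runtimes such as the Java Platform, Standard Edition and Microsoft .NET. Surrogate handling affects C and POSIX wide-character APIs (wchar_t is 16 bits on Windows but typically 32 bits on Linux), string libraries in ECMAScript engines such as V8 and SpiderMonkey, where a string's length counts UTF-16 code units, and interoperability with RFC 3629, which forbids encoding surrogate code points in UTF-8, as well as with mail protocols such as SMTP, IMAP and POP3.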
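The byte counts and surrogate arithmetic described above can be written out directly. This Python sketch uses hypothetical helper names and exists only to show the arithmetic; real programs should rely on their language's built-in codecs. For a supplementary code point, the offset `cp - 0x10000` is a 20-bit value whose top 10 bits go into the high surrogate and bottom 10 bits into the low surrogate.

```python
from typing import List

# Sketch of UTF-8 length and UTF-16 surrogate arithmetic for one Unicode
# scalar value (any code point except the surrogates U+D800..U+DFFF).
# Helper names are illustrative, not a library API.


def check_scalar(cp: int) -> None:
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")


def utf8_length(cp: int) -> int:
    """Bytes UTF-8 (RFC 3629) uses for cp: 1 to 3 in the BMP, 4 beyond it."""
    check_scalar(cp)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def to_utf16(cp: int) -> List[int]:
    """UTF-16 code units for cp: one in the BMP, a surrogate pair otherwise."""
    check_scalar(cp)
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000               # 20-bit offset into planes 1..16
    high = 0xD800 + (v >> 10)      # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits -> U+DC00..U+DFFF
    return [high, low]


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a supplementary code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    units = to_utf16(0x1F600)
    print([f"{u:04X}" for u in units], utf8_length(0x1F600))
```

For U+1F600 the offset is 0xF600, giving the pair D83D DE00 and a four-byte UTF-8 sequence, which agrees with Python's own `utf-16-be` and `utf-8` codecs.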

Usage and Compatibility Considerations

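Internationalization and localization efforts by organizations including United Nations agencies, the European Commission and UNESCO, and by corporations such as IBM, Oracle Corporation and SAP SE, must consider whether text is confined to the BMP when supporting languages in Microsoft Office, Google Workspace, Atlassian products, content management systems such as WordPress and Drupal, and messaging platforms such as WhatsApp and Telegram: software that assumes BMP-only text can mishandle most emoji and rare CJK ideographs, which are encoded in supplementary planes. Font vendors and services such as Monotype Imaging, Adobe Fonts and Google Fonts map code points to glyphs through a font's cmap table, where OpenType subtable format 4 covers only BMP code points and format 12 is needed for supplementary ones, while operating systems manage shaping, fallback and rendering through libraries such as HarfBuzz, FreeType, DirectWrite and Core Text.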
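When text must pass through a component that handles only BMP characters, one option is to locate the supplementary-plane characters and either reject the input or substitute them. This Python sketch assumes such a hypothetical BMP-only consumer; `non_bmp_chars` and `to_bmp_only` are illustrative names, and substituting U+FFFD REPLACEMENT CHARACTER is one possible policy, not a standard requirement.

```python
from typing import List, Tuple

# Sketch: prepare text for a hypothetical component that accepts only BMP
# characters. Function names and the replacement policy are illustrative.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER, itself a BMP character


def non_bmp_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) for every character above U+FFFF."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


def to_bmp_only(text: str, replacement: str = REPLACEMENT) -> str:
    """Replace each supplementary-plane character with `replacement` (lossy)."""
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


if __name__ == "__main__":
    # An emoji (plane 1) and a CJK Extension B ideograph (plane 2).
    sample = "Hi \U0001F600, \U00020000"
    print(non_bmp_chars(sample))
    print(to_bmp_only(sample))
```

Substitution loses information, so reporting the offending positions and rejecting the input may be the better choice where data fidelity matters.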

History and Standardization

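The term "plane" comes from the code structure of ISO/IEC 10646, and the BMP corresponds to the 16-bit code space of the early Unicode Standard, developed with contributors from Xerox, Apple Inc., Microsoft and Sun Microsystems alongside standards bodies such as ISO and IEC and the Unicode Consortium. Milestones include Unicode 1.0 (1991); Unicode 1.1 (1993), which was aligned with ISO/IEC 10646-1:1993; Unicode 2.0 (1996), which introduced the surrogate mechanism that made code points beyond the BMP usable; and Unicode 3.1 (2001), the first version to assign characters outside the BMP. Later developments include continued harmonization with ISO/IEC 10646 editions, emoji additions driven by proposals from companies such as Google and Apple Inc., and ongoing governance through committees and working groups coordinated with organizations such as the W3C and the IETF.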
