ISO/IEC 8859 — LLMpedia

ISO/IEC 8859
Name	ISO/IEC 8859
Standard	ISO/IEC 8859
Status	Published
Classification	Extended ASCII, Character encoding
Related	ISO/IEC 646, Unicode
Year	1987 (initial parts)

Contents

Overview
Parts and character sets
Technical details
Relationship to other standards
Usage and adoption

ISO/IEC 8859 is a joint standard from the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) that defines a series of 8-bit character encodings for Latin-based and other alphabetic scripts. Developed primarily during the 1980s and 1990s, it was a cornerstone of personal computing and early internet communication before the widespread adoption of Unicode. The standard is divided into multiple parts, each covering a specific group of languages and geographical regions.

Overview

The development of this standard was driven by the limitations of the earlier 7-bit ASCII encoding, which could not accommodate the accented characters needed for many European languages. Engineers at organizations like ECMA and ANSI collaborated to create compatible extensions. It became a crucial bridge between the English-centric computing world and the needs of locales from Western Europe to the Middle East. Its structure allows each part to represent up to 191 printable characters within a single-byte framework: the 95 printable ASCII characters plus up to 96 additional ones. The first 128 code points remain identical to ASCII.
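The figure of 191 can be checked with a little arithmetic. The sketch below assumes Python's built-in `latin-1` codec as a stand-in for ISO/IEC 8859-1; the helper name `count_graphic_latin1` is made up for this illustration.

```python
import unicodedata


def count_graphic_latin1() -> int:
    """Count the byte values that the 'latin-1' codec maps to non-control characters."""
    text = bytes(range(256)).decode("latin-1")
    # Unicode general category "Cc" marks the C0/C1 control codes and DEL.
    return sum(1 for ch in text if unicodedata.category(ch) != "Cc")


ascii_graphic = len(range(0x20, 0x7F))   # space through tilde
upper_graphic = len(range(0xA0, 0x100))  # the upper graphic area

print(ascii_graphic, upper_graphic, ascii_graphic + upper_graphic)
print(count_graphic_latin1())
```

Both counts should agree for part 1. Other parts leave some of the upper positions unassigned, so 191 is best read as a ceiling rather than a fixed size.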

Parts and character sets

The standard is organized into fifteen separately published parts, each designated by a number following a hyphen. The numbers run from 1 through 16; part 12 was never published. Key parts include ISO/IEC 8859-1, covering most Western European languages like French and Spanish, and ISO/IEC 8859-2 for Central European languages such as Polish and Czech. Other significant parts are ISO/IEC 8859-5 for Cyrillic alphabets, ISO/IEC 8859-6 for Arabic, and ISO/IEC 8859-7 for modern Greek. Later additions included ISO/IEC 8859-9 for Turkish and ISO/IEC 8859-15, which revised the Latin-1 repertoire by replacing eight rarely used symbols with the euro sign and several letters needed for French and Finnish.
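Because every part reuses the same upper byte values, a given byte means different things depending on which part is assumed. The following sketch illustrates this with Python's standard codecs (`iso8859_1`, `iso8859_2`, and so on); the function name `decode_under_parts` is hypothetical.

```python
def decode_under_parts(byte_value: int, codecs: tuple[str, ...]) -> dict[str, str]:
    """Decode one byte under several ISO/IEC 8859 parts and return the results by codec name."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in codecs}


# The same byte, 0xE6, read as Latin-1, Latin-2, Cyrillic, and Greek.
letters = decode_under_parts(0xE6, ("iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7"))
for name, ch in letters.items():
    print(f"{name}: U+{ord(ch):04X} {ch}")

# Byte 0xA4 is the generic currency sign in part 1 and the euro sign in part 15.
currency = decode_under_parts(0xA4, ("iso8859_1", "iso8859_15"))
for name, ch in currency.items():
    print(f"{name}: U+{ord(ch):04X} {ch}")
```

This is why text in these encodings needs an out-of-band label: the bytes alone do not say which part was intended.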

Technical details

Each part reserves code points 0 through 127 (0x00 to 0x7F) for standard ASCII and uses positions 160 through 255 (0xA0 to 0xFF) for additional letters, punctuation, and symbols specific to its target languages. The positions from 128 to 159 (0x80 to 0x9F) are left unassigned by the standard itself; the corresponding IANA-registered charsets, such as ISO-8859-1, map them to the rarely used C1 control characters. This design choice sometimes caused compatibility issues, because vendor code pages placed printable characters in the same range. The encoding is a fixed-width, single-byte scheme, making it simple to process but incapable of representing large character sets like those of East Asian languages. This limitation was a primary technical motivation for the development of Unicode.
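As a rough illustration of this layout, the sketch below sorts byte values into the three ranges and shows both the fixed-width property and the repertoire limit. The names `classify` and `can_encode` and the range labels are invented for this example, and Python's `iso8859_1` codec is assumed as a representative part.

```python
def classify(byte_value: int) -> str:
    """Name the region of the 8-bit code space that a byte value falls into."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("not an 8-bit value")
    if byte_value <= 0x7F:
        return "ascii"
    if byte_value <= 0x9F:
        return "c1-or-unassigned"
    return "part-specific"


def can_encode(text: str, codec: str = "iso8859_1") -> bool:
    """Return True if every character in text exists in the given part."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


print(classify(0x41), classify(0x85), classify(0xE9))

# Fixed width: one byte per character, so lengths match.
word = "café"
print(len(word), len(word.encode("iso8859_1")))

# East Asian text falls outside the repertoire of any single-byte part.
print(can_encode("café"), can_encode("日本語"))
```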

Relationship to other standards

Several parts of this standard were derived from or aligned with earlier national and regional efforts, such as ECMA-94 from the European Computer Manufacturers Association. The first 256 code points of ISO/IEC 10646 and Unicode are identical to ISO/IEC 8859-1 (combined with the C0 and C1 control codes), and ISO/IEC 8859-1 was also the basis for the Windows-1252 encoding used in Microsoft Windows, which differs from it only in the range 128 to 159. It also served as the default character set in early HTTP and HTML specifications published by the IETF and the W3C. Other proprietary 8-bit encodings, such as CP437 from IBM and Mac OS Roman from Apple, extend ASCII in the same single-byte manner but predate the standard and assign the upper 128 positions differently.
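These relationships can be probed directly. The sketch below assumes Python's `latin-1` and `cp1252` codecs as faithful stand-ins for ISO/IEC 8859-1 (with C0/C1 controls) and Windows-1252; note that Python's `cp1252` treats five byte values as undefined, which other implementations may handle differently. The helper names are hypothetical.

```python
def decode_or_none(raw: bytes, codec: str):
    """Decode raw with codec, returning None where the codec leaves the byte undefined."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def differing_bytes(codec_a: str, codec_b: str) -> list[int]:
    """List the byte values that the two single-byte codecs decode differently."""
    return [
        b for b in range(256)
        if decode_or_none(bytes([b]), codec_a) != decode_or_none(bytes([b]), codec_b)
    ]


# Latin-1 bytes map one-to-one onto the first 256 Unicode code points.
identity = all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))
print(identity)

# Windows-1252 departs from Latin-1 only between 0x80 and 0x9F.
diffs = differing_bytes("latin-1", "cp1252")
print(f"{len(diffs)} differences, from 0x{diffs[0]:02X} to 0x{diffs[-1]:02X}")
```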

Usage and adoption

Throughout the 1990s, these encodings were ubiquitous in operating systems, word processors, and email clients across regions like Scandinavia, the Baltic states, and North Africa. ISO/IEC 8859-1, in particular, was the default character set for text content in HTTP/1.1 and the document character set of early HTML versions, before HTML 4 adopted ISO/IEC 10646. However, the rise of globalized software and the internet exposed the problems of needing multiple, incompatible encodings, motivating a broad shift to Unicode that was later championed by initiatives such as the "UTF-8 Everywhere" manifesto. Major platforms, including Linux, Apache, and MySQL, eventually transitioned to Unicode, rendering most parts of this standard largely obsolete for new systems, though legacy data persists.
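Migrating legacy data usually comes down to decoding with the correct part and re-encoding as UTF-8. The following is a minimal sketch under that assumption; `transcode_to_utf8` is a hypothetical helper, and the sample bytes are invented for illustration.

```python
def transcode_to_utf8(data: bytes, source_codec: str) -> bytes:
    """Decode legacy single-byte data with the stated codec and re-encode it as UTF-8."""
    return data.decode(source_codec).encode("utf-8")


legacy = b"caf\xe9"  # "café" as stored in ISO/IEC 8859-1

# Correct label: the accented letter becomes a two-byte UTF-8 sequence.
good = transcode_to_utf8(legacy, "iso8859_1")
print(good, good.decode("utf-8"))

# Wrong label: the conversion still succeeds, but yields a Cyrillic letter instead.
wrong = transcode_to_utf8(legacy, "iso8859_5")
print(wrong.decode("utf-8"))

# The raw legacy bytes are not valid UTF-8 on their own.
try:
    legacy.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
print(is_valid_utf8)
```

The wrong-label case is the practical hazard: a mislabeled conversion raises no error, so the source encoding has to be known or verified before bulk migration.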

Category:Character sets Category:ISO/IEC standards Category:Computing standards