Common Locale Data Repository

Name	Common Locale Data Repository
Developer	Unicode Consortium
Released	2001
Latest release	CLDR 41
Data format	XML (LDML)
Platform	Cross-platform
License	Unicode License


The Common Locale Data Repository (CLDR) is a Unicode Consortium project that provides locale data for software internationalization and localization. It supplies standardized information used by libraries and platforms such as ICU and Java, and by vendors including Microsoft, Apple, Google, and Mozilla, to support regional formats for dates, numbers, currencies, and languages. By consolidating data contributed by national bodies, standards organizations, and commercial vendors, and by drawing on sources such as IETF specifications, ISO 15924, ISO 3166, and the Library of Congress, the repository reduces duplication and inconsistency across platforms.

Overview

CLDR is a machine-readable dataset distributed by the Unicode Consortium that codifies locale-specific conventions for use in internationalized software. Major consumers include ICU, OpenJDK, Chromium, Microsoft Windows, macOS, and Android. The project draws on standards such as ISO 639, ISO 15924, and ISO 3166-1 alpha-2, as well as the IETF's BCP 47 specification for language tags. Contributors include national libraries; standards institutes such as DIN, BSI, and AFNOR; corporations such as Google and Apple; and open-source communities such as Mozilla.

Data Content and Structure

CLDR stores its data as XML, organized by locale. Each locale is identified by a combination of language, script, and region subtags that mirrors the identifiers defined in BCP 47. Core content types include calendar data (Gregorian, Buddhist, Islamic), date/time patterns, number formats, currency symbols, plural rules, and territory containment. Specific datasets cover localized names for languages (linked to ISO 639), scripts (linked to ISO 15924), territories (linked to ISO 3166), and time zones (tied to the IANA time zone database). The repository also encodes transliteration rules comparable to those curated by institutions like the Library of Congress and the UNGEGN.
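The XML layout described above can be illustrated with a small sketch. The fragment below is a hypothetical, heavily trimmed file written in the style of CLDR's locale data; the element names follow the general LDML pattern, but the values are illustrative rather than copied from a release, and the helper names `read_locale_id` and `read_display_names` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment in the style of a CLDR locale
# file. The values are illustrative, not copied from a release.
SAMPLE_LDML = """\
<ldml>
  <identity>
    <language type="fr"/>
    <territory type="CA"/>
  </identity>
  <localeDisplayNames>
    <languages>
      <language type="de">allemand</language>
      <language type="ja">japonais</language>
    </languages>
    <territories>
      <territory type="DE">Allemagne</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def read_locale_id(root):
    """Build a locale identifier from the subtags in <identity>."""
    identity = root.find("identity")
    parts = []
    # Order follows language, script, region; missing subtags are skipped.
    for tag in ("language", "script", "territory"):
        node = identity.find(tag)
        if node is not None:
            parts.append(node.get("type"))
    return "_".join(parts)


def read_display_names(root, group, item):
    """Return {code: localized name} for one group of display names."""
    path = f"localeDisplayNames/{group}/{item}"
    return {node.get("type"): node.text for node in root.findall(path)}


root = ET.fromstring(SAMPLE_LDML)
locale_id = read_locale_id(root)
language_names = read_display_names(root, "languages", "language")
territory_names = read_display_names(root, "territories", "territory")
print(locale_id, language_names, territory_names)
```

The sketch keeps the codes (for example `de` or `DE`) as dictionary keys, which is what ties each localized name back to the corresponding ISO code list.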

The hierarchical XML uses inheritance: a locale stores only the data that differs from its parent, and lookups fall back from the most specific locale to more general ones and finally to a root locale (for example, a regional variant of French falls back to the base French locale and then to root). This enables data reuse across locales such as those for English, Spanish, Chinese, Arabic, and Hindi. CLDR also includes exemplar character sets for script coverage and collation rules built on the Unicode Collation Algorithm and shared with projects like ICU.
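Locale inheritance can be sketched as a fallback lookup. The example below is a simplification under stated assumptions: parents are derived by dropping the last subtag, whereas CLDR also defines explicit parent overrides for some locales, which are ignored here. The `DATA` table, its keys, and the function names are hypothetical and exist only for illustration.

```python
def fallback_chain(locale_id):
    """Return the lookup order from the given locale up to root.

    Simplification: the parent is found by dropping the last subtag.
    Explicit parent overrides defined by CLDR are not modeled.
    """
    parts = locale_id.replace("-", "_").split("_")
    chain = ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


# Hypothetical data: each locale stores only what differs from its parent.
DATA = {
    "root": {"decimal": ".", "short_date": "y-MM-dd"},
    "fr": {"decimal": ",", "short_date": "dd/MM/y"},
    "fr_CA": {"short_date": "y-MM-dd"},
}


def lookup(locale_id, key):
    """Return the first value found while walking the fallback chain."""
    for candidate in fallback_chain(locale_id):
        value = DATA.get(candidate, {}).get(key)
        if value is not None:
            return value
    raise KeyError(key)


print(fallback_chain("fr-CA"))
print(lookup("fr_CA", "decimal"))     # inherited from "fr"
print(lookup("fr_CA", "short_date"))  # defined on "fr_CA" itself
print(lookup("de", "decimal"))        # "de" has no entry, so root is used
```

Storing only the differences keeps regional files small: in this sketch `fr_CA` carries a single entry and picks up everything else from `fr` and `root`.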

Implementation and Usage

Software projects integrate CLDR via data releases and toolkits. Implementations use CLDR to localize user interfaces in products by Microsoft (Windows), Google (Chrome, Android), Apple (iOS, macOS), and open-source platforms such as Mozilla (Firefox) and LibreOffice. Runtime libraries such as ICU, language ecosystems such as Java, Python (via Babel), and Ruby, and JavaScript engines consume CLDR data to format dates, numbers, and currencies and to handle pluralization and list formatting.
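Pluralization is a useful illustration of why consumers rely on shared locale data. The sketch below hand-codes the integer plural categories for English and Russian as they are commonly described for CLDR; it is not how a library such as ICU or Babel loads the rules, and the message templates and function names are hypothetical. Fractional numbers, which the real rules also cover, are left out.

```python
def plural_category_en(n: int) -> str:
    """English, integers only: 'one' for 1, otherwise 'other'."""
    return "one" if abs(n) == 1 else "other"


def plural_category_ru(n: int) -> str:
    """Russian, integers only: 'one', 'few', or 'many'."""
    n = abs(n)
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"
    return "many"


RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Hypothetical message templates keyed by plural category.
MESSAGES = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "ru": {"one": "{n} файл", "few": "{n} файла", "many": "{n} файлов"},
}


def format_count(locale: str, n: int) -> str:
    """Pick the template that matches the plural category of n."""
    category = RULES[locale](n)
    return MESSAGES[locale][category].format(n=n)


for n in (1, 2, 5, 11, 21, 22):
    print(format_count("en", n), "|", format_count("ru", n))
```

An application written this way needs one template per category rather than per number, so adding a language means supplying its rule and its templates without changing the calling code.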

The project provides tooling, including the Survey Tool, that locale experts from organizations such as the European Commission, national standards bodies like NIST, and cultural institutions including the British Library use for collaborative editing of and voting on locale data. CLDR data also informs internationalization work at the W3C and supports interoperability among systems built by W3C members.

Governance and Maintenance

The Unicode Consortium oversees CLDR through a technical committee whose members and maintainers are drawn from stakeholder organizations including Google, Apple, Microsoft, IBM, Red Hat, and community representatives from Mozilla. Policy and release coordination align with Unicode releases and with related standards such as ISO editions and IETF guidance. Contributions are vetted through a survey process and through issue tracking, in collaboration with projects like ICU and using repositories hosted on GitHub.

CLDR is distributed under the Unicode License, which is intended to encourage broad adoption while protecting contributor rights. Governance includes periodic release cycles, quality assurance with input from national bodies (for example, submissions from national repository-style institutions), and liaison with standards groups including UNGEGN and IETF to accommodate geopolitical and linguistic sensitivities.

History and Development

Initiated in the early 2000s by the Unicode Consortium and stakeholders such as IBM and Sun Microsystems, CLDR emerged to address divergent locale implementations across platforms like Unix, Windows NT, and early web browsers including Netscape Navigator and Internet Explorer. Early contributors included Microsoft, Apple, and regional institutions such as NIST and the Bibliothèque nationale de France, with Oracle participating later through OpenJDK.

Over successive releases CLDR expanded from basic date and number formats to cover pluralization, collation, and newer needs such as locale-specific currency names, informed by institutions such as the International Monetary Fund, and geopolitical updates aligned with United Nations reports. Integration milestones include adoption by ICU and inclusion in the OpenJDK and Chromium codebases, enabling consistent globalization across desktop, server, and mobile ecosystems. Recent work focuses on improved support for complex scripts (for example, Devanagari, Arabic script, Han characters), enhanced transliteration, and better tooling for community-driven locale contributions from entities like the European Commission and national archives.
