
CLDR
Name	CLDR
Developer	Unicode Consortium
Released	2000s
Latest release	ongoing
Programming language	Java, XML
License	Unicode licensing
Website	unicode.org/cldr

Contents

Overview
History and development
Architecture and data model
Locale data types and coverage
Tools and usage
Governance and maintenance

The Common Locale Data Repository (CLDR) is a project of the Unicode Consortium that provides locale-specific data for internationalization and localization. It supplies standardized information such as date and time formats, numbering systems, currency symbols, collation rules, and plural rules across many languages and regions. CLDR data is used by software from a wide range of organizations, including Google, Microsoft, Apple Inc., IBM, Oracle Corporation, and the Mozilla Foundation, to ensure consistent locale behavior across applications and services.

Overview

CLDR collects, organizes, and distributes locale data to support Unicode Standard conformance and interoperable localization. The repository delivers machine-readable resources consumed by libraries such as ICU (International Components for Unicode) and glibc, and by language runtimes for Java, Python, and JavaScript. Major operating systems and platforms, including Android, iOS, Windows, and macOS, draw on CLDR for locale-sensitive formatting, as does web internationalization work associated with W3C and WHATWG. CLDR builds on external standards including ISO 4217 (currency codes), ISO 639 (language codes), ISO 3166 (country and region codes), and IETF language tags.
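As an illustration of how locale identifiers combine the language, script, and region codes named above, the following is a minimal Python sketch that splits a simple tag into its parts. It is a deliberate simplification and not a full IETF language tag parser: it assumes only language, optional script, and optional region subtags, and ignores variants and extensions. The names LocaleId and parse_locale_id are hypothetical and are not part of any CLDR tool.

```python
import re
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    """Hypothetical container for the three most common subtags."""
    language: str
    script: Optional[str]
    region: Optional[str]


# Simplified shape: 2-3 letter language, optional 4-letter script,
# optional region (2 letters or 3 digits). Accepts "-" or "_" separators.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_locale_id(tag: str) -> LocaleId:
    """Split a simple locale tag and normalize the case of each subtag."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed locale tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return LocaleId(
        language=match.group("language").lower(),
        script=script.title() if script else None,   # e.g. "hans" -> "Hans"
        region=region.upper() if region else None,   # e.g. "gb" -> "GB"
    )


if __name__ == "__main__":
    print(parse_locale_id("zh-Hans-CN"))
    print(parse_locale_id("en_gb"))
    print(parse_locale_id("es-419"))
```

Under these assumptions, the language subtag corresponds to an ISO 639 code, the script subtag to an ISO 15924 code, and the region subtag to an ISO 3166-1 alpha-2 code or a three-digit area code.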

History and development

CLDR originated within the Unicode Consortium to address inconsistent locale data across vendors. Early efforts aligned with the evolution of the Unicode Standard and the growth of global software markets driven by corporations such as IBM and Sun Microsystems. Contributions and editorial processes scaled during the 2000s as projects like Mozilla Firefox, GNOME, KDE, and Apache HTTP Server integrated CLDR-provided rules. Internationalization needs highlighted by products such as Google Translate, together with multinational deployments of Microsoft Windows XP and later versions, helped expand coverage. The project has continued to evolve through successive releases, community proposals from organizations including Red Hat and Facebook, Inc., and collaboration with standards bodies like IANA and ECMA International.

Architecture and data model

CLDR's architecture centers on XML data files, written in the Locale Data Markup Language (LDML) and organized by locale identifiers derived from IETF language tag conventions. The repository defines a hierarchical inheritance model in which data flows from a root locale to base locales (e.g., en), then to region-specific locales (e.g., en-GB), and on to variant locales used by projects like OpenOffice and LibreOffice. A value that a locale does not define is taken from its parent. The data is divided into main and supplemental sets. Supplemental files contain locale-independent information such as currency data tied to ISO 4217 codes, time zone metazones, and measurement system preferences aligned with SI (International System of Units) usage. Main data files encode locale-specific elements such as month names, day names, and numbering systems. Collation tailorings are expressed relative to a root collation based on the Unicode Collation Algorithm. CLDR also specifies a coverage level framework used by implementers like Google Chrome and Mozilla Firefox to prioritize locale support. Tools consume CLDR XML and produce binary or optimized representations for libraries like ICU and for platform bundles used by Android.
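The inheritance model can be sketched in a few lines of Python. This is a simplified illustration, not CLDR's actual resolution algorithm: it assumes parents are found by truncating the identifier at the last underscore, with an optional table of explicit parent overrides. The values in SAMPLE_DATA are illustrative stand-ins for data that would come from the XML files, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Illustrative data only; real values would be read from CLDR's XML files.
SAMPLE_DATA: Dict[str, Dict[str, str]] = {
    "root": {"decimal": ".", "group": ",", "date_short": "y-MM-dd"},
    "en": {"date_short": "M/d/yy"},
    "en_GB": {"date_short": "dd/MM/y"},
    "de": {"decimal": ",", "group": "."},
}


def parent_of(locale: str,
              explicit_parents: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the parent locale, or None for the root locale."""
    if locale == "root":
        return None
    if explicit_parents and locale in explicit_parents:
        return explicit_parents[locale]      # explicit override wins
    if "_" in locale:
        return locale.rsplit("_", 1)[0]      # en_GB -> en
    return "root"                            # base locales inherit from root


def fallback_chain(locale: str,
                   explicit_parents: Optional[Dict[str, str]] = None) -> List[str]:
    """List the locale followed by each ancestor, ending at root."""
    chain = []
    current: Optional[str] = locale
    while current is not None:
        chain.append(current)
        current = parent_of(current, explicit_parents)
    return chain


def lookup(data: Dict[str, Dict[str, str]], locale: str, key: str,
           explicit_parents: Optional[Dict[str, str]] = None) -> str:
    """Return the first value for key found along the fallback chain."""
    for candidate in fallback_chain(locale, explicit_parents):
        values = data.get(candidate, {})
        if key in values:
            return values[key]
    raise KeyError(key)


if __name__ == "__main__":
    print(fallback_chain("en_GB"))                  # ['en_GB', 'en', 'root']
    print(lookup(SAMPLE_DATA, "en_GB", "decimal"))  # '.' (inherited from root)
    print(lookup(SAMPLE_DATA, "de_AT", "decimal"))  # ',' (inherited from de)
```

The explicit_parents argument exists because plain truncation is not always sufficient; passing {"en_GB": "en_001"}, for example, yields the chain en_GB, en_001, en, root.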

Locale data types and coverage

CLDR contains diverse locale data types: date/time formats, number formats, currency formats, plural rules, unit display names, translations of region and language names, collation sequences, exemplar characters, and transliteration mappings. Coverage spans hundreds of locales, including major languages such as English, Spanish, Chinese (Simplified), Hindi, and Arabic, as well as regional variants for territories such as Brazil, the United Kingdom, Mexico, Saudi Arabia, and South Africa. Specialized data supports scripts such as Devanagari, Cyrillic, Arabic, and Han. CLDR also records display names for territories identified by ISO 3166-1 alpha-2 codes and for scripts identified by ISO 15924 codes. Coverage evolves through contributions from localizers, including those in communities around projects such as Transifex and Weblate.
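Plural rules are one of the less obvious data types in the list above, so a small Python sketch may help. It hand-codes the integer cases of the English and Russian plural categories and uses them to select a message. This is a simplification that assumes whole numbers only (fractional operands are ignored), and the function names and MESSAGES table are hypothetical rather than taken from any CLDR-based library.

```python
from typing import Callable, Dict


def plural_category_en(n: int) -> str:
    """English integers: 'one' for exactly 1, otherwise 'other'."""
    return "one" if abs(n) == 1 else "other"


def plural_category_ru(n: int) -> str:
    """Russian integers: 'one', 'few', or 'many' based on the last digits."""
    n = abs(n)
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"                      # 1, 21, 31, ... but not 11
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"                      # 2-4, 22-24, ... but not 12-14
    return "many"                         # 0, 5-20, 25-30, ...


PLURAL_RULES: Dict[str, Callable[[int], str]] = {
    "en": plural_category_en,
    "ru": plural_category_ru,
}

# Hypothetical message catalog keyed by plural category.
MESSAGES: Dict[str, Dict[str, str]] = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "ru": {"one": "{n} файл", "few": "{n} файла", "many": "{n} файлов"},
}


def format_count(locale: str, n: int) -> str:
    """Pick the message variant that matches the plural category of n."""
    category = PLURAL_RULES[locale](n)
    return MESSAGES[locale][category].format(n=n)


if __name__ == "__main__":
    for count in (1, 2, 5, 11, 21):
        print(format_count("en", count), "|", format_count("ru", count))
```

The point of the sketch is that the number of categories and the conditions for each differ by language, which is why applications look these rules up per locale instead of assuming a singular/plural pair.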

Tools and usage

A suite of tools accompanies CLDR, including the online Survey Tool used by contributors, command-line LDML utilities, and code generation tooling that integrates with libraries like ICU. Projects such as Mozilla Firefox, Chromium, and Android rely on CLDR-derived data, and server stacks like Tomcat and NGINX can draw on it through the runtimes and libraries they are built with. Developers use CLDR-aware APIs in ICU4J, ICU4C, and language libraries for JavaScript internationalization (e.g., the ECMAScript Internationalization API) to format numbers, dates, and plural-sensitive messages. Localization platforms including Crowdin and Transifex may interface with CLDR data to align translations. The Survey Tool coordinates contributions from vendors, researchers, and individuals, while build tools generate optimized locale bundles for deployment in environments ranging from embedded systems to cloud services provided by Amazon Web Services and Microsoft Azure.
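The step from XML source data to an optimized bundle can be illustrated with a short Python sketch that uses only the standard library. It is not one of the CLDR or ICU build tools: LDML_SNIPPET is a hand-written, heavily abbreviated fragment in the shape of an LDML main file, and extract_wide_months and build_bundle are hypothetical names for this example. The output format, compact JSON, is likewise an assumption chosen for brevity.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict

# Hand-written, abbreviated LDML-style fragment (illustrative only).
LDML_SNIPPET = """<ldml>
  <identity><language type="en"/></identity>
  <dates><calendars><calendar type="gregorian">
    <months><monthContext type="format"><monthWidth type="wide">
      <month type="1">January</month>
      <month type="2">February</month>
    </monthWidth></monthContext></months>
  </calendar></calendars></dates>
</ldml>"""


def extract_wide_months(ldml_text: str) -> Dict[int, str]:
    """Map month number to its wide (full) name for the Gregorian calendar."""
    root = ET.fromstring(ldml_text)
    path = (
        "dates/calendars/calendar[@type='gregorian']/months/"
        "monthContext[@type='format']/monthWidth[@type='wide']/month"
    )
    return {int(m.get("type")): m.text for m in root.findall(path)}


def build_bundle(ldml_text: str) -> str:
    """Produce a compact JSON bundle holding only the data an app needs."""
    root = ET.fromstring(ldml_text)
    language = root.find("identity/language").get("type")
    bundle = {
        "locale": language,
        "months_wide": extract_wide_months(ldml_text),
    }
    # Compact separators keep the bundle small; JSON turns int keys into strings.
    return json.dumps(bundle, sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    print(extract_wide_months(LDML_SNIPPET))
    print(build_bundle(LDML_SNIPPET))
```

Real build pipelines additionally resolve inheritance and filter by coverage level before emitting their output, which this sketch leaves out.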

Governance and maintenance

CLDR is governed by the editorial processes of the Unicode Consortium, with stewardship by a CLDR Technical Committee and community participation via the Survey Tool. Contributors include corporations such as Google, Microsoft, IBM, and Apple Inc., and nonprofit organizations like the Mozilla Foundation and the Wikimedia Foundation. Decision-making balances technical proposals, evidence from locale experts, and coordination with standards bodies such as ISO, IANA, and IETF. Release cycles are managed through public issue trackers and mailing lists hosted by the Unicode Consortium, and maintenance incorporates feedback from implementers including ICU, glibc, and major platform vendors. The CLDR project promotes transparency and openness while aligning with broader internationalization efforts across the software ecosystem.
