
Unicode Common Locale Data Repository

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: Unicode Common Locale Data Repository
Author: Unicode Consortium
Developer: Unicode Consortium
Released: 2003
Latest release version: CLDR 42
Data format: XML
Genre: Internationalization, localization


The Unicode Common Locale Data Repository (CLDR) is a standards resource maintained by the Unicode Consortium that provides locale-specific data for software internationalization and localization. It supplies machine-readable data that libraries such as ICU, and products from vendors such as Microsoft, Apple Inc., Google, and IBM, use to format dates, numbers, and currencies, to display time zone names, and to sort text according to locale-specific collation rules. Other users include Mozilla Foundation, Red Hat, Oracle Corporation, and SAP SE. Because platforms such as Android (operating system), iOS, Windows 10, macOS, and Linux distributions draw on the same data, locale behavior is more consistent across systems.

Overview

CLDR began in the early 2000s as a project of the OpenI18N group (formerly Li18nux) under the Free Standards Group and was transferred to the Unicode Consortium in 2004. Its purpose was to centralize locale data that software vendors had previously maintained separately, often with inconsistent results. The project builds on standards and registries such as ISO 3166, ISO 15924, ISO 4217, the IANA Time Zone Database, and IETF BCP 47, and provides mappings and supplemental data used in products from organizations including Microsoft (Microsoft Office), The Document Foundation (LibreOffice), Eclipse Foundation, JetBrains, SAP SE, and Salesforce. Governance and contribution workflows resemble those of other consortium-led and open-source projects, such as Apache Software Foundation, Linux Foundation, and W3C working groups.
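
The standards named above meet in the locale identifier: in a BCP 47 language tag, the language subtag follows ISO 639, an optional four-letter script subtag follows ISO 15924, and an optional region subtag follows ISO 3166-1 (two letters) or UN M.49 (three digits). The following Python sketch splits a tag into those three parts. It is a deliberately simplified illustration, not CLDR's or ICU's parser: the names `LocaleId` and `parse_locale_id` are hypothetical, and variants, extensions, and private-use subtags are rejected rather than handled.

```python
import re
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    """Hypothetical container for the three most common subtags."""
    language: str
    script: Optional[str]
    region: Optional[str]


# Simplified grammar: language, optional script, optional region.
# Both "-" (BCP 47) and "_" (common in older APIs) are accepted as separators.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_locale_id(tag: str) -> LocaleId:
    """Split a simple locale identifier and normalize the case of each subtag."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported locale identifier: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return LocaleId(
        language=match.group("language").lower(),    # e.g. "zh"
        script=script.title() if script else None,   # e.g. "Hant"
        region=region.upper() if region else None,   # e.g. "TW" or "419"
    )
```

With these assumptions, `parse_locale_id("zh-hant-TW")` yields `LocaleId(language='zh', script='Hant', region='TW')`, and `parse_locale_id("es-419")` yields a region of `'419'`, the UN M.49 code for Latin America and the Caribbean.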

Data and Formats

CLDR stores its data in XML using the Unicode Locale Data Markup Language (LDML), a format specified in Unicode Technical Standard #35. Data types include locale identifiers, plural rules, date and time patterns, numbering systems, currency symbols, exemplar characters, and collation tailorings. Implementations either parse CLDR XML directly or consume derived formats, such as the JSON distribution of CLDR or the resource bundles built for ICU (International Components for Unicode); other libraries, including the GNU C Library (glibc) and Qt (software framework), ship locale data that draws at least in part on CLDR. Mappings between CLDR and external registries reference ISO 4217, ISO 15924, BCP 47, the IANA Language Subtag Registry, and the IANA Time Zone Database. Tooling includes the CLDR Survey Tool, through which contributors submit and vote on locale data, and release pipelines similar to those of projects built with Apache Maven; distributions such as Debian, Fedora Project, and Arch Linux package the resulting locale bundles with their own build systems.
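
To make the XML layout concrete, the following Python sketch reads date patterns and a currency symbol from a small LDML-style document using only the standard library. The fragment is hand-written for illustration: its element names follow the LDML structure, but it is not copied from a CLDR release and its values should be treated as assumptions. The helper names `date_patterns` and `currency_symbol` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment shaped like an LDML locale file (illustrative values).
LDML_FRAGMENT = """
<ldml>
  <identity>
    <language type="de"/>
  </identity>
  <dates>
    <calendars>
      <calendar type="gregorian">
        <dateFormats>
          <dateFormatLength type="medium">
            <dateFormat><pattern>dd.MM.y</pattern></dateFormat>
          </dateFormatLength>
          <dateFormatLength type="short">
            <dateFormat><pattern>dd.MM.yy</pattern></dateFormat>
          </dateFormatLength>
        </dateFormats>
      </calendar>
    </calendars>
  </dates>
  <numbers>
    <currencies>
      <currency type="EUR">
        <displayName>Euro</displayName>
        <symbol>€</symbol>
      </currency>
    </currencies>
  </numbers>
</ldml>
"""


def date_patterns(root, calendar="gregorian"):
    """Return a {length: pattern} mapping for one calendar type."""
    path = (
        f"./dates/calendars/calendar[@type='{calendar}']"
        "/dateFormats/dateFormatLength"
    )
    patterns = {}
    for length in root.findall(path):
        pattern = length.find("./dateFormat/pattern")
        if pattern is not None and pattern.text:
            patterns[length.get("type")] = pattern.text
    return patterns


def currency_symbol(root, code):
    """Return the symbol for an ISO 4217 code, or None if it is not listed."""
    node = root.find(f"./numbers/currencies/currency[@type='{code}']/symbol")
    return node.text if node is not None else None


root = ET.fromstring(LDML_FRAGMENT)
print(date_patterns(root))
print(currency_symbol(root, "EUR"))
```

A real consumer would also need to resolve locale inheritance and aliases, which this sketch omits; that extra work is one reason many projects rely on ICU or a derived format rather than reading the XML directly.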

Maintenance and Governance

The Unicode Consortium oversees CLDR development through the CLDR Technical Committee and community contribution mechanisms comparable to those of IETF, W3C, ECMA International, and ISO. Consortium members, including Apple Inc., Google, Microsoft Corporation, Adobe Systems, IBM, and Amazon (company), take part in review and voting. Releases follow a regular cadence, and issues are tracked publicly in a manner familiar from GitHub and Bugzilla. Collaboration also involves stakeholders such as Mozilla Foundation, Red Hat, Oracle Corporation, and SAP SE, as well as bodies responsible for national standards, such as GB/T in China, DIN (Deutsches Institut für Normung) in Germany, and BSI Group in the United Kingdom. Localization contributors can include language academies and institutions such as the Académie française, the Real Academia Española, and the State Language Commission of the People's Republic of China.

Adoption and Implementations

Adoption spans operating systems, application frameworks, and cloud providers. Key adopters include Apple Inc. for iOS and macOS, Google for Android (operating system) and Chrome, Microsoft for Windows 10 and Microsoft Office, and IBM for middleware such as WebSphere. Open-source software commonly obtains CLDR data through ICU; Qt (software framework) and Mozilla Firefox, for example, use it for locale-aware formatting and sorting, while the degree of reliance in components such as glibc, GLib, libc++, and GTK varies. Databases and search platforms such as PostgreSQL, MongoDB, Elasticsearch, and Apache Solr can use ICU, and therefore CLDR-derived rules, for collation and text analysis. Other infrastructure software, such as MySQL, Apache Hadoop, and Kubernetes, encounters CLDR data mainly indirectly, through the language runtimes and libraries it is built on. Enterprise software vendors such as SAP SE, Salesforce, Oracle Corporation, and Microsoft (Microsoft Dynamics) use CLDR data for locale-aware features. Cloud providers like Amazon Web Services, Google Cloud Platform, and Microsoft Azure offer localized services that rely on CLDR-based libraries.

Criticisms and Limitations

Critics highlight issues of completeness, regional representation, and update latency, particularly when upstream sources such as the IANA Time Zone Database or ISO 4217 change between CLDR releases. Smaller language communities, regional bodies, and NGOs sometimes find that CLDR lacks granular conventions for their locales, or that inclusion is slower than in community-driven projects such as Mozilla's localization efforts or language-specific initiatives. Tensions can arise between commercial contributors (e.g., Apple Inc., Google, Microsoft Corporation) and open-source stakeholders (e.g., Red Hat, Mozilla Foundation, Debian Project) over governance and priorities. Technical limitations include the complexity of the XML format compared with the JSON favored in ECMAScript ecosystems, and the size and performance cost of carrying full locale data in libraries such as ICU and glibc. Licensing and security discussions, involving organizations such as the Open Source Initiative and standards groups including W3C and IETF, concern reuse, attribution, and compatibility.

Category:Unicode Consortium