
Unicode Technical Standard #35

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: Unicode Technical Standard #35
Abbreviation: UTS #35
Developer: Unicode Consortium
First published: 2004
Latest version: revised with each CLDR release
Status: Active

Unicode Technical Standard #35 (UTS #35), titled Unicode Locale Data Markup Language (LDML), is a specification published by the Unicode Consortium. It defines an XML format for exchanging locale data, together with the syntax of Unicode locale identifiers, their language, script, region, and variant subtags, and related matching algorithms used in software internationalization. The standard builds on IETF BCP 47 and on code standards such as ISO 639 and ISO 15924, relates to work at the Internet Engineering Task Force and the World Wide Web Consortium, and influences implementations in software from the GNU Project, Mozilla Foundation, Microsoft Corporation, and Apple Inc.

Overview

UTS #35 provides a formal definition for locale identifiers, language tags, and locale matching rules. It builds on IETF BCP 47, the language codes of ISO 639-1 and ISO 639-2, and the script codes of ISO 15924 to define the language, script, region, and variant subtags used on platforms such as Linux distributions, Android (operating system), iOS, macOS, and Windows NT, and in libraries such as ICU. POSIX-style locale names, such as those used by glibc, are commonly mapped to these identifiers. The document establishes canonicalization, extension mechanisms, and mappings that are relevant to software from the Apache Software Foundation, Eclipse Foundation, LibreOffice, and Google LLC, as well as to groups such as the W3C Internationalization Working Group and internationalization efforts at the IETF.
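The identifier structure described above can be illustrated with a minimal sketch. It assumes a deliberately reduced grammar (a 2 or 3 letter language, an optional 4 letter script, and an optional region) and ignores variants, extensions, and private-use subtags, so it is not the full grammar from the standard. The function name `canonicalize_case` is hypothetical.

```python
import re

# Reduced grammar, assumed for this sketch only: language[-script][-region].
# Variants, extensions, and private-use subtags are intentionally unsupported.
_LANG_ID = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def canonicalize_case(tag: str) -> str:
    """Return the tag with conventional casing and '-' separators.

    Language is lowercased, script is Titlecased, region is uppercased.
    """
    match = _LANG_ID.fullmatch(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    parts = [match.group("language").lower()]
    if match.group("script"):
        parts.append(match.group("script").title())
    if match.group("region"):
        parts.append(match.group("region").upper())
    return "-".join(parts)


print(canonicalize_case("EN_latn_us"))  # en-Latn-US
print(canonicalize_case("es-419"))      # es-419
```

Accepting both `-` and `_` as separators on input while always emitting `-` is a choice made for this sketch. Production code would normally rely on a library such as ICU rather than a hand-written pattern.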

Scope and Purpose

The scope covers identifier syntax, subtag validity data, matching and fallback algorithms, and the exchange of locale data, all of which affect implementations in OpenJDK, Node.js, PHP, Perl, Python (programming language), and Ruby (programming language). The purpose is to enable consistent language negotiation and locale handling across web technologies such as HTML5 and CSS and across application ecosystems maintained by companies such as Facebook, Inc., Twitter, Inc., and Netflix, Inc. The standard references work from ISO/IEC JTC 1/SC 2 and is developed alongside the Unicode Common Locale Data Repository (CLDR), whose data is expressed in the Locale Data Markup Language that UTS #35 itself specifies.

Key Components

Key components include the grammar for locale identifiers, which is based on IETF BCP 47 subtags, and definitions for the language, script, region, variant, and extension subtags used by implementations such as ICU (software), by the CLDR data repository, by SIL International tools, and on operating systems like Ubuntu. The standard specifies canonicalization and mapping behaviors that build on those in RFC 5646 and draws on registries and code lists maintained by IANA, ISO, and the Unicode Consortium itself. It also defines algorithms for likely subtags and locale matching, which are applied in libraries such as ICU and in frameworks such as those of the Qt Project that draw on CLDR data.
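The likely-subtags idea mentioned above (filling in a missing script or region from the subtags that are present) can be sketched as follows. The table holds only a few hand-typed entries for illustration, whereas real implementations load the full likely-subtags data from CLDR. The lookup order, the tuple representation, and the name `add_likely_subtags` are assumptions of this sketch, not text taken from the standard.

```python
from typing import Dict, Optional, Tuple

Subtags = Tuple[str, Optional[str], Optional[str]]  # (language, script, region)

# Illustrative, hand-typed entries only; real code loads CLDR's complete table.
LIKELY: Dict[Subtags, Subtags] = {
    ("en", None, None): ("en", "Latn", "US"),
    ("zh", None, None): ("zh", "Hans", "CN"),
    ("zh", None, "TW"): ("zh", "Hant", "TW"),
    ("sr", None, None): ("sr", "Cyrl", "RS"),
}


def add_likely_subtags(
    language: str, script: Optional[str] = None, region: Optional[str] = None
) -> Subtags:
    """Fill in a missing script and/or region from the first matching table key."""
    # Try the most specific key first, then progressively less specific ones.
    candidates = [
        (language, script, region),
        (language, None, region),
        (language, script, None),
        (language, None, None),
    ]
    for key in candidates:
        found = LIKELY.get(key)
        if found is not None:
            # Subtags supplied by the caller win; only the gaps are filled.
            return (language, script or found[1], region or found[2])
    # No data for this language: return the input unchanged.
    return (language, script, region)


print(add_likely_subtags("zh", region="TW"))  # ('zh', 'Hant', 'TW')
print(add_likely_subtags("en", region="GB"))  # ('en', 'Latn', 'GB')
```

Keeping caller-supplied subtags and filling only the gaps is what lets `en` with region `GB` become `en-Latn-GB` rather than being overwritten by the default region.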

Implementation and Usage

Adopters implement UTS #35 in internationalization libraries and frameworks such as ICU (software), the Java Platform, and .NET, and in browser engines like Blink (browser engine), Gecko (engine), and WebKit. CLDR supplies the locale data these implementations consume, while POSIX locale names follow a separate convention that is commonly converted to UTS #35 identifiers. Usage scenarios include language negotiation for HTTP and content localization in XML, JSON, and XLIFF workflows used by SAP SE, Oracle Corporation, and localization platforms like Transifex and Crowdin. Implementers typically map legacy identifiers from projects such as Babel (Python), gettext, and Microsoft Globalization APIs to the canonical forms defined by the standard.
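As a rough illustration of the legacy mapping described above, the sketch below converts POSIX or gettext-style locale names of the form `language[_territory][.codeset][@modifier]` into hyphen-separated tags. The function name `posix_to_bcp47`, the small modifier table, and the decision to map `C` and `POSIX` to `und` are assumptions made for this example, not behavior prescribed by the standard or by any particular library.

```python
from typing import Dict

# Assumed mapping from a few POSIX modifiers to ISO 15924 script codes.
_MODIFIER_TO_SCRIPT: Dict[str, str] = {"latin": "Latn", "cyrillic": "Cyrl"}


def posix_to_bcp47(name: str) -> str:
    """Convert a POSIX-style locale name to a BCP 47-style tag (sketch only)."""
    if name in ("C", "POSIX"):
        # Assumption: treat the portable locale as "undetermined" language.
        return "und"
    base, _, modifier = name.partition("@")  # split off "@modifier", if any
    base = base.split(".", 1)[0]             # drop ".codeset", if any
    parts = base.split("_")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else None
    # Unknown modifiers (for example "euro") are silently ignored here.
    script = _MODIFIER_TO_SCRIPT.get(modifier.lower())
    return "-".join(p for p in (language, script, region) if p)


print(posix_to_bcp47("en_US.UTF-8"))  # en-US
print(posix_to_bcp47("sr_RS@latin"))  # sr-Latn-RS
```

The codeset is discarded because character encoding is not part of a language tag. A fuller converter would also need to handle aliases and deprecated codes, which this sketch leaves out.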

Version History and Revisions

Since its initial publication, the standard has been revised in step with CLDR releases and major Unicode versions, reflecting contributions from stakeholders including Google LLC, Microsoft Corporation, Apple Inc., and IBM, along with input from standards bodies such as JISC and AFNOR. Revisions adjust mappings, add subtags, and refine matching behavior. These changes matter to downstream users that rely on consistent locale handling, including projects such as OpenStack and Kubernetes and cloud providers like Amazon Web Services, Google Cloud Platform, and Microsoft Azure.

Compatibility and Interoperability

UTS #35 aims to ensure interoperability with IETF BCP 47 (currently RFC 5646, which obsoleted RFC 4646, together with RFC 4647), ISO 639, ISO 15924, and the language subtag registry maintained at IANA, so that systems including Apache HTTP Server, Nginx, NGINX Unit, Apache Tomcat, Jetty, and content management systems like WordPress, Drupal, and Joomla can negotiate locales coherently. Compatibility considerations also arise in container and virtualization ecosystems such as Docker (software), LXC, and orchestration tools like Kubernetes, where the locale settings passed to a service and the way its identifiers are canonicalized determine the behavior seen by services and user agents.
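One simple way to picture locale negotiation is the truncation-based lookup scheme of RFC 4647, sketched below. This is not the distance-based language matching that UTS #35 defines. It is a simpler fallback technique from BCP 47, shown here under the assumption that tags are already well formed. The function name `lookup` and its parameters are hypothetical.

```python
from typing import Iterable, List


def lookup(requested: Iterable[str], supported: List[str], default: str) -> str:
    """Return the first supported tag found by truncating each requested tag."""
    # Compare case-insensitively but return the caller's original spelling.
    available = {tag.lower(): tag for tag in supported}
    for tag in requested:
        subtags = tag.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available:
                return available[candidate]
            subtags.pop()  # drop the last subtag and try again
            # A trailing single-character subtag (an extension singleton)
            # cannot end a tag, so drop it as well.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


print(lookup(["zh-Hant-TW", "en"], ["en", "zh-Hant", "fr"], "en"))  # zh-Hant
print(lookup(["de-CH"], ["fr", "en"], "en"))                        # en
```

Requested tags are tried in priority order, so a truncated match for the first preference wins over an exact match for a later one. Whether that is desirable depends on the application.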

Category:Unicode