| Unicode Technical Standard #35 | |
|---|---|
| Name | Unicode Technical Standard #35 |
| Abbreviation | UTS #35 |
| Developer | Unicode Consortium |
| First published | 2004 |
| Latest version | Unicode 15.1 (example) |
| Status | Active |

Unicode Technical Standard #35 is a specification published by the Unicode Consortium that defines locale identifier syntax, language tag subtags, and related matching algorithms used in computing and internationalization. The standard builds on specifications from the Internet Engineering Task Force and the World Wide Web Consortium, and on standards such as ISO 639, ISO 15924, and IETF BCP 47, while influencing implementations from the GNU Project, the Mozilla Foundation, Microsoft Corporation, and Apple Inc.
UTS #35 provides a formal definition for locale identifiers, language tags, and locale matching rules. It is intended to bridge IETF BCP 47, ISO 639-1, ISO 639-2, and ISO 15924 with the script, region, and variant subtags used by platforms such as Linux, Android, iOS, macOS, and Windows NT, and by libraries like glibc and ICU. The document establishes canonicalization, extension mechanisms, and mappings that affect projects from the Apache Software Foundation, the Eclipse Foundation, LibreOffice, and Google LLC, as well as standards groups like the W3C Internationalization Working Group and IETF internationalization efforts.
The scope covers identifier syntax, subtag registries, matching and fallback algorithms, and locale-related data exchange affecting implementations in OpenJDK, Node.js, PHP, Perl, Python, and Ruby. The purpose is to enable consistent language negotiation across web technologies such as WebRTC, HTML5, and CSS, and across application ecosystems maintained by Facebook, Twitter, and Netflix, while referencing standards from ISO/IEC JTC 1/SC 2. UTS #35 is itself the specification of the Unicode Locale Data Markup Language (LDML), the XML format used to exchange CLDR locale data.
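The fallback behavior mentioned above can be illustrated with a minimal sketch of truncation-based fallback, in which subtags are dropped from the end of an identifier until only the language remains, followed by a final `root` entry. The function name `fallback_chain` is hypothetical, and the sketch deliberately ignores any explicit parent-locale data that a real implementation may consult, so it should be read as an approximation rather than the full inheritance rules.

```python
def fallback_chain(locale_id: str) -> list[str]:
    """Return a simple truncation fallback chain for a locale identifier.

    Assumption: plain right-to-left truncation ending in "root".
    Real locale data may define additional parent relationships
    that this sketch does not model.
    """
    # Accept both "_" and "-" as subtag separators.
    parts = locale_id.replace("_", "-").split("-")
    # Drop one trailing subtag at a time: a-b-c, a-b, a.
    chain = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    chain.append("root")
    return chain
```

For example, `fallback_chain("sr_Latn_RS")` yields `["sr-Latn-RS", "sr-Latn", "sr", "root"]`.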
Key components include the grammar for locale identifiers, which is aligned with IETF BCP 47 subtags, and definitions for the language, script, region, variant, and extension subtags used by implementations such as ICU, CLDR, SIL International tools, and operating systems like Ubuntu. It specifies canonicalization and mapping behaviors comparable to those in RFC 5646 and interacts with registries curated by organizations including IANA, ISO, and the Unicode Consortium itself. The standard also defines algorithms for likely subtags and locale matching, which are applied in toolchains like GCC, LLVM, Qt Project, and GTK.
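As an illustration of the identifier grammar and casing conventions described above, the following sketch parses the language, script, region, and variant subtags of an identifier and normalizes their case (lowercase language, titlecase script, uppercase region, lowercase variants). The regular expression is a simplified approximation of the grammar, the names `LanguageId` and `parse_language_id` are hypothetical, and extension subtags (such as `-u-` sequences) are intentionally not handled.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: language, optional script, optional region, variants.
# Assumption: this approximates the grammar and omits extensions entirely.
_LANG_ID = re.compile(
    r"""^
    (?P<language>[A-Za-z]{2,3}|[A-Za-z]{5,8})
    (?:[-_](?P<script>[A-Za-z]{4}))?
    (?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?
    (?P<variants>(?:[-_](?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
    $""",
    re.VERBOSE,
)


class LanguageId(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: tuple


def parse_language_id(tag: str) -> LanguageId:
    """Parse a language identifier and normalize subtag casing."""
    m = _LANG_ID.match(tag)
    if m is None:
        raise ValueError(f"not a recognized language identifier: {tag!r}")
    # The variants group is "" when absent, so filter out empty pieces.
    variants = tuple(v.lower() for v in re.split(r"[-_]", m["variants"]) if v)
    return LanguageId(
        language=m["language"].lower(),
        script=m["script"].title() if m["script"] else None,
        region=m["region"].upper() if m["region"] else None,
        variants=variants,
    )
```

With this sketch, `parse_language_id("ZH_hant_tw")` returns `LanguageId("zh", "Hant", "TW", ())`, while an identifier carrying an extension, such as `en-US-u-ca-gregory`, raises `ValueError` because extensions are out of scope here.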
Adopters implement UTS #35 in internationalization libraries and frameworks such as ICU, CLDR, POSIX, the Java Platform, and the .NET Framework, and in browser engines like Blink, Gecko, and WebKit. Usage scenarios include language negotiation for HTTP and content localization in XML, JSON, and XLIFF workflows used by SAP SE, Oracle Corporation, and localization platforms like Transifex and Crowdin. Implementers typically map legacy identifiers from projects such as Babel (the Python library), gettext, and the Microsoft Globalization APIs to the canonical forms mandated by the standard.
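The legacy mapping step can be sketched for POSIX-style locale names of the form `language[_territory][.codeset][@modifier]`, as used by gettext. This is a minimal illustration, not a mapping taken from the standard: the function name `posix_to_bcp47` is hypothetical, the codeset and modifier are simply discarded (so a modifier such as `@latin` is not converted to a script subtag), and mapping `C` and `POSIX` to `und` is an assumption made for this example.

```python
def posix_to_bcp47(name: str) -> str:
    """Convert a POSIX-style locale name to a hyphenated language tag.

    Assumptions: the codeset and modifier are dropped, and the
    "C"/"POSIX" locales map to "und" (undetermined language).
    """
    # Strip "@modifier" first, then ".codeset".
    base = name.split("@", 1)[0].split(".", 1)[0]
    if base in ("", "C", "POSIX"):
        return "und"
    parts = base.split("_")
    language = parts[0].lower()
    if len(parts) > 1 and parts[1]:
        # Territory codes are conventionally uppercase.
        return f"{language}-{parts[1].upper()}"
    return language
```

For instance, `posix_to_bcp47("en_US.UTF-8")` gives `"en-US"` and `posix_to_bcp47("sr_RS.UTF-8@latin")` gives `"sr-RS"`, which shows the information lost when the modifier is ignored.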
The initial publication followed coordination with IETF and updates have been issued in concert with major Unicode and CLDR releases, reflecting contributions from stakeholders including Google LLC, Microsoft Corporation, Apple Inc., IBM, and regional standards bodies like JISC and AFNOR. Revisions adjust mappings, add subtags, and refine matching behavior to accommodate inputs from projects such as OpenStack, Kubernetes, and cloud providers like Amazon Web Services, Google Cloud Platform, and Microsoft Azure that rely on consistent locale handling.
UTS #35 aims to ensure interoperability with IETF BCP 47 (RFC 5646, which obsoleted RFC 4646), ISO 639, ISO 15924, and the registries maintained at IANA, so that servers including Apache HTTP Server, Nginx, NGINX Unit, Tomcat, and Jetty, and content management systems like WordPress, Drupal, and Joomla, can negotiate locales coherently. Compatibility considerations also affect virtualization and container ecosystems such as Docker and LXC, and orchestration tools like Kubernetes, where locale propagation and canonicalization determine behavior for services and user agents.
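Locale negotiation on a web server can be sketched as follows. Note that this example uses the simpler lookup-by-truncation approach described in RFC 4647 rather than the language matching algorithm defined in UTS #35, and the function names `parse_accept_language` and `lookup` are hypothetical. It parses an `Accept-Language` header, orders the language ranges by quality value, and progressively truncates each range until a supported tag is found.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return lowercase language ranges ordered by descending q-value."""
    ranges = []
    for index, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        # Sort by descending q, then by original position in the header.
        ranges.append((-q, index, tag.strip().lower()))
    # Ranges with q=0 are excluded.
    return [tag for neg_q, _, tag in sorted(ranges) if neg_q < 0]


def lookup(header: str, supported: list[str], default: str) -> str:
    """Pick a supported tag by truncating each requested range."""
    available = {s.lower(): s for s in supported}
    for language_range in parse_accept_language(header):
        if language_range == "*":
            continue  # the wildcard carries no preference in this sketch
        parts = language_range.split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in available:
                return available[candidate]
            parts.pop()
            # Also drop a trailing single-character subtag (extension marker).
            if parts and len(parts[-1]) == 1:
                parts.pop()
    return default
```

With supported tags `["en", "fr"]`, the header `fr-CH, fr;q=0.9, en;q=0.8` resolves to `"fr"`, because `fr-CH` is truncated to `fr` before the lower-weighted ranges are considered.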