| RFC 3987 | |
|---|---|
| Title | RFC 3987 |
| Status | Proposed Standard |
| Author | Martin Dürst, Michel Suignard |
| Published | 2005 |
| Pages | 46 |
| Category | Standards Track |

RFC 3987 is a standards-track document that defines the syntax and semantics of Internationalized Resource Identifiers (IRIs). It complements the Uniform Resource Identifier (URI) framework by extending the permitted character repertoire from a subset of US-ASCII to a wide range of Unicode characters and scripts. The document provides an Augmented Backus–Naur Form (ABNF) grammar, rules for converting, normalizing, and comparing IRIs, and guidance for implementers whose software must pass identifiers to protocols that accept only URIs.
RFC 3987 formalizes Internationalized Resource Identifiers as a complement to URIs, whose generic syntax is defined in RFC 3986 (which obsoleted RFC 2396). It situates itself among Internet Engineering Task Force efforts concerned with internationalization, aligning with character repertoire work in Unicode Consortium publications and the ISO/IEC standards family. Its interoperability concerns are relevant to implementers working with systems such as Apache HTTP Server, Microsoft Windows, Apple macOS, Linux kernel components, and networking stacks used in enterprise products from Cisco Systems and Juniper Networks.
The document responds to increasing demand from multilingual communities in regions such as Asia, Africa, Europe, South America, and the Middle East to use native scripts in identifiers. It builds on the normalization work in Unicode Standard Annex #15 (Unicode Normalization Forms) and harmonizes with internationalization efforts by the World Wide Web Consortium and ICANN and with related work such as Internationalized Domain Names. Use cases include multilingual webpages from institutions such as UNESCO, bibliographic records from the Library of Congress, and resource discovery in repositories like arXiv and PubMed. RFC 3987 aims to balance usability for applications created by vendors such as Mozilla Foundation, Google, and Opera Software with protocol integrity maintained by standards bodies like the IETF and the IAB.
The core of RFC 3987 is an ABNF grammar that defines the allowed characters and structural components of IRIs. The grammar is written in the ABNF notation of RFC 2234, which has since been superseded by RFC 5234. It enumerates the permitted non-ASCII characters as ranges of code points (the `ucschar` and `iprivate` rules) drawn from the Unicode Standard and ISO/IEC 10646. Syntax elements such as scheme, authority, path, query, and fragment mirror the constructs familiar from HTTP/1.1 and from the URI syntax used in RESTful APIs built with tools like Django, Ruby on Rails, and Node.js. The specification prescribes how an IRI is converted to an ASCII-only URI for protocols that do not accept IRIs: non-ASCII characters are encoded as UTF-8 and each resulting octet is percent-encoded, while host names may instead be converted with the IDNA and Punycode mechanisms used in DNS.
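The IRI-to-URI mapping can be illustrated with a minimal Python sketch. It is an approximation under stated assumptions, not a conforming implementation: the helper name `iri_to_uri` and the `_SAFE` character set are illustrative choices, userinfo and IPv6 literals are ignored, and the standard library's `idna` codec (which implements IDNA 2003) stands in for host name conversion.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters left untouched in path, query, and fragment: URI punctuation
# plus "%", so that existing percent-escapes are not encoded a second time.
# (Illustrative choice, not a set taken from the RFC.)
_SAFE = "!$&'()*+,;=:@/?-._~%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (hypothetical helper, simplified)."""
    parts = urlsplit(iri)

    # Host: convert each label to its ASCII form with the built-in
    # "idna" codec. Assumption: no userinfo and no IPv6 literal.
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
        if parts.port is not None:
            host = f"{host}:{parts.port}"

    # Other components: encode as UTF-8, then percent-encode each octet
    # that is not an ASCII letter, digit, or character in _SAFE.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)

    return urlunsplit((parts.scheme, host, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://bücher.example/café?q=naïve"))
    # http://xn--bcher-kva.example/caf%C3%A9?q=na%C3%AFve
```

Treating the host separately from the other components reflects the two conversion routes: domain names go through IDNA, everything else through UTF-8 percent-encoding.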
RFC 3987 explicitly references the Unicode repertoire and Unicode normalization forms, chiefly Normalization Form C (NFC) and Normalization Form KC (NFKC), to address equivalence and comparison. It permits characters drawn from scripts such as Latin, Cyrillic, Arabic, Han, Devanagari, and Hangul, and it discusses how IRIs containing right-to-left text should be displayed, which bears on fonts and rendering engines in products from Adobe Systems and Microsoft Corporation. The document relies on UTF-8 as the encoding used when an IRI is mapped to a URI, and it considers IRIs that originate in legacy encodings registered with IANA or standardized by ISO. It also draws on earlier internationalization work at the W3C, such as the Character Model for the World Wide Web.
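Why normalization matters for comparison can be seen in a short Python sketch. The two strings below look identical when displayed but differ at the code point level, so they percent-encode differently unless both are first brought into NFC. The helper names `nfc` and `utf8_escape` are illustrative only.

```python
import unicodedata
from urllib.parse import quote

composed = "caf\u00e9"      # "é" as a single code point, U+00E9
decomposed = "cafe\u0301"   # "e" followed by combining acute accent, U+0301


def nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def utf8_escape(text: str) -> str:
    """Percent-encode every non-unreserved character as UTF-8 octets."""
    return quote(text, safe="")


if __name__ == "__main__":
    print(composed == decomposed)          # False
    print(utf8_escape(composed))           # caf%C3%A9
    print(utf8_escape(decomposed))         # cafe%CC%81
    print(nfc(decomposed) == composed)     # True
    print(utf8_escape(nfc(decomposed)))    # caf%C3%A9
```

Normalizing before encoding means that two visually identical identifiers map to the same octet sequence, so a simple string comparison of the results is then meaningful.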
Implementers of server and client software (examples include Nginx, Lighttpd, Postfix, Exim, and web browsers such as Firefox, Chrome, and Safari) must map IRIs to URIs when interfacing with protocols that accept only URIs. RFC 3987 influences libraries and toolchains in ecosystems like the Java Platform, .NET Framework, Perl, Python, and Go, guiding the behavior of functions that parse, normalize, and resolve identifiers. Deployments in content management systems like WordPress and Drupal and services like GitHub benefit from IRI interoperability guidelines, while registries and certificate authorities such as Let's Encrypt and Verisign consider these rules when validating names. The document also informs internationalized resource workflows in search engines such as Baidu, Yandex, and Microsoft Bing.
RFC 3987 highlights security risks tied to visually confusable characters and mixed scripts, echoing concerns addressed in advisories from the CERT Coordination Center and in studies by researchers affiliated with MIT, Stanford University, and ETH Zurich. It warns about spoofing attacks similar to those seen in phishing incidents investigated by the Federal Trade Commission and Europol, and it points toward normalization, canonicalization, and user-interface mitigations of the kind employed by browser vendors including Mozilla Foundation and Google. Privacy implications arise where IRIs intersect with tracking and analytics provided by firms like Adobe Systems and Google, and the specification advises implementers to consider protocol-level protections standardized within the IETF Security Area.
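One user-interface mitigation, flagging labels that mix scripts, can be sketched in Python. This is a rough heuristic rather than a method taken from the RFC: the standard library does not expose the Unicode Script property, so the hypothetical helper `rough_scripts` approximates a character's script by the first word of its Unicode character name.

```python
import unicodedata


def rough_scripts(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label.

    Heuristic: the first word of the Unicode character name, for example
    "LATIN" or "CYRILLIC". Production code would use the Unicode Script
    property instead.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """Return True when the letters of a label come from several scripts."""
    return len(rough_scripts(label)) > 1


if __name__ == "__main__":
    genuine = "paypal"
    spoofed = "p\u0430ypal"   # second letter is CYRILLIC SMALL LETTER A
    print(looks_mixed(genuine))   # False
    print(looks_mixed(spoofed))   # True
```

A check like this only catches mixed-script labels. A label written entirely in one script can still imitate a label in another, so it complements rather than replaces other safeguards.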