Uniform Resource Identifier

Contents

Definition and purpose
Syntax and components
URI vs. URL vs. URN
History and standardization
Use in protocols and applications

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. It is the standard mechanism for naming and addressing resources on networks such as the Internet and the World Wide Web. The generic syntax, defined by the Internet Engineering Task Force (IETF), provides a common framework for the many schemes used by APIs and protocols.

Definition and purpose

The primary purpose of a URI is to provide a uniform and extensible means of identifying resources, so that different networks and systems can refer to them consistently. The term is formally defined in RFC 3986 and serves as the umbrella term for both locators and names, a distinction already drawn in earlier documents such as RFC 1630 and RFC 2396. The design allows existing identification systems, such as the ISBN system for books, to be integrated into a single framework for global information systems.

Syntax and components

The generic syntax follows the format `scheme:[//authority]path[?query][#fragment]`. The `scheme` component, such as `http`, `ftp`, or `mailto`, defines the namespace and the semantics of the remainder. The optional `authority`, which may contain user information, a host, and a port, identifies the naming authority governing the namespace. The `path` is always present but may be empty; it contains data, often organized hierarchically, that identifies a resource within the scope of the scheme and authority. The optional `query` component contains non-hierarchical data, often `key=value` pairs such as those a web browser generates from a submitted form, while the optional `fragment` identifies a secondary resource, such as a section of a document, by reference to the primary resource.
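As a rough illustration of the components described above, the following sketch splits a URI with `urlsplit` from Python's standard library. The function name `split_uri` and the example URIs are assumptions made for this illustration only. The `username` field holds only the user name portion of the user information.

```python
from urllib.parse import urlsplit


def split_uri(uri: str) -> dict:
    """Split a URI into its generic components and the parts of the authority."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # raw authority: [userinfo@]host[:port]
        "username": parts.username,  # None when no user information is present
        "host": parts.hostname,      # None when there is no authority
        "port": parts.port,          # None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    # Hypothetical URIs chosen for illustration.
    for example in (
        "https://user@www.example.com:8443/docs/guide?lang=en&page=2#intro",
        "mailto:someone@example.com",
    ):
        print(split_uri(example))
```

The `mailto:` example has no authority, so everything after the scheme lands in the path. This is why the path in the generic syntax does not always begin with a slash.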

URI vs. URL vs. URN

The URI concept encompasses two primary subsets: the Uniform Resource Locator (URL) and the Uniform Resource Name (URN). A URL, such as `https://www.example.com/page`, identifies a resource and also specifies its primary access mechanism, that is, its network "location". In contrast, a URN is intended to be globally unique and persistent and acts as a resource's name within a specific namespace; for example, `urn:isbn:0451450523` identifies a book via the ISBN system. The distinction was clarified in RFC 3305, a report from a joint W3C/IETF interest group, though in common practice the term URL is often used loosely for web addresses in general.
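The distinction can be sketched in code as a scheme check. This is a simplified heuristic, not a rule from any RFC: it assumes that the `urn` scheme marks a name and that a small, hypothetical set of schemes (`LOCATOR_SCHEMES`) marks locators. The function names are likewise invented for illustration. The terms "namespace identifier" and "namespace-specific string" come from the URN specifications.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative, incomplete set of schemes treated as locators.
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify(uri: str) -> str:
    """Label a URI as 'URN', 'URL', or plain 'URI' based on its scheme alone."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "urn":
        return "URN"
    if scheme in LOCATOR_SCHEMES:
        return "URL"
    return "URI"


def urn_parts(urn: str) -> tuple[str, str]:
    """Split a URN into (namespace identifier, namespace-specific string)."""
    parts = urlsplit(urn)
    if parts.scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    nid, sep, nss = parts.path.partition(":")
    if not (sep and nid and nss):
        raise ValueError(f"malformed URN: {urn!r}")
    return nid, nss


if __name__ == "__main__":
    print(classify("https://www.example.com/page"))
    print(classify("urn:isbn:0451450523"))
    print(urn_parts("urn:isbn:0451450523"))
```

Every value the sketch labels "URL" or "URN" is still a URI; the labels only narrow the role the identifier plays.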

History and standardization

The concept originated with Tim Berners-Lee and the early development team at CERN for the World Wide Web project, which aimed to create a universal system for document identification. The first formal specification was published as RFC 1630 in 1994. It was followed in 1998 by the more definitive RFC 2396, co-authored by Tim Berners-Lee, Roy Fielding, and Larry Masinter. The current, consolidated standard is RFC 3986, published in 2005 by the same three authors, which obsoletes RFC 2396 and is also designated Internet Standard 66 (STD 66).

Use in protocols and applications

URIs are a cornerstone of modern networked applications and protocols. HTTP and HTTPS use them to identify resources on web servers, and clients such as Chrome and Firefox interpret them. XML namespaces use them as namespace names, a practice also central to RDF and the Semantic Web. Beyond the web, the `mailto:` scheme is used by email clients, `file:` identifies resources on the local file system, and `tel:` carries telephone numbers for applications such as Skype. Cloud APIs, such as those from AWS and Azure, also rely on URIs for resource management.
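To illustrate the XML namespace use mentioned above, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The namespace URI, the document, and the function names are hypothetical and chosen only for this example. It shows that the URI, not the prefix, names the namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and document, used only for illustration.
NS_URI = "http://example.com/ns/books"
DOC = f'<catalog xmlns:bk="{NS_URI}"><bk:title>Example</bk:title></catalog>'


def qualified_tags(xml_text: str) -> list[str]:
    """Return the expanded tag name of every element, root first."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


def find_titles(xml_text: str, prefix: str) -> list[str]:
    """Find title elements using any caller-chosen prefix bound to NS_URI."""
    root = ET.fromstring(xml_text)
    matches = root.findall(f"{prefix}:title", {prefix: NS_URI})
    return [element.text for element in matches]


if __name__ == "__main__":
    print(qualified_tags(DOC))
    # The prefix here differs from the one in the document, yet the lookup
    # succeeds because both prefixes are bound to the same namespace URI.
    print(find_titles(DOC, "x"))
```

The parser expands each prefixed name into `{namespace-uri}local-name`, so two documents that use different prefixes for the same URI produce identical element names.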