URI

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: URI
Introduced: 1994
Developer: Tim Berners-Lee; World Wide Web Consortium; Internet Engineering Task Force
Type: Identifier

A Uniform Resource Identifier (URI) is a compact string of characters that identifies or names a resource on the Internet or within a computing system. It provides a uniform way to refer to entities such as documents, services, endpoints, and namespaces, enabling interoperability among systems like web browsers, mail clients, and distributed services. Specifications published by the Internet Engineering Task Force and the World Wide Web Consortium define its syntax, semantics, and processing rules.

Definition and Overview

A URI is defined in formal standards produced by the Internet Engineering Task Force and referenced by the World Wide Web Consortium. It is a textual identifier that can be used to locate, name, or act upon a resource: for example, a document retrieved over Hypertext Transfer Protocol or File Transfer Protocol, a mailbox that receives messages via Simple Mail Transfer Protocol, a directory entry reached through Lightweight Directory Access Protocol, or a namespace used by Extensible Markup Language and XML Schema. Early conceptual work by Tim Berners-Lee and collaborators at CERN, followed by standardization efforts at the IETF, framed the URI as a unifying concept that covers the earlier notions of Uniform Resource Locator and Uniform Resource Name while accommodating additional schemes such as mailto: and data:. URIs underpin World Wide Web Consortium standards, Representational State Transfer architectures, and the endpoint and redirect identifiers used by Simple Object Access Protocol services and OAuth flows.

Syntax and Components

The generic syntax decomposes a URI into five components: scheme, authority, path, query, and fragment. Common schemes include http, https, ftp, mailto, and file. The authority component may include user information, a host given as a Domain Name System name or an Internet Protocol address literal, and a port number; when the port is omitted, the scheme's default applies, such as 80 for Hypertext Transfer Protocol and 443 for HTTPS. The path component often encodes hierarchical names that servers such as Apache HTTP Server, Nginx, and Microsoft Internet Information Services map to resources. The query carries non-hierarchical parameters that are sent to the server, whereas the fragment is resolved by the client after retrieval and is not sent in HTTP requests, which is why single-page applications built on HTML5 and ECMAScript can use it for client-side navigation. Percent-encoding represents characters outside the allowed set as a percent sign followed by two hexadecimal digits per byte, typically applied to the UTF-8 encoding of Unicode text; together with normalization rules, it allows internationalized identifiers to be represented and avoids ambiguity when URIs are exchanged through protocols such as SMTP and IMAP.
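The decomposition and percent-encoding described above can be sketched with Python's standard `urllib.parse` module. The URI below is a made-up example on the reserved `example.com` domain, and the variable names are illustrative only; this is a minimal sketch, not a full implementation of the generic syntax.

```python
from urllib.parse import urlsplit, quote, unquote

# Hypothetical URI, chosen only to exercise every component.
uri = "https://user@example.com:8443/docs/r%C3%A9sum%C3%A9.html?lang=en&page=2#section-3"

parts = urlsplit(uri)

# Collect the generic components; the authority is shown split into
# user information, host, and port.
components = {
    "scheme": parts.scheme,
    "userinfo": parts.username,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} {value!r}")

# urlsplit leaves percent-encoded octets untouched; unquote decodes them,
# interpreting the bytes as UTF-8 by default.
decoded_path = unquote(parts.path)

# quote performs the reverse step: non-ASCII characters are encoded as
# UTF-8 and each byte becomes %XX.
encoded_segment = quote("résumé.html")

print(decoded_path)
print(encoded_segment)
```

Note that the parser keeps the path in its encoded form, so decoding is an explicit, separate step; decoding too early can change how reserved characters such as `/` are interpreted.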

Comparison with URL and URN

The relationship between the umbrella term and its specific forms was clarified through discussions in Internet Engineering Task Force working groups and World Wide Web Consortium committees. A URL traditionally emphasizes the access method and network location of a resource, as with HTTP and FTP, while a URN is intended to be a persistent, location-independent name drawn from a namespace registered with the Internet Assigned Numbers Authority. Technical debates in the IETF community, with contributions from Roy Fielding and others, addressed the practical overlaps and distinctions in contexts including RESTful architecture, Digital Object Identifier systems, and persistent identifier initiatives run by institutions such as the Library of Congress and registration agencies such as Crossref.
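As a rough illustration of the distinction, the scheme alone is often used as a heuristic: identifiers in the `urn` scheme are names, while schemes such as `http` and `ftp` imply an access method. The `classify` function and its short list of locator schemes are assumptions made for this sketch, not a rule taken from any specification.

```python
from urllib.parse import urlsplit

# Assumed, deliberately incomplete set of schemes treated as locators.
LOCATOR_SCHEMES = {"http", "https", "ftp"}

def classify(uri: str) -> str:
    """Return a rough label based only on the scheme (a heuristic)."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "urn":
        return "URN"
    if scheme in LOCATOR_SCHEMES:
        return "URL"
    # Everything else is still a URI, just not clearly either subtype.
    return "URI"

for example in (
    "https://example.com/index.html",
    "urn:example:animal:ferret:nose",
    "mailto:someone@example.com",
):
    print(f"{example} -> {classify(example)}")
```

Every value returned here is still a URI; the labels only indicate which informal subtype the scheme suggests.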

Standardization and Specifications

The core specifications were published by the Internet Engineering Task Force as Request for Comments documents covering syntax, normalization, and resolution; the current generic syntax is defined in RFC 3986. Contributors included World Wide Web Consortium staff, IETF working group participants, and researchers from universities and research centers. Scheme names and their associated metadata are registered with the Internet Assigned Numbers Authority, with coordination with the International Organization for Standardization in some naming contexts. Guidance for internationalized identifiers and their encoding draws on the Unicode Standard.

Implementation and Examples

URIs are implemented across software ecosystems. Web servers such as Apache HTTP Server and Nginx map URI paths to resources; standard libraries and packages for JavaScript, Python, Java, C#, and Go provide parsing, normalization, and resolution utilities; and frameworks and platforms including Node.js, Django, Spring Framework, ASP.NET, and Ruby on Rails expose APIs that accept and produce URIs. Typical uses include mailto: links that address electronic mailboxes, data: URIs that embed content directly in HTML5 documents, ftp: URIs for legacy transfers from File Transfer Protocol servers, and custom schemes registered by applications on Android and iOS for inter-app linking and deep linking.
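Two of the operations mentioned above, resolving a relative reference and building a data: URI, can be sketched with Python's standard library. The base URI and the `make_data_uri` helper are hypothetical and exist only for this example.

```python
import base64
from urllib.parse import urljoin

# Reference resolution: a relative reference is interpreted against a base URI.
base = "http://example.com/a/b/c"      # hypothetical base URI
sibling = urljoin(base, "g")           # replaces the last path segment
parent = urljoin(base, "../d?x=1")     # ".." removes one directory level

print(sibling)
print(parent)

def make_data_uri(payload: bytes, media_type: str = "text/plain") -> str:
    """Build a base64-encoded data: URI (illustrative helper, not a library API)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"

greeting = make_data_uri(b"Hello")
print(greeting)
```

Base64 is used here because it keeps arbitrary bytes within the characters a URI allows; for short plain text, percent-encoding the payload is a common alternative.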

Security and Privacy Considerations

Security issues associated with URIs arise from redirection, cross-origin interactions, and injection attacks against URI parsers in servers such as Nginx, in libraries, and in client-side engines such as WebKit and Blink. Threats include deceptive hostnames that exploit Internationalized Domain Names for homograph attacks, leakage of sensitive data placed in query strings through server logs or Referer headers sent to third parties like Google and Facebook, and inconsistent normalization that leads to access control bypass in services running on platforms including Linux and Windows Server. Mitigations draw on OWASP guidance, secure coding recommendations from the CERT Coordination Center, and browser protections such as the same-origin policy and Content Security Policy implemented by browsers like Mozilla Firefox and Google Chrome.
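Two of these mitigations can be sketched in a few lines: checking a redirect target against an allowlist of hosts, and dropping the query and fragment before a URI is written to a log. The allowlist, the function names, and the `.test` hostnames are assumptions for illustration; a production check would need to account for more cases than this sketch does.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical allowlist of hosts this application may redirect to.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def is_safe_redirect(target: str) -> bool:
    """Accept only absolute https URIs whose parsed host is on the allowlist."""
    try:
        parts = urlsplit(target)
    except ValueError:
        # Malformed input (for example, a broken IPv6 literal) is rejected.
        return False
    # Compare the parsed hostname, never a substring of the raw string:
    # "https://example.com@evil.test/" has host "evil.test".
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS

def redact_for_logging(uri: str) -> str:
    """Drop query and fragment so tokens in them do not reach log files."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(is_safe_redirect("https://example.com/account"))
print(is_safe_redirect("https://example.com.evil.test/login"))
print(redact_for_logging("https://example.com/reset?token=abc123#top"))
```

Matching on the parsed host rather than on a prefix of the string is the key design choice: prefix or substring checks are what the lookalike-authority tricks in the examples are meant to defeat.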
