Uniform Resource Locator

Name	Uniform Resource Locator
Other names	Web address
Invented by	Tim Berners-Lee
Invented at	CERN
Year invented	1994
Related concepts	URI, URN, HTTP, HTML

Contents

Syntax
History
Standardization
URL resolution
Security considerations

A Uniform Resource Locator (URL) is a specific type of Uniform Resource Identifier (URI) that provides the means to locate and retrieve a resource on a computer network, most notably the World Wide Web. It is the fundamental addressing mechanism used by web browsers and other applications to access information on the Internet. A URL combines a scheme name, hierarchical location information, and often a query component to identify a network resource and describe how to reach it.

Syntax

The syntax is formally defined in RFC 3986, published by the Internet Engineering Task Force, and follows a general hierarchical sequence. It begins with a scheme, such as `http` or `ftp`, followed by a colon. When an authority component is present, it is introduced by two slashes and typically contains a hostname like `www.example.com`. The authority may also include optional userinfo before the host and a port number after it, as is common in URL-style connection strings for services such as a MySQL database. Following this is a path component, which represents a hierarchical structure on the server. The path may be followed by a query string introduced by a question mark, commonly used to pass parameters to server-side applications written with technologies like PHP or ASP.NET. The final optional component is a fragment, introduced by a hash sign, which identifies a secondary resource, such as a specific section within an HTML document. This well-defined structure allows software like Google Chrome and Apache HTTP Server to parse and process the address consistently.
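The components described above can be seen by splitting an address with Python's standard `urllib.parse` module. This is a minimal sketch rather than a reference parser; the URL itself, including its userinfo, port, path, query, and fragment, is a hypothetical example chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL; every component value here is an assumption for illustration.
url = "https://user@www.example.com:8443/docs/guide.html?lang=en&page=2#install"

parts = urlsplit(url)

print(parts.scheme)    # scheme, before the colon
print(parts.netloc)    # full authority: userinfo, host, and port
print(parts.username)  # optional userinfo
print(parts.hostname)  # host
print(parts.port)      # optional port, parsed as an integer
print(parts.path)      # hierarchical path
print(parts.query)     # raw query string, after the question mark
print(parts.fragment)  # fragment, after the hash sign

# The query string can be decoded further into a mapping of names to value lists.
print(parse_qs(parts.query))
```

Note that `urlsplit` only separates the components; it does not check that the host exists or that the scheme is one the application supports.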

History

The URL was defined in 1994 by Tim Berners-Lee and the URI working group of the Internet Engineering Task Force, building upon his earlier work inventing the World Wide Web at CERN. The initial specification was published as RFC 1738, which standardized the format for several schemes including HTTP, FTP, and Gopher. Its development was integral to the rapid growth of the World Wide Web in the 1990s, since it gave browsers such as Mosaic and later Netscape Navigator a uniform way to follow links between documents. The syntax and semantics have evolved through subsequent standards, most notably RFC 2396 and the current RFC 3986, which folded the URL syntax into the broader URI specification.

Standardization

Standardization is managed by the Internet Engineering Task Force and the World Wide Web Consortium, with the definitive specification being RFC 3986. This document, authored by Tim Berners-Lee, Roy Fielding, and Larry Masinter, defines the generic syntax for all URIs, subsuming the earlier separate definitions for URLs. Related standards govern how URLs are used in practice: RFC 2616 defined the `http` scheme as part of HTTP/1.1 and has since been superseded by later HTTP specifications, most recently RFC 9110, while RFC 7578 defines the multipart/form-data format used to submit form data to a URL. The International Organization for Standardization and the International Electrotechnical Commission have also published aligned standards, such as ISO/IEC 10646, which defines the character set that informs character encoding within URLs. Ongoing work by the WHATWG on its living standards, including those for URL and HTML, also influences how browsers parse and handle URLs.

URL resolution

Resolution is the process by which a client, such as Mozilla Firefox or cURL, converts a relative reference into an absolute URL. This process is defined in RFC 3986 and involves parsing the reference and merging its components with those of a base URL, often the address of the current document as defined in the HTML specification. Components missing from the reference, such as the scheme or authority, are inherited from the base, and dot segments such as `.` and `..` in the resulting path are removed. The algorithm is implemented in libraries like libcurl and in the URL-handling components shipped with operating systems such as Microsoft Windows and macOS. Resolution is crucial for the functioning of hyperlinks on sites like Wikipedia and for the proper loading of assets served through content delivery networks like Cloudflare. The resolved URL is then used to initiate a network request, typically via protocols like HTTP or HTTPS.
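As an illustration of reference resolution, the following sketch uses `urljoin` from Python's standard `urllib.parse` module, which resolves a reference against a base URL. The base URL and the relative references are hypothetical values picked to show the common cases; other implementations, browsers in particular, may differ in edge cases.

```python
from urllib.parse import urljoin

# Hypothetical base URL, standing in for the address of the current document.
base = "https://www.example.com/docs/guide/intro.html"

# Hypothetical references of the kinds commonly found in hyperlinks.
references = [
    "setup.html",                # sibling document in the same directory
    "../images/logo.png",        # parent directory, resolved via dot segments
    "/about",                    # absolute path on the same authority
    "//cdn.example.net/lib.js",  # network-path reference: inherits only the scheme
    "?page=2",                   # query only: keeps the base path
    "#history",                  # fragment only: same document
]

resolved = {ref: urljoin(base, ref) for ref in references}

for ref, target in resolved.items():
    print(f"{ref!r:28} -> {target}")
```

Each reference inherits from the base whatever components it leaves out, so `"/about"` keeps the scheme and host while `"#history"` keeps everything except the fragment.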

Security considerations

Security is a major concern because URLs can be crafted to deceive users, for example in phishing messages delivered through clients such as Microsoft Outlook or Slack. Malicious actors may use Internationalized Domain Name homograph attacks, in which visually similar characters from other scripts replace ASCII letters, to spoof legitimate sites like PayPal or Bank of America. Sensitive data placed in query strings, such as session tokens, can be exposed through the Referer header or through server log files. Recommended practice is to use HTTPS, originally specified in RFC 2818, to encrypt communication and validate server identity via certificates from authorities like DigiCert. Browser mechanisms such as the same-origin policy, enforced by browsers including Google Chrome, and Content Security Policy help mitigate risks associated with malicious payloads. Organizations like OWASP regularly publish guidelines on the safe handling of URLs within applications built with frameworks such as Django or Ruby on Rails.
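Two of the risks above, homograph hostnames and sensitive query parameters, can be sketched with the Python standard library. This is a rough illustration, not a vetted defense: the helper names, the list of sensitive parameter names, and the example URLs are all assumptions, and the built-in `idna` codec implements the older IDNA 2003 rules. Flagging every internationalized label is also deliberately crude, since many legitimate domains use non-ASCII characters.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names treated as sensitive for this sketch.
SENSITIVE_PARAMS = {"token", "session", "sessionid", "password"}


def to_ascii_host(host: str) -> str:
    """Return the ASCII (Punycode) form of a hostname via the stdlib IDNA codec."""
    return host.encode("idna").decode("ascii")


def looks_like_idn(host: str) -> bool:
    """True if any label of the ASCII form carries the 'xn--' Punycode prefix."""
    return any(label.startswith("xn--") for label in to_ascii_host(host).split("."))


def redact_query(url: str) -> str:
    """Replace the values of sensitive query parameters before logging a URL."""
    parts = urlsplit(url)
    pairs = [
        (name, "REDACTED" if name.lower() in SENSITIVE_PARAMS else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# "\u0430" is CYRILLIC SMALL LETTER A, which renders much like the Latin "a".
spoofed_host = "p\u0430ypal.com"
print(to_ascii_host(spoofed_host))   # ASCII form exposes the non-Latin label
print(looks_like_idn(spoofed_host))  # flagged for closer inspection
print(looks_like_idn("paypal.com"))  # plain ASCII host is not flagged

print(redact_query("https://www.example.com/account?session=abc123&lang=en"))
```

Showing the ASCII form of a hostname makes a look-alike label visible, and redacting before logging limits one of the exposure paths mentioned above; neither replaces HTTPS or keeping secrets out of URLs in the first place.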
