Universally Unique Identifier

Name	Universally Unique Identifier
Abbreviation	UUID
Standard	IETF RFC 9562, ISO/IEC 9834-8:2023
First defined	1990s (OSF DCE)
Example	123e4567-e89b-12d3-a456-426614174000

Contents

Definition and purpose
Format and generation
Variants and versions
Collision probability
Common uses
Standards and specifications

A Universally Unique Identifier (UUID) is a 128-bit label used to identify information in computer systems, designed to be unique across both space and time without requiring a central registration authority. The concept was formalized by the Open Software Foundation as part of the Distributed Computing Environment in the early 1990s and has since become a foundational standard in software engineering. Its primary purpose is to enable decentralized generation of identifiers. Uniqueness is not strictly guaranteed, but the probability of duplication is low enough to be ignored in practice, which facilitates reliable data exchange and interoperability across platforms such as Microsoft Windows, Apple's macOS, and many database management systems.

Definition and purpose

A UUID is defined by a standardized format and a set of generation methods that produce identifiers unique across distributed systems without synchronous coordination. This addresses a recurring problem in parallel computing and distributed systems, where entities such as software components, database records, or hardware devices need unambiguous labels. The purpose is to provide a reliable mechanism for naming resources in environments such as web services, file systems like ext4, and object-oriented programming frameworks, so that identifiers generated in New York City do not conflict with those created in Tokyo. This decentralized approach is widely relied on in web services and cloud computing platforms such as Amazon Web Services.

Format and generation

The canonical textual representation consists of 32 hexadecimal digits displayed in five groups separated by hyphens in the pattern 8-4-4-4-12 (36 characters in total), for example, `550e8400-e29b-41d4-a716-446655440000`. Depending on the version, generation algorithms draw on inputs such as a timestamp from the system clock, a random number generator, and often a MAC address or other node identifier. Specific methods are defined in standards published by the Internet Engineering Task Force, with common implementations found in programming languages, such as Python's `uuid` module and Java's `java.util.UUID` class. The structure reserves bits to encode the version number and the variant, so that a parser can tell which generation method produced a given identifier and interpret the remaining bits accordingly.
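The layout described above can be inspected with Python's standard `uuid` module. The following is a minimal sketch, not a reference implementation: it parses the example identifier from this section, checks the 8-4-4-4-12 grouping, and reads the version and variant bits directly from the 128-bit integer. The variable names are illustrative only.

```python
import uuid

# Parse the canonical text form used as the example in this section.
u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

canonical = str(u)
group_lengths = [len(group) for group in canonical.split("-")]

# The version occupies the top 4 bits of the third group (bits 79..76
# of the 128-bit value); the variant starts at the top of the fourth
# group (bits 63..62 for the "10" variant used by RFC 4122 / RFC 9562).
version_bits = (u.int >> 76) & 0xF
variant_bits = (u.int >> 62) & 0b11

print(group_lengths)         # [8, 4, 4, 4, 12]
print(len(u.hex))            # 32 hexadecimal digits
print(len(u.bytes) * 8)      # 128 bits
print(version_bits, u.version)
print(bin(variant_bits), u.variant)
```

Reading the bits by hand and through the `version` and `variant` properties should agree; the manual shifts are shown only to make the bit positions concrete.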

Variants and versions

Several versions exist, each defined by its generation algorithm. Version 1 combines a timestamp with a node identifier, traditionally a MAC address, while Version 2, used in the Distributed Computing Environment, incorporates a POSIX user or group identifier. Version 3 and Version 5 create identifiers by applying the MD5 or SHA-1 hash function, respectively, to a namespace and a name, as used in systems like Active Directory; the same namespace and name always yield the same identifier. Version 4 is built from random or pseudorandom numbers, with 122 of its 128 bits chosen at random and the remaining 6 reserved for the version and variant, and it is extremely common in web applications and frameworks like Django and Ruby on Rails. RFC 9562 additionally defines Versions 6, 7, and 8. The variant field, defined in the same specifications, determines the layout of the remaining bits and preserves backward compatibility with legacy systems from companies like Hewlett-Packard.
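As a rough illustration of how the versions differ, the sketch below generates Version 1, 3, 4, and 5 identifiers with Python's standard `uuid` module. The name `example.com` and the choice of the DNS namespace are assumptions made for the example; Version 2 is omitted because the module does not provide a generator for it.

```python
import uuid

NAME = "example.com"  # hypothetical name, chosen only for illustration

v1 = uuid.uuid1()                          # timestamp + node identifier
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, NAME)  # MD5 of namespace + name
v4 = uuid.uuid4()                          # random bits
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, NAME)  # SHA-1 of namespace + name

for u in (v1, v3, v4, v5):
    print(u.version, u)

# Name-based versions are deterministic: the same inputs give the same UUID.
v5_again = uuid.uuid5(uuid.NAMESPACE_DNS, NAME)
print(v5 == v5_again)

# Random versions are not: two calls are expected to differ.
v4_again = uuid.uuid4()
print(v4 == v4_again)
```

The deterministic behavior of Versions 3 and 5 makes them suitable for deriving a stable identifier from an existing name, whereas Version 4 is the usual choice when no such name exists.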

Collision probability

The probability of generating two identical identifiers, known as a collision, is vanishingly small in practice because of the size of the 128-bit space, which contains about 3.4×10³⁸ possible values. A Version 4 identifier fixes 6 of those bits for the version and variant, leaving 122 random bits, or about 5.3×10³⁶ values. For context, this number vastly exceeds the number of stars in the Milky Way galaxy or grains of sand on Earth. Analyses based on the birthday problem show that roughly 2.7×10¹⁸ random Version 4 identifiers are needed before the chance of at least one collision reaches 50%, which corresponds to generating one billion identifiers per second for about 86 years; after about 1.03×10¹⁴ identifiers the chance is still only about one in a billion. This is a probabilistic assurance rather than a strict guarantee, and it depends on a good source of randomness. It is nonetheless strong enough that systems handling critical data, such as financial transactions or patient records, can rely on these identifiers without a central arbiter like the Internet Assigned Numbers Authority.
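The birthday-problem estimate can be reproduced with a few lines of Python. This is a sketch under stated assumptions: identifiers are treated as uniformly random over 122 bits (the random portion of a Version 4 identifier), and the standard approximation p ≈ 1 − exp(−n(n−1)/(2N)) is used. The function names are hypothetical.

```python
import math

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one collision among n random IDs."""
    space = 2 ** random_bits
    # Birthday approximation; expm1 keeps precision when p is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))

def ids_for_probability(p: float, random_bits: int = 122) -> float:
    """Approximate number of IDs needed to reach collision probability p."""
    space = 2 ** random_bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

half = ids_for_probability(0.5)
seconds_per_year = 365.25 * 24 * 3600
years_at_one_billion_per_second = half / 1e9 / seconds_per_year

print(f"IDs for a 50% collision chance: {half:.3e}")
print(f"Years at 1e9 IDs per second:    {years_at_one_billion_per_second:.1f}")
print(f"Chance after 1e9 IDs:           {collision_probability(10**9):.3e}")
```

The estimate scales with the square root of the space, so each extra pair of random bits only doubles the number of identifiers that can be generated at a given risk level.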

Common uses

These identifiers are ubiquitous in modern computing, serving as primary keys in database tables within systems like Oracle Database and Microsoft SQL Server. They uniquely label virtual machines in VMware and Microsoft Azure, document objects in MongoDB, and individual messages in Advanced Message Queuing Protocol brokers. Extensible Markup Language schema definitions often specify their use for element IDs, and they are integral to the Component Object Model in Microsoft Windows. In web development, applications built with frameworks such as React and Angular often use them as stable keys for dynamic elements, and some OAuth implementations use random UUIDs as token or state values, although the OAuth specifications do not require this.

Standards and specifications

The formal specification is maintained by the Internet Engineering Task Force, with the current standard defined in RFC 9562, which obsoletes RFC 4122. It is also published as an international standard, ISO/IEC 9834-8:2023. The original specification emerged from the Open Software Foundation's Distributed Computing Environment, with subsequent adoption and refinement by major consortia including the World Wide Web Consortium for web standards. These documents provide the authoritative definition for implementers and ensure consistent interpretation across vendors, from startups to established technology firms like SAP SE.
