| Database management system | |
|---|---|
| Name | Database management system |
| Genre | System software |
| License | Proprietary, free, open-source |
Database management system
A database management system organizes, stores, retrieves, and manages structured information for applications and institutions. It provides interfaces used by developers, administrators, and analysts to perform queries, transactions, and analytics across datasets in enterprise, scientific, and web contexts. Implementations appear in products and projects from companies and organizations such as IBM, Oracle Corporation, Microsoft, Amazon Web Services, and Mozilla Foundation.
Early foundations trace to work at IBM research labs, including the System R prototype, and to the Ingres project at the University of California, Berkeley. Theoretical groundwork built on the relational model introduced by E. F. Codd while he was at IBM, with further contributions from researchers at institutions such as Bell Labs and the University of California, Los Angeles. Commercial products emerged in the 1970s and 1980s from firms such as Oracle Corporation and Sybase, and later open-source communities such as the PostgreSQL Global Development Group and MySQL AB changed deployment models. The rise of distributed systems intersected with projects at Google and Amazon, producing architectures adopted in large-scale services during the 2000s and 2010s.
A system typically comprises a query processor, storage engine, transaction manager, and catalog service; comparable modular designs exist in products from Microsoft Corporation and SAP SE. The query processor builds execution plans using cost estimates derived from catalog statistics, an approach also found in Apache Software Foundation projects such as Apache Hadoop and Apache Cassandra. Storage engines interact with file systems developed by companies including Intel Corporation and Sun Microsystems, and exploit virtualized environments provided by VMware, Inc. and Kubernetes-orchestrated clusters developed under the Cloud Native Computing Foundation. Administration tools integrate with identity providers such as Okta, Inc. and Microsoft Azure Active Directory.
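The component breakdown above can be sketched as a toy pipeline. All class and function names here are illustrative inventions, not drawn from any real product; real systems are vastly more elaborate:

```python
# Toy sketch of the classic DBMS component stack: a catalog holding
# statistics, a storage engine exposing scans, and a query processor
# that prefers an index probe over a full scan when an index exists.

class Catalog:
    """Holds per-table statistics used for cost estimation."""
    def __init__(self):
        self.stats = {}                     # table name -> row count

    def register(self, table, rows):
        self.stats[table] = rows

class StorageEngine:
    """In-memory 'heap files': one list of row dicts per table."""
    def __init__(self):
        self.tables = {}

    def scan(self, table):
        yield from self.tables.get(table, [])

class QueryProcessor:
    """Picks an access path: index lookup if available, else full scan."""
    def __init__(self, catalog, storage):
        self.catalog, self.storage = catalog, storage
        self.indexes = {}                   # (table, column) -> {value: [rows]}

    def create_index(self, table, column):
        idx = {}
        for row in self.storage.scan(table):
            idx.setdefault(row[column], []).append(row)
        self.indexes[(table, column)] = idx

    def select(self, table, column, value):
        # Crude cost model: an index probe beats a scan whose cost is
        # proportional to the catalog's row count for the table.
        if (table, column) in self.indexes:
            return self.indexes[(table, column)].get(value, [])
        return [r for r in self.storage.scan(table) if r[column] == value]

storage = StorageEngine()
storage.tables["users"] = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
catalog = Catalog()
catalog.register("users", 2)
qp = QueryProcessor(catalog, storage)
qp.create_index("users", "id")
print(qp.select("users", "id", 2))   # [{'id': 2, 'name': 'bob'}]
```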
Relational models, championed by E. F. Codd and implemented in systems from IBM and Oracle Corporation, use languages such as SQL, standardized by the International Organization for Standardization. Alternative models include document models implemented by vendors like MongoDB, Inc., key–value stores influenced by research at Google and deployed in products such as Amazon DynamoDB, graph models advanced by work from Neo4j, Inc. and academics affiliated with Stanford University, and column families popularized by projects like Apache Cassandra and HBase. Declarative and procedural extensions have origins in consortia including World Wide Web Consortium contributors and standards committees at ISO/IEC.
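The contrast between these data models can be made concrete by expressing one logical record under each of them. These are purely illustrative Python structures, not any vendor's actual storage format:

```python
# One logical user record under the data models discussed above.

# Relational: fixed schema, one flat tuple per row.
relational_row = ("u42", "Ada Lovelace", "ada@example.org")

# Document: self-describing and nestable, as in document stores.
document = {
    "_id": "u42",
    "name": "Ada Lovelace",
    "contacts": {"email": "ada@example.org"},
}

# Key-value: an opaque value addressed only by its key.
kv_store = {"user:u42": b'{"name": "Ada Lovelace"}'}

# Graph: entities as nodes, relationships as labeled edges.
nodes = {"u42": {"name": "Ada Lovelace"}, "g1": {"name": "analysts"}}
edges = [("u42", "MEMBER_OF", "g1")]

# Column family: values for one column stored contiguously,
# which favors scans that touch only a few columns.
columns = {"name": ["Ada Lovelace"], "email": ["ada@example.org"]}
```

The trade-off is visible in the shapes: the relational tuple assumes a schema known in advance, the document carries its own structure, and the columnar layout groups values by attribute rather than by record.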
Atomicity, consistency, isolation, and durability (ACID) properties underpin transaction semantics described by researchers connected to ACM conferences and laboratories at MIT and Carnegie Mellon University. Concurrency control algorithms such as two-phase locking trace to early database research at IBM and the University of California, Berkeley; optimistic concurrency and timestamp ordering evolved in follow-on studies at Cornell University and Princeton University. Distributed transactions and consensus protocols leverage work from teams at Google (e.g., Paxos-inspired practices) and groups developing Raft implementations in projects overseen by developers associated with Stanford University.
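The two-phase locking discipline mentioned above can be sketched in a few lines: locks are acquired as data is touched (the growing phase) and released only at commit (the shrinking phase). This is an illustrative toy, with invented names and no deadlock detection:

```python
# Minimal sketch of strict two-phase locking (2PL).
import threading

class LockManager:
    """Hands out one lock per resource, created on demand."""
    def __init__(self):
        self._locks = {}                 # resource key -> threading.Lock
        self._guard = threading.Lock()

    def lock_for(self, resource):
        with self._guard:
            return self._locks.setdefault(resource, threading.Lock())

class Transaction:
    def __init__(self, manager):
        self.manager = manager
        self.held = []                   # locks taken in the growing phase

    def _acquire(self, key):
        lk = self.manager.lock_for(key)
        if lk not in self.held:
            lk.acquire()                 # growing phase: lock before access
            self.held.append(lk)

    def read(self, db, key):
        self._acquire(key)
        return db.get(key)

    def write(self, db, key, value):
        self._acquire(key)
        db[key] = value

    def commit(self):
        # Shrinking phase: strict 2PL releases everything only at commit,
        # so no other transaction ever sees this one's uncommitted writes.
        for lk in self.held:
            lk.release()
        self.held.clear()

db = {"balance": 100}
mgr = LockManager()
txn = Transaction(mgr)
txn.write(db, "balance", txn.read(db, "balance") - 30)
txn.commit()
print(db["balance"])   # 70
```

A second transaction attempting to read `balance` before the commit would block on the same lock, which is exactly the serialization guarantee 2PL provides.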
Physical storage strategies employ data structures like B-trees and log-structured merge trees, with theoretical roots in research disseminated through SIGMOD and implementations across products from Oracle Corporation, Microsoft, and Google. Indexing techniques—bitmap, hash, and spatial indexes—were refined in academic settings including the University of California, Berkeley and applied in systems built by Esri and companies supporting geospatial workloads. Query optimizers use cost models developed in collaboration between universities such as the University of Wisconsin–Madison and industrial labs at DEC and Hewlett-Packard; modern vectorized and columnar optimizations draw on academic research and contemporary projects at Facebook.
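The log-structured merge idea referenced above can be sketched as follows: writes accumulate in an in-memory memtable, which is flushed as an immutable sorted segment when it fills, and reads consult the memtable first and then segments from newest to oldest. Illustrative only; real LSM engines add write-ahead logs, Bloom filters, and background compaction:

```python
# Sketch of a log-structured merge (LSM) store.
import bisect

class LSMStore:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []                # newest last; each a sorted list
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value        # writes hit memory, never disk seeks
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into an immutable, sorted run of (key, value).
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):   # newest segment wins
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

store = LSMStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)       # reaching the limit triggers a flush
store.put("a", 99)      # newer value shadows the flushed one
print(store.get("a"))   # 99
print(store.get("b"))   # 2
```

The design trades read cost (possibly probing several segments) for sequential write throughput, which is why LSM structures dominate write-heavy stores while B-trees remain common for read-heavy, update-in-place workloads.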
Access control, authentication, and encryption are implemented in products coordinated with standards from the National Institute of Standards and Technology and auditors from firms like Deloitte. Integrity constraints and schema validation mechanisms were formalized through academic publications from the University of California, Los Angeles and enforced in engines shipped by Oracle Corporation and the PostgreSQL community. Backup, point-in-time recovery, and replication strategies were developed in industrial research at EMC Corporation and employed in cloud offerings by Amazon Web Services and Google Cloud Platform.
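The point-in-time recovery mentioned above rests on write-ahead logging: every change is appended to a log before it is applied, so the state at any log sequence number (LSN) can be rebuilt by replaying the log up to that point. A minimal sketch, with invented names; real WAL formats carry checksums, page images, and undo information:

```python
# Sketch of write-ahead logging with point-in-time recovery.
import json

class LoggedStore:
    def __init__(self):
        self.data = {}
        self.wal = []                        # serialized log records, in order

    def put(self, key, value):
        record = {"lsn": len(self.wal) + 1, "key": key, "value": value}
        self.wal.append(json.dumps(record))  # log first...
        self.data[key] = value               # ...then apply the change

    def recover_to(self, lsn):
        """Rebuild state by replaying the log up to the given LSN."""
        state = {}
        for raw in self.wal:
            record = json.loads(raw)
            if record["lsn"] > lsn:
                break
            state[record["key"]] = record["value"]
        return state

store = LoggedStore()
store.put("x", 1)            # LSN 1
store.put("x", 2)            # LSN 2
store.put("y", 3)            # LSN 3
print(store.recover_to(2))   # {'x': 2} -- the state as of LSN 2
```

Replication follows the same principle: shipping the log to a replica and replaying it there keeps the replica a deterministic copy of the primary.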
Systems appear in categories such as relational database management systems (RDBMS) exemplified by Oracle Database and Microsoft SQL Server, NoSQL systems like MongoDB and Apache Cassandra, NewSQL projects pursued in startups and labs including efforts from VoltDB and research groups at Yale University, embedded databases such as SQLite, and data warehouse solutions produced by companies such as Snowflake Inc. Deployment models range from on-premises installations managed by enterprises like General Electric to cloud-native managed services offered by Amazon Web Services, Microsoft Azure, and Google Cloud Platform; hybrid and edge deployments intersect with networking products from Cisco Systems and virtualization by Red Hat, Inc.
Category:Computer software