ClickHouse (software)

Name	ClickHouse
Developer	Yandex
Initial release	2016 (open-source release; development began in 2009)
Programming language	C++
Operating system	Linux
Genre	Column-oriented DBMS
License	Apache License 2.0

Contents

History
Architecture
Features
Use cases and performance
Ecosystem and integrations
Adoption and notable deployments

ClickHouse is an open-source, column-oriented database management system optimized for online analytical processing (OLAP) and real-time analytics, originally developed at Yandex. It emphasizes high-throughput, low-latency query processing over large data warehouses and event streams, and competes with analytical database offerings from Amazon Web Services, Google, and Microsoft. Its design draws on ideas also explored in projects at Facebook, Twitter, and LinkedIn, and it is used by enterprises in the same sectors as Alphabet Inc., Tencent, Oracle Corporation, and SAP SE.

History

Development began at Yandex in 2009 to address analytics needs similar to those that led to systems like Druid (software), HP Vertica, and Amazon Redshift at companies such as Netflix, Uber Technologies, and Airbnb. The project was released publicly as open source under the Apache License 2.0 in 2016, attracting contributions from organizations including Altinity, Cloudflare, and Mail.ru Group. Governance and community activities have involved conferences and meetups similar to Strata Data Conference, KubeCon, and FOSDEM, and feature development has paralleled research on column-store architectures by groups at Stanford University, MIT, and the University of California, Berkeley. Significant milestones include improvements in vectorized execution, reflecting research from Intel Corporation and NVIDIA, and the introduction of replication and clustering, influenced by patterns used at CERN and NASA for scientific data. The project’s ecosystem grew with integrations inspired by Apache Kafka, Apache Spark, and Hadoop, and commercial support emerged from vendors modeled on Cloudera and Confluent.

Architecture

ClickHouse uses a columnar storage layout akin to Apache Parquet and ORC (file format). Storing each column separately enables high compression ratios and vectorized processing that exploits the SIMD instructions of Intel and AMD CPUs. Its storage layer implements the MergeTree family of table engines, comparable to data structures in Google Bigtable, and its replication strategies are inspired by consensus protocols such as Paxos and Raft (computer science), with influences from projects at Dropbox and Facebook. The server architecture supports distributed queries across shards and replicas, resembling federated systems deployed by Twitter and Slack Technologies. Query parsing and execution incorporate techniques from research at Carnegie Mellon University and Princeton University, and the networking stack uses patterns seen in nginx and Envoy (software). Integration points for observability follow models from Prometheus, Grafana, and the ELK Stack, while security features reflect practices at Microsoft Azure and Amazon Web Services.
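
The benefit of a columnar layout can be illustrated with a minimal Python sketch. It is not ClickHouse code: the sample rows, the column names, and the helper functions are hypothetical, and run-length encoding is used only as a simple stand-in for a real compression codec. The sketch pivots rows into columns, compresses a sorted low-cardinality column, and computes an aggregate that reads a single column.

```python
from itertools import groupby

# Hypothetical sample events: (timestamp, country, status).
rows = [
    (1, "DE", 200),
    (2, "DE", 200),
    (3, "DE", 404),
    (4, "US", 200),
    (5, "US", 200),
    (6, "US", 200),
]


def to_columns(rows, names):
    """Pivot row tuples into a dict mapping column name -> list of values."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


columns = to_columns(rows, ["timestamp", "country", "status"])

# Values of one column sit next to each other, so repeated values form
# long runs that a simple codec can collapse.
country_rle = rle_encode(columns["country"])

# An aggregate over one column only needs to read that column.
ok_count = sum(1 for status in columns["status"] if status == 200)

print(country_rle)
print(ok_count)
```

In a row-oriented layout the same country values would be interleaved with timestamps and status codes, so neither the run-length trick nor the single-column scan would apply as directly.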

Features

Key features include real-time inserts and fast analytical SELECT queries, similar to capabilities in Presto (software), Trino, and Greenplum Database. ClickHouse supports SQL dialect elements used by PostgreSQL and MySQL clients, column compression codecs such as LZ4 and Zstandard (the latter developed at Facebook), and materialized views comparable to implementations at Oracle Corporation and in IBM Db2. It offers TTL (time-to-live) policies inspired by retention mechanisms at Pinterest and Snap Inc., secondary indexes and data skipping indexes analogous to those in Dremio and Impala, and user-defined functions following patterns seen at Snowflake (company). High-availability features mirror replication patterns used by etcd and Consul, and backup and restore strategies follow practices used by Veeam and Commvault.
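
The idea behind a data skipping index can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the ClickHouse implementation: the `Granule` class, the granule size of 10, and the function names are hypothetical. Each block of values stores its minimum and maximum, and a range query skips any block whose bounds cannot overlap the requested range.

```python
from dataclasses import dataclass


@dataclass
class Granule:
    """A block of column values plus its min/max summary (hypothetical)."""
    values: list
    min_value: int
    max_value: int


def build_granules(values, granule_size):
    """Split values into fixed-size blocks and record min/max for each."""
    granules = []
    for start in range(0, len(values), granule_size):
        chunk = values[start:start + granule_size]
        granules.append(Granule(chunk, min(chunk), max(chunk)))
    return granules


def scan_range(granules, low, high):
    """Return values in [low, high] and the number of granules actually read."""
    matches = []
    granules_read = 0
    for granule in granules:
        # Skip blocks whose min/max bounds cannot contain a match.
        if granule.max_value < low or granule.min_value > high:
            continue
        granules_read += 1
        matches.extend(v for v in granule.values if low <= v <= high)
    return matches, granules_read


values = list(range(100))          # assumed to be stored in sorted order
granules = build_granules(values, granule_size=10)
matches, granules_read = scan_range(granules, 42, 57)

print(len(granules), granules_read, len(matches))
```

The summaries are most selective when the data is stored in an order correlated with the filtered column; on randomly ordered values, most blocks would overlap the range and few could be skipped.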

Use cases and performance

Use cases include clickstream analytics, as implemented at The New York Times and The Guardian; monitoring and observability workloads, as seen at Datadog and New Relic; fraud detection patterns employed by PayPal and Stripe; adtech processing similar to that at The Trade Desk and Criteo; and telemetry for platforms like those of SpaceX and Tesla, Inc. Benchmarks have compared ClickHouse against Amazon Redshift, Google BigQuery, Apache Impala, and Snowflake, often showing advantages in single-node throughput and latency for read-heavy analytics, similar to results reported by Netflix and Spotify. Real-world deployments demonstrate petabyte-scale storage at organizations following data architectures used by Bloomberg and Thomson Reuters, and sub-second aggregation performance for dashboards akin to solutions from Tableau Software and Looker.

Ecosystem and integrations

The ecosystem includes connectors and integrations with Apache Kafka, RabbitMQ, Debezium, Apache Spark, and Apache Flink, enabling streaming ingestion patterns used by Confluent and DataStax. ETL tools and orchestration systems such as Apache Airflow, Prefect, and Dagster are commonly paired with ClickHouse, as are data catalog and governance platforms modeled after Collibra and Alation. BI integrations exist with Tableau, Power BI, Apache Superset, and Metabase, while cloud deployments emulate practices from the Google Cloud Platform, Microsoft Azure, and Amazon Web Services marketplaces. Monitoring and logging integrations include Prometheus, Grafana, and Fluentd, and security and infrastructure automation often uses Terraform, Ansible, and Kubernetes.

Adoption and notable deployments

Notable adopters include technology companies and media organizations following large-scale data patterns at Cloudflare, Yandex, VKontakte, Mail.ru Group, Uber Technologies, Spotify, and Zoom Video Communications. Telecommunications and finance firms have deployed ClickHouse in architectures inspired by those at Verizon, AT&T, Goldman Sachs, and Morgan Stanley. Open-source projects and vendors provide managed offerings similar to services from Amazon, Google, and Microsoft, and commercial support is offered by companies patterned after Altinity and Two Sigma. Academic and research institutions such as CERN, Imperial College London, and ETH Zurich have evaluated ClickHouse for processing and analyzing experimental data.
