Presto Software Foundation

Name	Presto Software Foundation
Type	Non-profit
Founded	2019
Location	Global
Focus	Big data, distributed query engines, open source

Contents

History
Governance and Organization
Projects and Technologies
Community and Ecosystem
Adopted Standards and Compliance
Funding and Membership
Notable Deployments and Use Cases

The Presto Software Foundation is a nonprofit organization formed to steward the development of the Presto distributed SQL query engine and related projects. It provides governance, infrastructure, and branding for projects that enable high-performance analytics across Hadoop-based and cloud-native environments such as Amazon Web Services, and it engages contributors from companies such as Facebook, Uber Technologies, Netflix, and Starburst Data. The foundation coordinates releases, certification, and community events to foster interoperability with technologies such as Apache Hadoop, Apache Hive, Apache Spark, and Trino.

History

The foundation was established in 2019 amid growing interest in interactive analytics beyond the MapReduce paradigm. Presto itself originated earlier at Facebook, where it was developed to address the latency limitations of Apache Hive. Initial stewardship involved multiple corporate contributors, including teams from LinkedIn, Twitter, and Pinterest; governance was later formalized to provide neutral oversight similar to that of the Apache Software Foundation and the Linux Foundation. The project's evolution included forks and parallel ecosystems, notably the emergence of Trino from a group of principal Presto contributors, followed by both collaboration and competition among industry stakeholders such as Starburst Data and Amazon Web Services. Over time the foundation expanded its scope to include connector tooling, security integrations with Apache Ranger, and performance work for Kubernetes and OpenStack deployments.

Governance and Organization

Governance follows a meritocratic model with a board drawn from corporate members and individual contributors, analogous to the structures used by the Cloud Native Computing Foundation and the OpenStack Foundation. The board oversees technical steering committees (TSCs) and working groups for areas such as connector development, security, and performance. Contributor agreements and intellectual property policies follow precedents set by the Eclipse Foundation and the Linux Foundation, and they cover contributor license management for organizations such as Facebook, Uber Technologies, and Netflix as well as for independent maintainers. Regular meetings and annual general assemblies take place alongside conferences such as Strata Data Conference and KubeCon + CloudNativeCon.

Projects and Technologies

The foundation hosts the core Presto engine and an ecosystem of connectors and extensions that interoperate with systems including Apache Hive, Apache HBase, MySQL, PostgreSQL, Amazon S3, and Google Cloud Platform. Performance engineering targets integration with columnar formats such as Apache Parquet and Apache ORC, and query optimization features borrow concepts from Volcano-style planners and from the cost-based optimization used in Apache Calcite. Security and governance integrations include compatibility with Apache Ranger, Kerberos, and OAuth 2.0 in enterprise settings such as Microsoft Azure and IBM Cloud. The foundation also incubates projects for SQL extensions, materialized view tooling, and resource management for orchestration with Kubernetes and YARN.
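As an illustration of how a connector is typically wired into a Presto deployment, the sketch below renders the text of a catalog properties file for the MySQL connector. It assumes the common convention that each catalog is described by a file such as etc/catalog/mysql.properties containing a connector.name entry plus connector-specific settings. The helper name render_catalog_properties and the host, user, and password values are hypothetical placeholders, not part of any foundation tooling.

```python
def render_catalog_properties(connector_name, settings):
    """Return the text of a catalog properties file as key=value lines.

    The first line names the connector; the remaining lines carry the
    connector-specific settings in the order they were supplied.
    """
    lines = [f"connector.name={connector_name}"]
    for key, value in settings.items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical connection details, for illustration only.
mysql_catalog = render_catalog_properties(
    "mysql",
    {
        "connection-url": "jdbc:mysql://mysql.example.internal:3306",
        "connection-user": "presto_reader",
        "connection-password": "change-me",
    },
)

# The rendered text would be saved as etc/catalog/mysql.properties,
# which makes the data source addressable as the catalog "mysql".
print(mysql_catalog)
```

The file name, minus its extension, becomes the catalog name used in queries, so a second data source would get its own properties file rather than additional entries in this one.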

Community and Ecosystem

The contributor community comprises engineers from major technology firms, independent developers, and academic researchers from institutions such as Carnegie Mellon University and the University of California, Berkeley. Community activities include mailing lists, GitHub repositories, code sprints, and meetups at conferences such as Strata Data Conference, DataEngConf, and Open Source Summit. Commercial vendors including Starburst Data, Ahana, and Amazon Web Services offer managed services and commercial support, while cloud providers such as Google Cloud Platform and Microsoft Azure provide integration points. Collaboration on cross-project connector and operator development occurs with adjacent foundations such as the Apache Software Foundation and the Cloud Native Computing Foundation, and with vendors such as DataStax.

Adopted Standards and Compliance

The foundation emphasizes adherence to SQL:2011 and later SQL standards where applicable, and aligns data format compatibility with the openly published specifications of Apache Parquet and Apache ORC. For security and privacy, projects implement industry standards such as OAuth 2.0, OpenID Connect, and Kerberos for authentication, and support audit logging suited to sectors overseen by regulators such as the U.S. Securities and Exchange Commission and the European Data Protection Board. Interoperability testing and certification programs mirror approaches used in the ODBC and JDBC ecosystems.

Funding and Membership

Funding sources include corporate sponsorships, membership dues from enterprises such as Facebook, Uber Technologies, and Netflix, and donations from cloud providers and service vendors such as Amazon Web Services and Google LLC. Membership tiers provide voting rights, board nominations, and technical steering participation, and member companies often contribute developer time and infrastructure for CI/CD pipelines hosted on platforms such as GitHub and GitLab. Grants and sponsorships support community events, outreach programs, and documentation initiatives, similar to the funding model used by the Linux Foundation.

Notable Deployments and Use Cases

Presto-based deployments power interactive analytics and ad hoc querying at scale, including at financial services firms such as Goldman Sachs, advertising platforms at LinkedIn, streaming analytics at Netflix, and operational analytics at ride-hailing companies such as Uber Technologies. Use cases include log analytics fed by Apache Kafka ingestion pipelines, customer analytics against data lakes on Amazon S3 and Google Cloud Storage, and federated querying across heterogeneous stores such as MySQL and PostgreSQL backends. Deployments integrate with orchestration tools such as Kubernetes and with monitoring stacks based on Prometheus and Grafana for observability and SLA-driven operations.
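The federated querying use case can be sketched as a single SQL statement that joins tables from two different backends. The example below only builds the query text, relying on Presto's three-part catalog.schema.table naming. The catalog names mysql and postgresql, the schemas and tables, the customer_id key, and the helper functions are all hypothetical, and submitting the statement through a client is deliberately left out.

```python
import re

# Accept only simple unquoted identifiers so that nothing unexpected
# is interpolated into the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified(catalog, schema, table):
    """Return a fully qualified catalog.schema.table name."""
    for part in (catalog, schema, table):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(left_table, right_table, join_key):
    """Build a join between two tables that may live in different catalogs."""
    if not _IDENTIFIER.fullmatch(join_key):
        raise ValueError(f"unsupported identifier: {join_key!r}")
    return (
        "SELECT c.*, o.*\n"
        f"FROM {left_table} AS c\n"
        f"JOIN {right_table} AS o ON c.{join_key} = o.{join_key}"
    )


# Hypothetical catalogs: one backed by MySQL, one by PostgreSQL.
customers = qualified("mysql", "crm", "customers")
orders = qualified("postgresql", "public", "orders")
query = federated_join_sql(customers, orders, "customer_id")
print(query)
```

Because each table name carries its catalog, the engine can route each side of the join to the matching connector, so the caller writes one statement instead of exporting data between the two databases first.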
