DataEngConf

DataEngConf
Name	DataEngConf
Status	Active
Genre	Conference
Frequency	Annual
Location	Various
First	2014
Organizer	Independent organizers
Participants	Data engineers, architects, researchers

Contents

Overview
History and Development
Conference Structure and Programs
Keynote Speakers and Notable Presentations
Technical Tracks and Workshops
Community and Industry Impact
Awards and Recognition

DataEngConf is an annual professional conference focused on data engineering, large-scale data systems, distributed computing, and applied data infrastructure. It convenes practitioners, researchers, vendors, and community leaders to present case studies, tools, and best practices drawn from major technology companies and research institutions. Attendees include engineers from hyperscale firms, cloud providers, academic laboratories, and independent startups who engage with topics spanning storage, processing, orchestration, and observability.

Overview

DataEngConf brings together engineers and researchers from organizations such as Google, Amazon, Microsoft, Facebook, Netflix, Uber, Spotify, Airbnb, LinkedIn, and Twitter to share operational lessons in data pipelines and platform engineering. The program typically features talks referencing tools and projects such as Apache Hadoop, Apache Spark, Apache Kafka, Apache Flink, Kubernetes, Docker, PostgreSQL, MySQL, Cassandra, MongoDB, and Redis. Sponsors have included cloud vendors such as Google Cloud Platform, Amazon Web Services, and Microsoft Azure, and infrastructure vendors such as Databricks, Confluent, Snowflake, Cloudera, and Hortonworks. The audience includes practitioners from enterprises such as Goldman Sachs, JPMorgan Chase, Capital One, Walmart, and Target Corporation, and research groups from MIT, Stanford University, the University of California, Berkeley, Carnegie Mellon University, and the University of Washington.

History and Development

The conference emerged in the mid-2010s alongside the maturation of cloud computing and stream processing, contemporaneous with work at Facebook on distributed storage and at Twitter on real-time analytics. Early iterations reflected lessons from projects at Yahoo!, LinkedIn, and Netflix, and from research at UC Berkeley's AMPLab and its successor, RISELab. Over time, the program incorporated case studies from financial firms such as Goldman Sachs and technology-driven retailers such as Amazon. Its evolution paralleled standards work at the IETF, open-source development in Linux Foundation projects, and research presented at academic conferences including SIGMOD, VLDB, ICDE, and USENIX FAST.

Conference Structure and Programs

Typical formats include keynote addresses, technical talks, panel discussions, lightning talks, poster sessions, and vendor villages populated by companies such as Confluent, Databricks, Snowflake, HashiCorp, Elastic, and Splunk. Tutorials and training sessions are often led by practitioners from Uber Technologies, Airbnb, Spotify, Netflix, and Dropbox, and from research labs such as Microsoft Research and Google Research. Community-driven meetups run alongside the official program, supported by groups such as the Apache Software Foundation and the Linux Foundation, and by regional meetups tied to universities such as Harvard University and Princeton University.

Keynote Speakers and Notable Presentations

Keynotes have historically featured engineering leaders and researchers from Google Research, Amazon Web Services, Microsoft Azure, Facebook, Netflix, LinkedIn, Uber Technologies, Stripe, and Snowflake, along with academic figures from MIT, Stanford University, UC Berkeley, and Carnegie Mellon University. Notable presentations have covered stream processing work by teams at LinkedIn and Confluent, batch orchestration innovations from Databricks and Airbnb, and data governance efforts at Goldman Sachs and Capital One. Panels on interoperability have included representatives from regulatory bodies and contributors to standards work at the IETF and ISO.

Technical Tracks and Workshops

Tracks commonly include stream processing, batch processing, data storage, data governance, observability, machine learning infrastructure, and site reliability practices. Workshops are often hands-on sessions using stacks built around Apache Spark, Apache Flink, Apache Kafka, Presto, Trino, Kubernetes, Terraform, Apache Airflow, and dbt. Specialized tutorials have been run by practitioners from Google Cloud Platform, Amazon Web Services, and Microsoft Azure, and by engineering teams from Netflix and Uber Technologies. These sessions demonstrate deployment patterns, testing strategies, and failure-injection methods drawn from chaos engineering, a practice pioneered at Netflix with its Chaos Monkey tool.

Community and Industry Impact

The conference acts as a meeting point between practitioner communities and vendor ecosystems, influencing the roadmaps of Apache Software Foundation projects and commercial product development at Databricks, Confluent, Snowflake, AWS, Google Cloud Platform, and Microsoft Azure. It has fostered collaborations between academic groups at MIT, Stanford University, and UC Berkeley and industry R&D teams at Google Research, Microsoft Research, and Facebook AI Research. Community organizing often runs through local groups on Meetup and professional societies such as the ACM and IEEE.

Awards and Recognition

DataEngConf has recognized exemplary engineering work through awards and spotlight sessions, highlighting recipients from companies such as Netflix, LinkedIn, Google, Airbnb, and Uber Technologies, along with academic awards for work from Stanford University and MIT. Coverage in outlets such as TechCrunch, The Verge, Wired, and The Register, and citations in research published at SIGMOD and VLDB, have amplified notable presentations. Regional editions hosted in cities such as San Francisco, New York City, London, Berlin, and Bangalore have extended the conference's reach among local engineering communities.

Category:Conferences in data engineering