| ETL | |
|---|---|
| Name | ETL |
| Acronym | ETL |
| Domain | Data integration |
| Originated | 1970s |
| Components | Extract, Transform, Load |
ETL
ETL (extract, transform, load) is a data integration pattern that extracts data from source systems, transforms it for analysis, and loads it into target systems. It underpins data warehousing projects at organizations such as IBM, Oracle Corporation, Microsoft, SAP SE, and Amazon, and supports analytics run on platforms such as Tableau Software, Qlik, SAS Institute, Palantir Technologies, and Snowflake Inc. ETL workflows interact with transactional systems such as Oracle Database, Microsoft SQL Server, MySQL, PostgreSQL, and SAP HANA, as well as big data platforms such as Apache Hadoop, Apache Spark, Cloudera, Hortonworks, and Databricks.
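The three stages can be illustrated with a minimal sketch that uses only the Python standard library. The CSV text, the `orders` table, and the function names are hypothetical stand-ins chosen for illustration; an in-memory SQLite database plays the role of the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from a source system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,carol,not_a_number
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, cast types, drop rows with a bad amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(text, conn):
    """Run the three stages in sequence."""
    load(transform(extract(text)), conn)

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse
run_etl(SOURCE_CSV, conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
# loaded == [(1, 'alice', 10.5), (2, 'bob', 20.0)]
```

Keeping each stage as a separate function makes it possible to test the transformation logic without touching either the source or the target.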
The three stages map onto distinct layers of an enterprise stack: source systems such as Salesforce, Workday, ServiceNow, SAP ERP, and Oracle E-Business Suite; transformation engines such as Informatica, Talend, and IBM DataStage; and targets such as Amazon Redshift, Google BigQuery, Azure Synapse Analytics, Teradata, and Vertica. ETL feeds reporting tools such as Oracle Business Intelligence, IBM Cognos Analytics, Microsoft Power BI, MicroStrategy, and Looker, enabling downstream use cases at organizations such as Walmart, JPMorgan Chase, Pfizer, ExxonMobil, and UnitedHealth Group.
Early ETL approaches evolved alongside data warehousing initiatives championed by figures and organizations including Bill Inmon, Ralph Kimball, Teradata Corporation, and IBM. The rise of relational databases such as Ingres, Oracle Database, and Informix in the 1980s and 1990s shaped ETL practices. In the 2000s, commercial ETL suites such as Informatica, Microsoft SQL Server Integration Services (SSIS), and IBM's offerings gained traction, while open-source projects such as Pentaho and Talend broadened adoption. The advent of cloud platforms (Amazon Web Services, Google Cloud Platform, Microsoft Azure) and distributed computing frameworks such as Apache Hadoop and Apache Spark shifted ETL toward ELT, in which data is loaded first and transformed inside the target system, and toward streaming and real-time variants adopted by companies including Netflix, Uber Technologies, Airbnb, Spotify, and LinkedIn.
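The difference between ETL and ELT is mainly one of ordering, which a small sketch can make concrete. In the ELT variant, raw records land in a staging table untouched and the target engine performs the transformation in SQL. The table names and sample rows are hypothetical, and SQLite stands in for a cloud warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse

# Load first: raw, untyped records land in a staging table as-is.
conn.execute(
    "CREATE TABLE staging_orders (order_id TEXT, customer TEXT, amount TEXT)"
)
raw_rows = [("1", " Alice ", "10.50"), ("2", "BOB", "20.00"), ("3", "carol", "")]
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

# Transform second: the target engine cleans and types the data in SQL.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM staging_orders
    WHERE amount <> ''
    """
)

elt_rows = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM staging_orders").fetchone()[0]
```

Because the staging table keeps the raw records, the transformation can be revised and rerun later without extracting from the source again.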
Typical ETL architectures combine connectors for sources such as Salesforce, Oracle Database, SAP HANA, and MongoDB; transformation layers built on engines such as Apache Spark, Talend, Informatica PowerCenter, and DataStage; and destinations such as Snowflake Inc., Amazon Redshift, Google BigQuery, and Azure Synapse Analytics. Supporting components include metadata repositories built on databases such as PostgreSQL, MySQL, and Oracle Database; workflow schedulers such as Apache Airflow, Control-M, and cron; and monitoring systems such as Datadog, Splunk, and New Relic. Security is addressed through identity providers such as Okta, Microsoft Entra ID, and Ping Identity, and through encryption technologies standardized by bodies such as the IETF.
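The layered architecture described above can be sketched as a pipeline with pluggable source, transformation, and destination components, plus a run log standing in for a metadata repository. All class and field names here are hypothetical and do not correspond to any vendor's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Iterable, Protocol

class Source(Protocol):
    def read(self) -> Iterable[dict]: ...

class Sink(Protocol):
    def write(self, records: Iterable[dict]) -> int: ...

@dataclass
class ListSource:
    """In-memory connector standing in for a real source system."""
    records: list

    def read(self):
        return iter(self.records)

@dataclass
class ListSink:
    """In-memory destination standing in for a warehouse table."""
    stored: list = field(default_factory=list)

    def write(self, records):
        before = len(self.stored)
        self.stored.extend(records)
        return len(self.stored) - before

@dataclass
class Pipeline:
    source: Source
    transforms: list  # callables applied to each record, in order
    sink: Sink
    run_log: list = field(default_factory=list)  # stand-in metadata repository

    def run(self):
        records = self.source.read()
        for step in self.transforms:
            records = map(step, records)  # lazily chain each transform
        written = self.sink.write(records)
        self.run_log.append({
            "finished_at": datetime.now(timezone.utc).isoformat(),
            "rows_written": written,
        })
        return written

pipeline = Pipeline(
    source=ListSource([{"name": " Ada "}, {"name": "Grace"}]),
    transforms=[
        lambda r: {**r, "name": r["name"].strip()},
        lambda r: {**r, "name_len": len(r["name"])},
    ],
    sink=ListSink(),
)
rows_written = pipeline.run()
```

Defining the source and sink as protocols means a connector can be swapped without changing the pipeline itself, which is the property the connector layer is meant to provide.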
ETL workflows begin with extraction from systems such as Salesforce, SAP ERP, Oracle E-Business Suite, and Google Analytics, proceed through transformation steps implemented with engines, libraries, and languages such as Apache Spark, pandas, NumPy, and Scala, and finish by loading into warehouses such as Teradata, Snowflake Inc., Amazon Redshift, and Google BigQuery. Orchestration tools such as Apache Airflow, Prefect, Control-M, and AWS Step Functions manage dependencies between tasks, while message brokers and streaming services such as Apache Kafka, RabbitMQ, and Amazon Kinesis support streaming ingestion of the kind used by firms such as Confluent, LinkedIn, and Netflix. Data quality frameworks such as Great Expectations and master data management systems from Reltio and Informatica help enforce validity, lineage, and governance, which matter to regulators such as the U.S. Securities and Exchange Commission in the financial sector, where firms such as Goldman Sachs and Morgan Stanley operate.
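Dependency management, the core job of an orchestrator, amounts to running tasks in an order that respects a dependency graph. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) rather than any orchestration tool's API; the task names are hypothetical and the task bodies are no-op placeholders.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "quality_check": {"transform_join"},
    "load_warehouse": {"quality_check"},
}

def run_in_order(deps, actions):
    """Run each task only after all of its dependencies have run."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        actions[task]()  # a real task would extract, transform, or load
        executed.append(task)
    return executed

# Placeholder actions; real tasks would do actual work.
actions = {name: (lambda: None) for name in dependencies}
order = run_in_order(dependencies, actions)
```

The two extraction tasks have no dependency on each other, so their relative order is not fixed; a production orchestrator would typically run such tasks in parallel and add retries and scheduling on top.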
Popular commercial tools include Informatica, IBM DataStage, Talend, and Microsoft SSIS, along with cloud-native services such as AWS Glue, Google Cloud Dataflow, and Azure Data Factory, and offerings from Snowflake Inc. partners. Open-source projects relevant to ETL include Apache NiFi, Apache Airflow, Apache Spark, Apache Beam, and Pentaho Data Integration (Kettle). Programming languages used to build custom ETL pipelines include Python, Java, Scala, and SQL. Integration accelerators and connectors are provided by vendors such as SAP SE, Oracle Corporation, Salesforce, and Workday.
Designers balance latency, throughput, and cost when selecting technologies from Amazon Web Services, Google Cloud Platform, Microsoft Azure, or on-premises vendors such as IBM and Teradata Corporation. Best practices include cataloging metadata with tools such as Alation, Collibra, and Apache Atlas; implementing observability with Datadog, Splunk, and Prometheus; enforcing access controls through Okta, Microsoft Entra ID, and AWS Identity and Access Management; and automating testing with frameworks such as Great Expectations and CI/CD systems such as Jenkins, GitLab, and GitHub Actions. Architectural patterns advocated by practitioners including Bill Inmon and Ralph Kimball inform modeling choices in sectors where Citi, HSBC, Pfizer, and Johnson & Johnson operate.
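Automated data testing can be illustrated with a few plain-Python checks of the kind a data quality framework formalizes. This is a hand-rolled sketch, not the Great Expectations API; the function names, the result format, and the sample rows are all hypothetical.

```python
def check_not_null(rows, column):
    """Flag rows where the column is missing or empty."""
    failures = [i for i, row in enumerate(rows) if row.get(column) in (None, "")]
    return {"check": f"not_null({column})", "passed": not failures,
            "failed_rows": failures}

def check_unique(rows, column):
    """Flag rows whose value repeats an earlier row's value."""
    seen, failures = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            failures.append(i)
        seen.add(value)
    return {"check": f"unique({column})", "passed": not failures,
            "failed_rows": failures}

def check_range(rows, column, low, high):
    """Flag rows whose value falls outside the closed interval [low, high]."""
    failures = [i for i, row in enumerate(rows)
                if not (low <= row[column] <= high)]
    return {"check": f"range({column})", "passed": not failures,
            "failed_rows": failures}

# Hypothetical batch with one duplicate id and one negative amount.
sample = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -5.0},
    {"id": 2, "amount": 7.5},
]

results = [
    check_not_null(sample, "id"),
    check_unique(sample, "id"),
    check_range(sample, "amount", 0, 1000),
]
all_passed = all(result["passed"] for result in results)
```

In a CI/CD setting, a job could run checks like these against a sample batch and fail the build when `all_passed` is false, so that a faulty transformation is caught before it reaches the warehouse.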
ETL is applied in scenarios such as enterprise reporting at corporations such as Walmart, Target Corporation, and Costco Wholesale Corporation; risk analytics at JPMorgan Chase, Goldman Sachs, and Citigroup; customer 360 initiatives at Salesforce, Amazon, and Uber Technologies; clinical data integration at institutions such as Mayo Clinic and Johns Hopkins Hospital; marketing attribution for Procter & Gamble, Unilever, and the Coca-Cola Company; and IoT telemetry consolidation for General Electric, Siemens, and Bosch. Industry-specific deployments appear in aviation, with Delta Air Lines, American Airlines Group, and United Airlines, and in telecommunications, with AT&T, Verizon Communications, and Deutsche Telekom.
Category:Data integration