LLMpedia: The first transparent, open encyclopedia generated by LLMs

Azure Data Catalog

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Azure Data Catalog
Name: Azure Data Catalog
Developer: Microsoft
Released: 2015
Discontinued: 2024 (retired; succeeded by Microsoft Purview)
Platform: Microsoft Azure
Genre: Data catalog / metadata management

Azure Data Catalog was a cloud-based metadata cataloging service developed by Microsoft to help organizations discover, understand, and consume data assets across enterprise environments. It provided searchable registration of data sources, annotations, and lineage metadata to improve data discovery and governance for analytics and BI workloads. The service integrated with multiple Microsoft products and responded to a growing need for centrally managed metadata among large enterprises such as Walmart, Bank of America, Pfizer, General Electric, and Procter & Gamble.

Overview

Azure Data Catalog launched as part of the Microsoft Azure platform to address metadata sprawl across on-premises and cloud systems, including those used by customers of Amazon, Google, IBM, Accenture, Deloitte, KPMG, Ernst & Young, Capgemini, SAP, Oracle Corporation, and Salesforce. It aimed to complement products from Tableau Software, Qlik, Looker, MicroStrategy, SAS Institute, Teradata, Snowflake Inc., Cloudera, and Hortonworks by focusing on discovery and annotation rather than compute. Enterprises including Citi, JPMorgan Chase, HSBC, Deutsche Bank, and Goldman Sachs explored cataloging solutions to support analytics teams working with Microsoft Power BI, Azure Synapse Analytics, Azure Data Factory, SQL Server, and Azure Blob Storage.

Features

Key features included automated metadata harvesting, user-driven annotations, search and faceted discovery, and support for schema and table-level lineage, similar to capabilities sought by teams at Facebook, Twitter, LinkedIn, Netflix, Airbnb, Uber, Lyft, and Snap Inc. Integration points were available with identity providers such as Azure Active Directory and with single sign-on ecosystems used by Adobe Systems, Autodesk, Cisco Systems, Intel, NVIDIA, and VMware. Administrators could assign tags, business glossary terms, and annotations that mirrored taxonomy efforts at institutions like Harvard University, Stanford University, Massachusetts Institute of Technology, Yale University, Princeton University, Columbia University, University of California, Berkeley, and University of Oxford.
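The annotation and faceted-search behavior described above can be illustrated with a small in-memory sketch. This is not the Azure Data Catalog API; the `Asset` class, its fields, and the `annotate`, `search`, and `facet_counts` helpers are hypothetical names chosen only to show how tags and a source-type facet might narrow a search.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A registered data asset. All field names are hypothetical."""
    name: str
    source_type: str
    tags: set = field(default_factory=set)
    description: str = ""


def annotate(asset, *tags, description=None):
    """Add user-supplied tags (lowercased) and optionally set a description."""
    asset.tags.update(t.lower() for t in tags)
    if description is not None:
        asset.description = description


def search(assets, term, source_type=None):
    """Return assets whose name, description, or tags contain the term,
    optionally narrowed to one source type (a single facet)."""
    term = term.lower()
    hits = []
    for a in assets:
        text = " ".join([a.name, a.description, *sorted(a.tags)]).lower()
        if term in text and (source_type is None or a.source_type == source_type):
            hits.append(a)
    return hits


def facet_counts(assets):
    """Count assets per source type, as a faceted search UI might display."""
    return Counter(a.source_type for a in assets)


# Illustrative data only; these asset names are made up.
catalog = [
    Asset("SalesOrders", "SQL Server"),
    Asset("CustomerEvents", "Azure Blob Storage"),
    Asset("SalesForecast", "SQL Server"),
]
annotate(catalog[0], "Finance", description="Order lines by region")
annotate(catalog[1], "finance", "raw")
```

Calling `search(catalog, "finance")` matches on tags rather than names, which is the point of user-driven annotation: assets become discoverable by business vocabulary that does not appear in the underlying schema.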

Architecture and components

The architecture combined a front-end portal, a registration agent, a metadata store, and connectors. Components aligned with architectures used by cloud vendors such as Amazon Web Services, Google Cloud Platform, and enterprise software stacks from Red Hat, Canonical, SUSE, VMware Tanzu, and IBM Cloud. Connectors enabled discovery from sources including SQL Server, Oracle Database, Teradata, SAP HANA, Salesforce, Azure Data Lake Storage, and Amazon S3, following patterns employed by projects like Apache Hadoop, Apache Spark, Apache Kafka, Apache Hive, Presto, and Apache Flink. The catalog’s metadata index supported queries similar to search infrastructures built with Elasticsearch, Solr, and Lucene.
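As a rough illustration of how a connector, a registration step, and an indexed metadata store might fit together, the following sketch models each component in memory. It assumes nothing about Microsoft's actual implementation; `Connector`, `StaticConnector`, `MetadataStore`, and `register` are hypothetical names, and the connector returns fixed metadata instead of querying a live source.

```python
from collections import defaultdict
from typing import Iterable, Protocol


class Connector(Protocol):
    """Anything that can list (asset_name, column_names) pairs for a source."""
    source_type: str

    def harvest(self) -> Iterable[tuple[str, list[str]]]: ...


class StaticConnector:
    """Stand-in connector that returns fixed metadata (no live system is queried)."""

    def __init__(self, source_type, tables):
        self.source_type = source_type
        self._tables = tables

    def harvest(self):
        return list(self._tables.items())


class MetadataStore:
    """Holds asset records plus a simple inverted index from token to asset keys."""

    def __init__(self):
        self.assets = {}
        self.index = defaultdict(set)

    def upsert(self, key, record):
        # Simplification: tokens from an earlier version of the record are not removed.
        self.assets[key] = record
        for token in [record["name"], *record["columns"]]:
            self.index[token.lower()].add(key)

    def lookup(self, token):
        """Return the sorted keys of assets whose name or columns match the token."""
        return sorted(self.index.get(token.lower(), set()))


def register(connector, store):
    """Play the role of the registration agent: harvest metadata and store it."""
    count = 0
    for name, columns in connector.harvest():
        key = f"{connector.source_type}/{name}"
        store.upsert(key, {"name": name, "columns": columns,
                           "source_type": connector.source_type})
        count += 1
    return count
```

Only metadata (names and column lists) flows into the store in this sketch; the data itself stays in the source system, which matches the catalog's focus on discovery rather than compute.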

Deployment and integration

Deployment typically occurred within Microsoft Azure subscriptions and integrated with enterprise CI/CD and data pipelines managed by teams using Azure DevOps, Jenkins, GitHub, Bitbucket, TeamCity, and Bamboo. Integration scenarios included tagging assets consumed by analytics tools such as Microsoft Excel, Power BI, Tableau, QlikView, and Looker Studio to streamline dashboards used by stakeholders at The Walt Disney Company, Comcast, ViacomCBS, Time Warner, and NBCUniversal. Connectors and agents were implemented alongside data integration and governance tools such as Apache NiFi, Informatica, Talend, Collibra, Alation, and Atlan to populate catalogs and support data governance programs at organizations like Johnson & Johnson, Merck, Novartis, Boeing, and Lockheed Martin.

Security and compliance

Security relied on Azure identity and role management, encryption, auditing, and access control compatible with standards and regulations such as those from ISO and NIST, HIPAA, GDPR, PCI DSS, and SOC 2, and with reviews by auditors at firms such as PwC and KPMG. Enterprises operating in regulated industries (banks including Barclays and Santander, insurers like AIG and Allianz, and healthcare providers such as Mayo Clinic and Kaiser Permanente) used catalogs to improve visibility while aligning with policy frameworks from OASIS, W3C, and IETF.
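Role-based access control combined with auditing can be sketched in a few lines. The role names, the `Permission` values, and the `authorize` helper below are hypothetical and do not reproduce the roles defined by Azure Data Catalog or Azure Active Directory; the sketch only shows the general pattern of checking a permission against a user's roles and recording the decision.

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    ANNOTATE = "annotate"
    MANAGE = "manage"


# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "reader": {Permission.READ},
    "contributor": {Permission.READ, Permission.ANNOTATE},
    "admin": {Permission.READ, Permission.ANNOTATE, Permission.MANAGE},
}

# Every authorization decision is appended here so it can be reviewed later.
audit_log = []


def is_allowed(user_roles, permission):
    """True if any of the user's roles grants the permission; unknown roles grant nothing."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


def authorize(user, user_roles, permission, asset):
    """Check access and record the decision, allowed or not, in the audit log."""
    allowed = is_allowed(user_roles, permission)
    audit_log.append({
        "user": user,
        "asset": asset,
        "permission": permission.value,
        "allowed": allowed,
    })
    return allowed
```

For example, `authorize("ana", ["reader"], Permission.ANNOTATE, "SalesOrders")` would be denied under this mapping, and the denial itself is logged, since audit trails usually need refused requests as well as granted ones.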

Use cases and adoption

Use cases encompassed data discovery for analytics teams at Microsoft, Amazon.com, Google LLC, Facebook, Inc., and Netflix, Inc.; self-service BI for business units at Procter & Gamble, Unilever, Nestlé, and Coca-Cola Company; and data governance initiatives at Siemens, GE Healthcare, Schneider Electric, and Honeywell. Adoption favored enterprises with complex, heterogeneous data estates that sought to reduce duplication, accelerate data science workflows built on Jupyter Notebook, RStudio, and Databricks, and support master data management patterns from Informatica MDM and SAP Master Data Governance.

Lifecycle and retirement

Microsoft shifted its strategy as successor capabilities were folded into Azure Purview (now Microsoft Purview), set a retirement policy for the original Azure Data Catalog service, and positioned the successor within broader governance suites used alongside products from Collibra and Alation. Organizations migrated assets and metadata to newer platforms, following migration approaches similar to those used by IBM when sunsetting products and by Google when evolving services. The lifecycle reflected broader industry consolidation in metadata management observed across vendors including Oracle Corporation, SAP, Teradata Corporation, Snowflake Inc., Cloudera, Inc., and Hortonworks, Inc.

Category:Microsoft Azure