| Apache Sentry | |
|---|---|
| Name | Apache Sentry |
| Developer | Apache Software Foundation |
| Released | 2013 |
| Latest release | 1.6.0 |
| Programming language | Java |
| Operating system | Cross-platform |
| License | Apache License 2.0 |
Apache Sentry is an authorization module designed to provide fine-grained, role-based access control for large-scale Hadoop-based data platforms. Originally developed to enforce policy for Hive and HBase workloads, it integrates with a range of Hadoop ecosystem components to secure metadata, tables, columns, and namespaces. Sentry aims to centralize authorization decisions for enterprises that deploy analytics stacks built on distributions from Cloudera, Hortonworks, and other providers.
Sentry was incubated to address enterprise requirements for access control across projects such as Apache Hive, Apache HBase, Apache Impala, and Apache Solr. It provides a centralized policy administration point and a runtime authorization service that can be invoked by services like Apache Oozie and Apache Spark (when integrated) to evaluate permissions against resources. The project operated under the governance of the Apache Software Foundation and collaborated with vendors including Facebook, Yahoo!, Netflix, and eBay during design and testing cycles.
Sentry's architecture separates policy administration, policy storage, and enforcement. Core components include the policy server, which handles runtime checks; the service plugin adapters embedded in services such as Apache Hive and Apache HBase; and the policy backend, often backed by Apache ZooKeeper or by a relational store such as Apache Derby or MySQL. The authorization flow relies on authentication from systems like Kerberos and, in some deployments, integrates with metadata services such as Apache Ranger-aware registries. Administrative clients interact with the policy server using a defined protocol, and service plugins consult the policy server at query compile or execution time to enforce privileges on databases, tables, views, and columns.
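
The separation of storage, decision, and enforcement described above can be illustrated with a small in-memory sketch. Every class and method name here (`PolicyStore`, `PolicyServer`, `ServicePlugin`, `compile_query`) is hypothetical and chosen for illustration; none of them is part of Sentry's actual API, and a real deployment would use a persistent backend and an RPC protocol rather than direct method calls.

```python
class PolicyStore:
    """Stand-in for the policy backend (hypothetical, in-memory only)."""

    def __init__(self):
        self.group_roles = {}      # group name -> set of role names
        self.role_privileges = {}  # role name -> set of (resource, action)

    def map_group(self, group, role):
        self.group_roles.setdefault(group, set()).add(role)

    def grant(self, role, resource, action):
        self.role_privileges.setdefault(role, set()).add((resource, action))


class PolicyServer:
    """Decision point: answers authorization questions from stored policy."""

    def __init__(self, store):
        self.store = store

    def authorize(self, groups, resource, action):
        # Allowed if any role reachable through the caller's groups
        # holds exactly this (resource, action) privilege.
        for group in groups:
            for role in self.store.group_roles.get(group, ()):
                if (resource, action) in self.store.role_privileges.get(role, ()):
                    return True
        return False


class AuthorizationError(Exception):
    """Raised by the plugin when the policy server denies a request."""


class ServicePlugin:
    """Enforcement point embedded in a data service (hypothetical)."""

    def __init__(self, server):
        self.server = server

    def compile_query(self, groups, table, action):
        # The check runs before any plan is produced, mirroring
        # enforcement at query compile time.
        if not self.server.authorize(groups, table, action):
            raise AuthorizationError(f"{action} on {table} denied")
        return f"plan({action}, {table})"
```

In this sketch the plugin never reads policy directly; it only asks the server, so policy can change in one place without touching the enforcing service.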
Sentry implements a role-based access control (RBAC) model where administrators create roles and grant privileges to those roles for resources. Privileges cover actions in systems like Apache Hive (SELECT, INSERT, ALTER), Apache HBase (READ, WRITE), and other service-specific operations. Users and groups from directory services such as Active Directory and LDAP are mapped to roles, and mapping rules can reference authenticated identities such as Kerberos principals. The policy language expresses privileges against hierarchical resources (databases, tables, columns, namespaces), which enables least-privilege patterns similar to OAuth scopes for API access or XACML-style policy frameworks, though Sentry focuses on RBAC semantics and service-aware resource models.
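
A minimal sketch of this model, assuming that a privilege granted on a parent resource covers everything beneath it and that an `ALL` action covers every other action. The `Privilege` class, the `is_allowed` function, and the sample roles and groups are hypothetical; they illustrate the RBAC idea and do not reproduce Sentry's policy language or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    resource: tuple  # hierarchical path, e.g. ("sales_db", "orders")
    action: str      # e.g. "SELECT", "INSERT", or "ALL"

    def implies(self, resource, action):
        # A grant on a parent path covers every path below it.
        covers_resource = resource[:len(self.resource)] == self.resource
        covers_action = self.action in ("ALL", action)
        return covers_resource and covers_action


# Hypothetical sample policy: roles hold privileges, groups hold roles.
ROLE_PRIVILEGES = {
    "analyst": {Privilege(("sales_db", "orders"), "SELECT")},
    "etl": {Privilege(("sales_db",), "ALL")},
}
GROUP_ROLES = {
    "analysts": {"analyst"},
    "pipeline": {"etl"},
}


def is_allowed(groups, resource, action):
    """Return True if any role reachable from the groups implies the request."""
    roles = set()
    for group in groups:
        roles |= GROUP_ROLES.get(group, set())
    return any(
        privilege.implies(resource, action)
        for role in roles
        for privilege in ROLE_PRIVILEGES.get(role, set())
    )
```

With this policy, a member of `analysts` can read a column of `sales_db.orders` but cannot insert into the table or read `sales_db.customers`, while a member of `pipeline` can act on any table in `sales_db`.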
Sentry was engineered to integrate with major components in the Hadoop ecosystem, embedding plugins or hooks into projects such as Apache Hive Metastore, Apache HBase Master, Apache Impala (via vendor bridges), and Apache Spark SQL through connector layers. It interoperates with cluster services including Apache ZooKeeper for coordination and Kerberos for strong authentication. Vendors like Cloudera and Hortonworks incorporated Sentry or offered migration paths from alternative authorization systems such as Apache Ranger. Sentry's compatibility matrix varied by release and by the consumer service version; integrations often required matching policy server and client plugin versions to ensure correct enforcement semantics.
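
The version-matching requirement can be checked mechanically before an upgrade. The sketch below assumes the strictest possible rule, exact equality between the policy server version and each plugin version; real compatibility rules depend on the release and may be looser. The function name and the inventory format are hypothetical.

```python
def find_version_mismatches(server_version, plugin_versions):
    """Return {service: version} for plugins that differ from the server.

    Assumption: compatibility means exact version equality, which is
    stricter than an actual compatibility matrix may require.
    """
    return {
        service: version
        for service, version in sorted(plugin_versions.items())
        if version != server_version
    }
```

An empty result means every listed plugin reports the same version as the policy server; any entry returned names a service to upgrade or investigate.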
Typical deployments place the policy server on dedicated nodes or colocate it with management tiers alongside Cloudera Manager or Ambari-managed control planes. Administrators use command-line tools or management UIs from distribution vendors to create roles, assign privileges, and map identities from LDAP or Active Directory. High-availability configurations rely on replicated metadata stores and consensus systems like ZooKeeper, and policy metadata kept in relational stores such as MySQL or PostgreSQL is backed up with those databases. Audit trails are captured by integrating with logging systems like Apache Flume and central collectors used in Splunk or ELK Stack deployments for compliance reporting.
Sentry depends on robust authentication and secure communication channels; thus deployments commonly require Kerberos for principal verification and TLS for RPC between plugins and the policy server. Privilege escalation risks arise if administrative roles are overprovisioned or if identity-to-role mappings are misconfigured with directory services like LDAP or Active Directory. Ensuring consistent enforcement across services requires synchronized plugin versions and careful upgrade procedures in clusters managed by Ambari or Cloudera Manager. For forensic and compliance needs, integration with auditing solutions such as Apache Ranger-audit pipelines, Splunk, or ELK Stack provides record retention and alerting to detect anomalous access patterns.
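
The two misconfiguration risks named above, overprovisioned administrative roles and faulty identity-to-role mappings, lend themselves to a periodic review of exported policy. The sketch below assumes the policy is available as two plain dictionaries and treats any role holding an `ALL` action as administrative; the function name, the data layout, and the allow-list parameter are all hypothetical.

```python
def audit_mappings(group_roles, role_privileges,
                   approved_admin_groups=frozenset(),
                   admin_actions=frozenset({"ALL"})):
    """Return a list of (group, role, reason) findings.

    group_roles:     group name -> set of role names
    role_privileges: role name  -> set of (resource, action) pairs
    Assumption: a role is "administrative" if it holds any admin action.
    """
    findings = []
    for group in sorted(group_roles):
        for role in sorted(group_roles[group]):
            if role not in role_privileges:
                # Mapping points at a role that was never defined.
                findings.append((group, role, "role is not defined"))
                continue
            is_admin = any(
                action in admin_actions
                for _resource, action in role_privileges[role]
            )
            if is_admin and group not in approved_admin_groups:
                findings.append(
                    (group, role, "admin-level role on unapproved group")
                )
    return findings
```

Findings from a check like this are a starting point for review, not proof of a problem: a group may hold an administrative role legitimately and simply be missing from the allow-list.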