LLMpedia: The first transparent, open encyclopedia generated by LLMs

pg_cron

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: PL/Python Hop 4
pg_cron
Name: pg_cron
Developer: Citus Data (originally written by Marco Slot)
Repository: GitHub (citusdata/pg_cron)
Released: 2016
License: PostgreSQL License

pg_cron

pg_cron is an extension for PostgreSQL that provides cron-based job scheduling inside the database, allowing users to run periodic SQL commands and maintenance tasks. It combines scheduling semantics similar to the Unix cron daemon with PostgreSQL's process model and background worker framework, so that SQL statements run on a timer in the context of the server. The extension originated at Citus Data and is offered on managed services including Amazon RDS, Google Cloud SQL, and Azure Database for PostgreSQL.

Overview

pg_cron implements a scheduler as a PostgreSQL background worker that executes SQL according to cron schedules. It occupies a similar niche to in-database or database-aware schedulers such as pgAgent and pg_timetable, and to external schedulers such as systemd timers and Kubernetes CronJobs. Jobs are defined through SQL and their metadata is persisted in ordinary tables in the cron schema (notably cron.job), so job definitions can be inspected and managed with regular queries. Packages are available for common distributions, including Debian, Ubuntu, and the Red Hat family, through the PostgreSQL community apt and yum repositories.

Features and Architecture

pg_cron provides cron expression parsing, job catalog tables, and a background worker process that launches scheduled SQL statements. By default the scheduler runs each job over a libpq connection to the local server; it can instead be configured to run jobs in dynamic background workers, using the background worker API that has evolved across PostgreSQL versions. Jobs authenticate and execute as ordinary PostgreSQL roles. Key features include standard five-field cron syntax, job run history recorded in the cron.job_run_details table, and the ability to schedule jobs against other databases on the same server. Job activity can be observed with standard facilities such as pg_stat_activity and pg_stat_statements. Cloud offerings such as Amazon RDS, Google Cloud SQL, and Microsoft Azure Database for PostgreSQL each impose operational constraints on background workers and shared libraries, which affects how the extension is enabled there.
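As an illustration of the five-field syntax and the run history described above, the following sketch annotates a few schedules and reads recent runs. It assumes the table and column names documented upstream for recent pg_cron versions; older versions may not have cron.job_run_details.

```sql
-- Five-field cron syntax: minute hour day-of-month month day-of-week
--   '*/5 * * * *'   every five minutes
--   '0 3 * * *'     every day at 03:00
--   '30 3 * * 6'    every Saturday at 03:30 (0 = Sunday)

-- Ten most recent job runs with their status and any returned message.
-- Assumes cron.job_run_details exists (recent pg_cron versions).
SELECT jobid, runid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
```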

Installation and Configuration

Installation normally requires building the extension from source or installing a packaged library for the target distribution; binary packages exist for Debian, Ubuntu, Fedora, and CentOS. After the shared object is installed into the PostgreSQL server's library directory, pg_cron must be added to shared_preload_libraries in postgresql.conf and the server restarted, because the scheduler starts as a background worker. The extension is then created with CREATE EXTENSION in the database named by the cron.database_name setting (postgres by default); other behavior is controlled by extension-specific GUCs. Administrators must also consider system-level packaging from the Debian Project and Red Hat, Inc., the parameter-group or flag workflows of managed services on Amazon Web Services and Google Cloud Platform, and container images built for Docker and Kubernetes. Upgrading between versions requires attention to compatibility notes in the project's release notes and from vendor packaging teams.
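A minimal sketch of the enablement steps on a self-managed server, assuming the package is already installed and the default metadata database is used. The configuration lines are shown as comments because they belong in postgresql.conf rather than in a SQL session.

```sql
-- Assumed postgresql.conf entries (a server restart is required):
--   shared_preload_libraries = 'pg_cron'
--   cron.database_name = 'postgres'    -- upstream default

-- After the restart, confirm the library is preloaded.
SHOW shared_preload_libraries;

-- Run as a superuser while connected to the database named by
-- cron.database_name.
CREATE EXTENSION pg_cron;

-- Confirm the extension and its version.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_cron';
```

Managed services usually replace the postgresql.conf step with their own parameter mechanism, so only the CREATE EXTENSION part carries over unchanged.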

Usage and Examples

Jobs are defined by calling SQL functions provided by the extension, chiefly cron.schedule, which records the job in the cron.job table; cron.unschedule removes it. There is no CREATE JOB statement. Typical usage patterns include scheduling VACUUM and ANALYZE for tables identified through pg_stat_user_tables, running aggregation queries for data warehouses powered by TimescaleDB or Citus, and running maintenance tasks whose deployment is coordinated with tools like Ansible, Puppet, or Chef. Examples include nightly maintenance aligned with backups taken by pgBackRest, logical replication maintenance with pglogical, and periodic metric extraction for observability systems such as Prometheus and Grafana dashboards.
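A short sketch of defining, listing, and removing jobs through the extension's SQL functions. The table public.events and the job names are hypothetical; named jobs assume a reasonably recent pg_cron version. Upstream documentation describes schedules as interpreted in GMT by default.

```sql
-- Nightly VACUUM with ANALYZE at 03:00 on a hypothetical table.
SELECT cron.schedule(
    'nightly-vacuum',
    '0 3 * * *',
    'VACUUM (ANALYZE) public.events'
);

-- Purge rows older than a week every Saturday at 03:30.
SELECT cron.schedule(
    'purge-old-events',
    '30 3 * * 6',
    $$DELETE FROM public.events
      WHERE event_time < now() - interval '7 days'$$
);

-- List the jobs currently defined.
SELECT jobid, jobname, schedule, command, active
FROM cron.job;

-- Remove a job by name.
SELECT cron.unschedule('purge-old-events');
```

Dollar quoting keeps the inner single quotes of the command readable; a plain string literal with doubled quotes would work equally well.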

Security and Permissions

pg_cron runs each job as the role that scheduled it, so job commands are subject to PostgreSQL's ordinary role and privilege model, including role membership and SECURITY DEFINER functions. Administrators must manage access carefully, using GRANT and REVOKE on the cron schema and its functions. In the default mode, where jobs connect to the server through libpq, the job's role must also be able to authenticate locally, for example through a pg_hba.conf rule or a .pgpass entry, under whichever authentication method the server uses (such as SCRAM-SHA-256 or client certificates). In managed environments like Amazon RDS and Google Cloud SQL, superuser privileges are restricted, so the extension must be enabled through the vendor's approved mechanism or replaced with an alternative scheduler. Auditing can be performed with extensions such as pgaudit and integrated with centralized logging solutions like the ELK Stack (Elasticsearch, Logstash, Kibana).
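A minimal sketch of delegating scheduling to a non-superuser, assuming a hypothetical role named app_scheduler and the cron schema layout documented upstream.

```sql
-- app_scheduler is a hypothetical, non-superuser role.
-- Allow it to call the scheduling functions in the cron schema.
GRANT USAGE ON SCHEMA cron TO app_scheduler;

-- Each job records the role and database it runs in, which makes it
-- possible to review who owns what.
SELECT jobid, jobname, username, database
FROM cron.job
ORDER BY username, jobid;

-- Withdraw the ability to schedule new jobs.
REVOKE USAGE ON SCHEMA cron FROM app_scheduler;
```

Granting schema usage only controls who may schedule; what a job can actually do is still bounded by the privileges of the role it runs as.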

Performance and Limitations

Because pg_cron executes SQL inside the database server, heavy or long-running jobs can contend with foreground queries for CPU, memory, and I/O, as any complex PL/pgSQL function or ad hoc query would. pg_cron runs at most one instance of a given job at a time: if a run is still in progress when the next one is due, the next run waits. Administrators should coordinate cron schedules with maintenance windows, remember that each running job occupies a connection or background worker in addition to those managed by pooling layers such as Pgpool-II or PgBouncer, and monitor impact with pg_stat_activity and pg_stat_statements. Limitations include dependency on PostgreSQL version-specific background worker APIs, restricted use in some managed database services, and the lack of built-in distributed coordination for multi-node clusters such as Patroni-managed HA setups or Postgres-XL deployments; jobs do not run while a server is in hot standby and start only after promotion.
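One way to watch for slow or failing jobs is to query the run history directly. This is a sketch assuming cron.job_run_details is available and that failed runs are recorded with the status value 'failed'; the retention job name is hypothetical.

```sql
-- Slowest runs over the last day, joined to the job definition.
SELECT j.jobname,
       d.status,
       d.start_time,
       d.end_time - d.start_time AS duration
FROM cron.job_run_details AS d
JOIN cron.job AS j USING (jobid)
WHERE d.start_time > now() - interval '1 day'
ORDER BY duration DESC NULLS LAST
LIMIT 20;

-- Failed runs and the message they returned.
SELECT jobid, start_time, return_message
FROM cron.job_run_details
WHERE status = 'failed'
ORDER BY start_time DESC;

-- The history table is not pruned automatically; a daily job can
-- keep it small.
SELECT cron.schedule(
    'prune-job-run-details',
    '0 12 * * *',
    $$DELETE FROM cron.job_run_details
      WHERE end_time < now() - interval '7 days'$$
);
```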

Compatibility and Alternatives

pg_cron is compatible with many PostgreSQL major releases but requires careful testing across versions and vendor builds, such as those from EDB (EnterpriseDB), Crunchy Data, and cloud providers. Alternatives include pgAgent and pg_timetable, external schedulers and job managers such as Apache Airflow, Kubernetes CronJobs, and systemd timers, and automation driven from Ansible or Jenkins pipelines. For time-series workloads, TimescaleDB's built-in background job policies, or automation triggered from Prometheus Alertmanager, may replace some pg_cron use cases. The choice among alternatives should weigh the operational constraints imposed by service providers such as Amazon RDS, Google Cloud SQL, and Microsoft Azure.
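Before committing to pg_cron on a given build, a quick check of what the server actually ships can help. The sketch below uses only standard PostgreSQL catalog views; an empty result from the second query suggests the extension is not packaged for that server.

```sql
-- Server version, for matching against pg_cron release notes.
SELECT version();

-- Is pg_cron available on this build, and is it already installed
-- in the current database?
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pg_cron';
```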

Category:PostgreSQL extensions