| seccomp | |
|---|---|
| Name | seccomp |
| Developer | Andrea Arcangeli (original strict mode), Will Drewry (seccomp-bpf), Linux kernel community |
| Initial release | 2005 |
| Operating system | Linux kernel |
| Programming language | C |
| License | GNU General Public License, version 2 |
| Website | Kernel.org |
seccomp
seccomp (short for "secure computing mode") is a Linux kernel facility that restricts the set of system calls a process may invoke, providing a sandboxing primitive that reduces the kernel attack surface exposed to that process. It confines untrusted code by letting a process enter a mode in which only a narrow, predeclared set of system calls remains available, and it is often combined with Linux namespaces, cgroups, and other kernel mechanisms to form layered isolation. seccomp is widely used in Android, in container runtimes such as Docker, and in server software.
Seccomp is a lightweight syscall filtering layer built into the Linux kernel that a process engages to limit its own syscall repertoire; once applied, the restriction cannot be lifted and is inherited by child processes. The facility originally offered a single fixed confinement mode and later gained Berkeley Packet Filter (BPF)-based policies, which allow fine-grained decisions per system call. In practical deployments seccomp is used alongside other kernel features such as AppArmor, SELinux, namespaces, and cgroups to implement defense in depth for software such as nginx, Docker, and Firefox.
The original seccomp mode, written by Andrea Arcangeli, was merged into the mainline Linux kernel in 2005 (version 2.6.12). It was intended to let untrusted, compute-bound code run with almost no access to kernel services. A BPF-based filtering mechanism, developed largely by Will Drewry at Google, was merged in Linux 3.5 in 2012 and made expressive policies possible; a dedicated seccomp system call followed in Linux 3.17 in 2014. Contributors from organizations such as Google, Red Hat, and Canonical have taken part in later development. Tooling and default profiles emerged from projects like Docker, Chromium, and Android, while academic and industry research from groups associated with the University of California, Berkeley, MIT, and Microsoft Research influenced enhancements and threat-model analysis.
Seccomp provides two principal modes: the original strict mode and the later filter mode, commonly called seccomp-bpf. Strict mode restricts a process to four system calls (read, write, _exit, and sigreturn); any other call results in SIGKILL. Filter mode runs small classic BPF programs against a fixed record describing each system call: its number, the architecture, the instruction pointer, and the raw argument values. Because a filter sees only those values, it cannot dereference pointer arguments such as path names. Each filter returns an action such as kill, trap, fail with an errno, notify a tracer, log, or allow. Inside the kernel, seccomp touches the syscall entry path, the BPF filter checker and interpreter (or JIT), and per-task seccomp state stored in task_struct; this work has been developed and reviewed on the Linux Kernel Mailing List, with source hosted on Kernel.org. Syscall numbers and calling conventions differ across architectures such as x86_64, ARM, and PowerPC, so a portable policy must check the architecture field before comparing syscall numbers.
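To make the filter mode concrete, the following is a minimal sketch, not a production policy. It assumes an x86_64 or AArch64 Linux system with kernel headers that define `SECCOMP_RET_KILL_PROCESS`. The names `SKETCH_AUDIT_ARCH`, `install_deny_getppid_filter`, and `probe_getppid_errno_under_filter` are hypothetical and chosen only for illustration. The filter checks the architecture field, then makes `getppid` fail with `EPERM` while allowing every other system call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only these two architectures are handled in this sketch. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Install a filter in the calling process: getppid fails with EPERM,
 * everything else is allowed. Returns 0 on success, -1 on failure. */
int install_deny_getppid_filter(void)
{
    struct sock_filter filter[] = {
        /* Load the architecture and kill the process on a mismatch. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Load the syscall number and compare it with getppid. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Unprivileged processes must set no_new_privs before adding a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Fork a child, install the filter there, and report the errno the child
 * observed from getppid (0 if the call succeeded), or -1 if setup failed.
 * The parent process stays unfiltered. */
int probe_getppid_errno_under_filter(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_deny_getppid_filter() != 0)
            _exit(255);
        errno = 0;
        long rc = syscall(SYS_getppid);
        _exit(rc == -1 ? errno : 0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

Calling `probe_getppid_errno_under_filter()` from a test program should return `EPERM` on a kernel with seccomp filter support. The filter is installed in a forked child because a seccomp filter cannot be removed once applied.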
Processes enable seccomp through prctl (with the PR_SET_SECCOMP operation) or the dedicated seccomp system call, selecting either strict mode or filter mode with a BPF program. An unprivileged process must set the no_new_privs attribute before installing a filter. Administrators and developers create policies using helper libraries and tools from projects like libseccomp, Docker, and systemd (for example through the SystemCallFilter= unit option). A typical workflow assembles an allowlist of syscalls for a service such as sshd, PostgreSQL, or a JavaScript engine, tests it in a controlled environment such as a QEMU virtual machine, and deploys it through orchestration platforms including Kubernetes and OpenStack. Linux distributions and cloud providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure often recommend or supply default seccomp profiles.
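The allowlist workflow described above can be sketched with the dedicated seccomp system call. This is an illustrative example under stated assumptions: an x86_64 or AArch64 Linux system, kernel headers recent enough to define `SYS_seccomp` and `SECCOMP_RET_KILL_PROCESS`, and a deliberately tiny allowlist that no real service could run with. The names `SKETCH_AUDIT_ARCH`, `ALLOW_SYSCALL`, `install_allowlist_filter`, and `probe_allowlist` are hypothetical. Calls outside the allowlist fail with `EPERM` rather than killing the process, which is often more convenient while a policy is still being tested.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only these two architectures are handled in this sketch. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Two instructions: if the loaded syscall number equals nr, allow it;
 * otherwise fall through to the next rule. */
#define ALLOW_SYSCALL(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Install an allowlist: getpid and exit_group are permitted, every other
 * syscall fails with EPERM. Returns 0 on success, -1 on failure. */
int install_allowlist_filter(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW_SYSCALL(__NR_getpid),
        ALLOW_SYSCALL(__NR_exit_group),
        /* Default action for anything not listed above. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    /* glibc has no wrapper for seccomp(2), so use the raw syscall. */
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog);
}

/* Fork a child, apply the allowlist there, and return a bitmask:
 * bit 0 set if getpid still works, bit 1 set if getppid was denied
 * with EPERM. Returns -1 if setup failed. */
int probe_allowlist(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_allowlist_filter() != 0)
            _exit(255);
        int result = 0;
        if (syscall(SYS_getpid) > 0)
            result |= 1;
        if (syscall(SYS_getppid) == -1 && errno == EPERM)
            result |= 2;
        _exit(result);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

On a kernel with seccomp filter support, `probe_allowlist()` should return 3, meaning the listed call succeeded and the unlisted one was refused. A real service needs a far longer allowlist, which is why generated or library-built profiles are usually preferred over hand-written BPF.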
Seccomp substantially reduces attack surface by blocking exploitation paths that rely on invoking unexpected syscalls. For example, it limits what an attacker can do after compromising a process through a bug in a library such as OpenSSL or glibc. However, it is not a complete sandbox: kernel vulnerabilities reachable through permitted syscalls, overly broad allowlists, and flaws in the kernel's filter handling can all undermine its guarantees. Compatibility issues arise with auditing systems like auditd and with ptrace-based debuggers such as gdb. Some applications have syscall needs that vary at run time, for instance those that load plugins or use JIT compilers (e.g., V8, LLVM), which complicates policy authoring. Threat models must account for escape vectors including kernel bugs reported in advisories by vendors like Red Hat and Debian.
The most widely used user-space library is libseccomp, which exposes a convenient, architecture-independent API for constructing seccomp-bpf policies and is maintained by contributors from Google, Red Hat, and independent developers. Container runtimes integrate seccomp: Docker ships a default profile, CRI-O and containerd support profile management, and Kubernetes accepts a seccomp profile in the pod or container security context (older releases used annotations). Auditing and policy generation tools include auditd-based tracers, tracing tools such as sysdig and the BPF Compiler Collection that can help generate profiles, and testing frameworks used by vendors like Canonical.
Common use cases include restricting daemon processes like nginx and httpd in hosting stacks, sandboxing browsers such as Chromium and Firefox, and constraining servers built on runtimes such as Node.js and Go. Container platforms such as Red Hat OpenShift employ seccomp to harden multi-tenant deployments, while mobile platforms such as Android use seccomp to protect system services. Research presented at venues such as USENIX and the IEEE Symposium on Security and Privacy examines combining seccomp with control-flow integrity and with hardware features like Intel Software Guard Extensions for stronger confinement.