| Completely Fair Scheduler | |
|---|---|
| *Scheduler diagram by ScotXW, CC BY-SA 4.0* | |
| Name | Completely Fair Scheduler |
| Author | Ingo Molnár |
| Introduced | 2007 |
| Kernel | Linux |
| Licence | GNU General Public License |
The Completely Fair Scheduler (CFS) is a process scheduler merged into the Linux kernel in version 2.6.23 (2007) to provide fair CPU time distribution among runnable tasks. It was developed within the Linux kernel community, with influences from classical scheduling research published at venues such as ACM SIGOPS and USENIX conferences. The scheduler replaced the earlier O(1) scheduler and was designed to integrate with features maintained by organizations such as the Linux Foundation and companies like Red Hat.
The project was initiated by Ingo Molnár as a contribution to the Linux kernel tree maintained by Linus Torvalds, with review from collaborators at distributions including Debian and Fedora; Molnár credited Con Kolivas's earlier fair-scheduling work as a direct influence. Early motivation drew on literature on fairness and responsiveness in multitasking systems from researchers at institutions such as MIT, Stanford University, and the University of California, Berkeley. The policies CFS displaced descended from the Unix tradition and from designs influenced by commercial systems such as Solaris and Windows NT. Community review took place on the Linux Kernel Mailing List and at firms including IBM and Intel, which contributed to scheduler behavior for server and desktop workloads.
The scheduler models CPU access as a fair-share resource by tracking a per-task virtual runtime (vruntime), a metric derived from fair-queuing concepts in academic work presented at ACM and IEEE conferences. Runnable tasks are kept in a red-black tree keyed by vruntime, so the task that has received the least weighted CPU time sits at the leftmost node and is picked next; the balanced tree gives O(log n) insertion and removal. Per-task weights, derived from nice values, determine how quickly a task's vruntime advances, echoing priority mechanisms used in systems developed by companies such as Google for datacenter scheduling and by Red Hat for enterprise distributions. The algorithm is closely related to weighted fair queuing as discussed in USENIX and ACM SIGCOMM papers. Scheduling classes and policy hooks allow coordination with control groups, which were contributed largely by engineers at Google and merged into the mainline kernel.
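The mechanism described above can be sketched as a small simulation. This is an illustrative model, not kernel code: the kernel orders tasks in a red-black tree and uses a fixed weight table, whereas here a binary heap stands in for the tree and the weight formula is the well-known approximation that each nice step changes a task's CPU share by about 10% (a factor of roughly 1.25).

```python
import heapq

NICE_0_WEIGHT = 1024  # weight of a nice-0 task in the kernel's weight table

def nice_to_weight(nice):
    # Approximation of the kernel's prio_to_weight table: each +1 nice
    # shrinks the CPU share by ~10%, i.e. a factor of ~1.25.
    return NICE_0_WEIGHT / (1.25 ** nice)

class Task:
    def __init__(self, name, nice=0):
        self.name = name
        self.weight = nice_to_weight(nice)
        self.vruntime = 0.0

    def __lt__(self, other):
        # Ordering used by the heap: smallest vruntime runs next,
        # mirroring the leftmost node of CFS's red-black tree.
        return self.vruntime < other.vruntime

def run(tasks, slice_ns=1_000_000, slices=100):
    """Repeatedly pick the task with the smallest vruntime and charge it
    one time slice; vruntime advances inversely to the task's weight, so
    heavier (lower-nice) tasks get proportionally more CPU time."""
    heap = list(tasks)
    heapq.heapify(heap)
    cpu_time = {t.name: 0 for t in tasks}
    for _ in range(slices):
        t = heapq.heappop(heap)
        cpu_time[t.name] += slice_ns
        t.vruntime += slice_ns * NICE_0_WEIGHT / t.weight
        heapq.heappush(heap, t)
    return cpu_time

# A nice-0 task should receive roughly 3x the CPU of a nice-5 task
# (1024 / (1024 / 1.25**5) ≈ 3.05).
shares = run([Task("a", nice=0), Task("b", nice=5)])
```

Over 100 slices, task `a` accumulates roughly three times the CPU time of task `b`, matching the ~3:1 weight ratio between nice 0 and nice 5.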
Integration required changes to core subsystems, coordinated via patches submitted to the Linux Kernel Mailing List and maintained in repositories mirrored on platforms such as GitHub and GitLab by organizations like Canonical and SUSE. The implementation interacts with the CPU frequency scaling frameworks used on Intel and AMD processors, NUMA-aware scheduling support, and power management components developed by ACPI contributors. Runtime configuration interfaces were exposed through syscalls, sysctl tunables, and utilities shipped in distributions such as Ubuntu and Red Hat Enterprise Linux, along with tooling in systemd and BusyBox. Backward-compatibility concerns led to coordination with maintainers of other kernel subsystems, including the I/O scheduler and memory management contributors from Google ChromeOS and academic groups.
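As an illustration of the runtime interfaces mentioned above, the sketch below shows two common ways CFS shares are adjusted from user space. Exact paths and availability vary by kernel version and distribution, and `long_running_job` and the cgroup name are placeholders:

```shell
# Lower a process's CFS share via its nice value: nice 0 corresponds to
# weight 1024, and each +1 nice is roughly a 10% smaller share.
nice -n 10 long_running_job &

# Under the cgroup v2 hierarchy (as managed by systemd), group-level CFS
# shares are set via cpu.weight (default 100); requires root, and the
# group path here is illustrative.
echo 50 > /sys/fs/cgroup/mygroup/cpu.weight
```

Group weights compose with per-task nice values: CFS first divides CPU time between cgroups by weight, then among tasks within each group.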
Empirical evaluations compared throughput and latency against predecessors used on Fedora and Ubuntu desktops and against server-tuned policies applied by vendors like Red Hat and cloud providers including Amazon Web Services and Microsoft Azure. Benchmarks reported in USENIX and ACM conference papers indicated improvements in fairness and interactive responsiveness on Intel and AMD hardware and on ARM-based platforms. Real-world reports from projects such as Kubernetes and OpenStack described scheduler behavior under mixed workloads, while academic evaluations from institutions like ETH Zurich and Carnegie Mellon University analyzed scheduling under virtualization environments such as VMware and Xen.
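Fairness in evaluations like these is often quantified with Jain's fairness index; whether the studies mentioned above used this particular metric is an assumption, but it is a standard way to score how evenly CPU time was divided:

```python
def jain_index(allocations):
    """Jain's fairness index over a list of per-task CPU allocations:
    1.0 means perfectly equal shares; 1/n means one task got everything."""
    n = len(allocations)
    total = sum(allocations)
    return total * total / (n * sum(x * x for x in allocations))

equal_split = jain_index([25, 25, 25, 25])  # perfectly fair -> 1.0
one_hog = jain_index([97, 1, 1, 1])         # one task dominates -> near 1/4
```

A scheduler change that raises this index under a mixed workload delivered CPU time more evenly, independent of total throughput.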
Critiques came from maintainers and users in communities such as Debian and from enterprises like Red Hat, who observed corner cases affecting latency-sensitive applications in telecommunications and in high-frequency trading environments run by firms like Jane Street. Some argued that algorithmic overhead and red-black tree operations introduced measurable costs on many-core machines from Intel and ARM vendors, prompting optimization work by companies like Google and by research groups at the University of Illinois Urbana-Champaign. Interactions with CPU-affinity-sensitive workloads, with the real-time PREEMPT_RT patch set, and with container orchestration stacks like Docker and Kubernetes revealed limitations that spurred alternative patches and tuning guidelines documented by vendors including SUSE.
Other approaches include the O(1) scheduler previously used in mainline Linux, out-of-tree CPU schedulers from Con Kolivas such as BFS and its successor MuQSS, and real-time configurations supported by the PREEMPT_RT patch set. Research and commercial schedulers developed at Google, Microsoft Research, IBM Research, and Oracle Corporation present alternatives for datacenter and real-time environments. Related scheduling strategies appear in operating systems such as FreeBSD, NetBSD, OpenBSD, Solaris, and Windows NT, and in workload managers used by high-performance computing centers at organizations like CERN and national laboratories including Los Alamos National Laboratory.
Category:Linux kernel schedulers