| Chandy–Lamport algorithm | |
|---|---|
| Name | Chandy–Lamport algorithm |
| Inventors | K. Mani Chandy; Leslie Lamport |
| Introduced | 1985 |
| Field | Distributed computing; Fault tolerance |
| Purpose | Global state snapshot in distributed systems |
The Chandy–Lamport algorithm is a distributed snapshot algorithm, introduced by K. Mani Chandy and Leslie Lamport in 1985, that records a consistent global state of an asynchronous distributed system without stopping the system. It assumes processes connected by unidirectional communication channels and is widely cited in the literature on distributed systems, fault tolerance, and consensus.
The algorithm models a system as a set of processes connected by reliable, unidirectional FIFO communication channels, and builds on prior work in distributed algorithms studied alongside topics treated by pioneers like Edsger Dijkstra, Nancy Lynch, and Fred Schneider. It uses marker messages to separate the messages a process sent before recording its local state from those it sent afterwards, which is what allows the state of each channel to be captured; the technique is related to snapshot notions referenced in work by Burton Bloom, Maurice Herlihy, and Michael Fischer. The snapshot consists of the recorded local process states together with the messages in transit on each channel, concepts that appear in discussions by Lamport in papers on logical clocks and in textbooks by Andrew Tanenbaum, as well as treatments by Silberschatz, Galvin, and Gagne.
The algorithm begins when an initiator process records its local state and sends a special marker message on all of its outgoing channels, a step similar in coordination intent to actions in coordination problems addressed by Leslie Lamport and Kenneth Birman. When a process receives a marker for the first time, it records its local state, records the state of the channel on which that marker arrived as empty, sends markers on all of its outgoing channels, and starts recording the messages that arrive on its other incoming channels. When a marker later arrives on one of those channels, the process stops recording it and takes the channel state to be the sequence of messages received on that channel after the local state was recorded and before the marker arrived, a mechanism conceptually adjacent to message logging themes explored by Rodrigo Rodrigues and Fred B. Schneider. Marker propagation continues until every process has recorded its local state and has received a marker on every incoming channel, yielding a consistent snapshot comparable to checkpoints discussed in papers from the fault tolerance community, including works by Brian Randell and Marshall T. Harvey.
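The marker rules can be illustrated with a small single-threaded simulation. This is a minimal sketch, not a reference implementation: it assumes a fully connected network, integer "balance" transfers as the only application messages, and hypothetical names (`Process`, `Network`, `record`, `deliver`) chosen for illustration. Each directed channel is a FIFO queue, and the caller decides the delivery order.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel; application messages here are ints


class Process:
    def __init__(self, pid, balance):
        self.pid = pid
        self.balance = balance       # live application state
        self.recorded_state = None   # local state saved for the snapshot
        self.channel_state = {}      # sender pid -> messages recorded in transit
        self.recording = set()       # incoming channels still awaiting a marker


class Network:
    def __init__(self, balances):
        self.procs = {pid: Process(pid, b) for pid, b in balances.items()}
        # One FIFO queue per directed channel (src, dst).
        self.chan = {(s, d): deque() for s in balances for d in balances if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, pid):
        """Record local state, then send a marker on every outgoing channel."""
        p = self.procs[pid]
        p.recorded_state = p.balance
        p.recording = {s for (s, d) in self.chan if d == pid}
        p.channel_state = {s: [] for s in p.recording}
        for (s, d), queue in self.chan.items():
            if s == pid:
                queue.append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel src -> dst."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self.record(dst)        # first marker: record state, relay markers
            p.recording.discard(src)    # this channel's state is now final
        else:
            p.balance += msg
            if src in p.recording:      # arrived after recording, before the marker
                p.channel_state[src].append(msg)

    def done(self):
        return all(p.recorded_state is not None and not p.recording
                   for p in self.procs.values())

    def snapshot_total(self):
        return sum(p.recorded_state + sum(map(sum, p.channel_state.values()))
                   for p in self.procs.values())


# Example run: A initiates while a transfer from B to A is still in flight.
net = Network({"A": 100, "B": 50, "C": 25})
net.send("A", "B", 10)
net.record("A")                 # initiator records its state and sends markers
net.send("B", "A", 5)           # sent before B has seen any marker
for src, dst in [("A", "B"), ("A", "B"), ("B", "A"), ("B", "A"),
                 ("A", "C"), ("B", "C"), ("C", "A"), ("C", "B")]:
    net.deliver(src, dst)
```

In this run the transfer of 5 from B to A is captured as the state of channel B to A, so the recorded balances plus the recorded in-transit messages add up to the total that was in the system at the start. The simulation keeps running transfers throughout, which reflects the point that the snapshot does not stop the system.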
Correctness arguments use happens-before relations introduced by Leslie Lamport and partial order reasoning akin to models used by Nancy Lynch and Barbara Liskov in distributed systems theory. Proofs show that the snapshot produced corresponds to a global state that could have occurred during an execution consistent with the recorded local states and in-transit messages, leveraging causality arguments similar to those in Lamport’s logical clocks and vector clock analyses by Colin Fidge and Friedemann Mattern. Formal verification efforts have been undertaken using temporal logic frameworks similar to those employed by Amir Pnueli and Joseph Sifakis, and model checking strategies analogous to work by Edmund Clarke and E. Allen Emerson.
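The consistency property behind these arguments can be stated as a simple check: a cut is consistent if every message whose receipt is inside the cut also has its send inside the cut. The sketch below assumes a hypothetical event-log representation (a list of `(kind, message_id)` pairs per process) and a cut given as the number of leading events included for each process; neither is part of the original algorithm's presentation.

```python
def is_consistent_cut(events, cut):
    """Return True if no message is received inside the cut but sent outside it.

    events: {pid: [(kind, msg_id), ...]} with kind in {"send", "recv", "local"}
    cut:    {pid: number of leading events of that process included in the cut}
    """
    sent, received = set(), set()
    for pid, log in events.items():
        for kind, msg_id in log[:cut[pid]]:
            if kind == "send":
                sent.add(msg_id)
            elif kind == "recv":
                received.add(msg_id)
    # Messages in `sent` but not in `received` are simply in transit.
    return received <= sent


# P sends m1 and then does local work; Q receives m1.
events = {
    "P": [("send", "m1"), ("local", None)],
    "Q": [("recv", "m1")],
}

in_transit_cut = {"P": 1, "Q": 0}   # m1 sent but not yet received: consistent
orphan_cut = {"P": 0, "Q": 1}       # m1 received but never sent: inconsistent
```

A cut such as `in_transit_cut` corresponds to a snapshot in which `m1` is recorded as part of a channel state, whereas `orphan_cut` describes a state that could not have occurred in any execution.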
The Chandy–Lamport algorithm requires each process to send exactly one marker on each of its outgoing channels per snapshot, so the number of marker messages equals the number of directed channels, a bound reminiscent of communication bounds analyzed in research by Leslie Lamport and Nancy Lynch. Space overhead includes storing the recorded local state and the recorded in-transit messages until the snapshot completes, considerations that parallel checkpointing trade-offs investigated by Jeffrey Ullman and Barbara Liskov. The snapshot completes once a marker has been delivered on every channel, so completion time is governed by how long markers take to propagate outward from the initiator under FIFO assumptions, an analysis related to latency considerations studied by Van Jacobson and Sally Floyd in networking contexts and to synchrony bounds discussed by Hugo Krawczyk in distributed timing research.
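As a small worked example of the message count, the sketch below counts markers for two topologies. The helper names (`marker_count`, `fully_connected`) and the choice of five processes are illustrative assumptions only.

```python
def marker_count(channels):
    """Markers sent in one snapshot: one per distinct directed channel."""
    return len(set(channels))


def fully_connected(n):
    """All directed channels between n processes numbered 0..n-1."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]


# A unidirectional ring of 5 processes has 5 channels.
ring = [(i, (i + 1) % 5) for i in range(5)]

ring_markers = marker_count(ring)
mesh_markers = marker_count(fully_connected(5))
```

With five processes, the ring needs 5 markers per snapshot while the fully connected network needs 20, which shows how the cost follows the channel count rather than the process count.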
Extensions relax the FIFO channel assumption or adapt the algorithm for non-FIFO networks, building on ideas from Leslie Lamport's logical clocks and the vector clocks developed by Colin Fidge and Friedemann Mattern, and on causal message ordering techniques explored by Ken Birman and Robbert van Renesse. Other variants integrate snapshotting with checkpoint–restart mechanisms as in systems influenced by Brian Randell and Algirdas Avizienis, or combine with garbage collection and log trimming methods akin to approaches from Diego Ongaro and John Ousterhout. Research also adapts the snapshot concept to replicated state machines and consensus protocols studied by Leslie Lamport in Paxos and by Diego Ongaro and John Ousterhout in Raft.
The algorithm is applied in debugging, checkpointing, resource accounting, and distributed monitoring in systems inspired by projects at IBM Research, Microsoft Research, and academic implementations from MIT, Carnegie Mellon University, and University of California, Berkeley. Practical implementations appear in distributed databases and stream processing frameworks influenced by Google’s Spanner, Apache Kafka, and Apache Flink, and in fault-tolerant middleware stemming from work by Ken Birman and Nancy Lynch. Variants are embedded in cloud infrastructure and container orchestration tools whose designs reference techniques from Leslie Lamport and Peter Deutsch, and in formal toolchains using model checkers from the families of SPIN and SMV pioneered by Gerard Holzmann and Edmund Clarke.
Category:Distributed algorithms