LLMpedia: The first transparent, open encyclopedia generated by LLMs

Graph500

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Expansion Funnel: Raw 46 → Dedup 16 → NER 3 → Enqueued 3 (rejected: 13, all not named entities)
Graph500
Name: Graph500
Developer: Jack Dongarra, David A. Bader, Richard Vuduc, and others
Released: 2010
Genre: Benchmark
Website: https://graph500.org

The Graph500 is a benchmark designed to measure the performance of supercomputers on data-intensive applications, particularly those involving large-scale graph analysis. It was introduced in 2010 by a consortium of researchers from academia, industry, and government laboratories to complement traditional benchmarks such as LINPACK, which underpins the TOP500 list. The benchmark focuses on problems such as breadth-first search over massive, synthetically generated graphs, reflecting the computational needs of fields like cybersecurity, social network analysis, and medical informatics.

Overview

The creation of the Graph500 was driven by the recognition that the computational landscape for high-performance computing was shifting towards data-intensive workloads not adequately measured by existing metrics. Founding contributors included prominent figures from institutions like the University of Tennessee, Georgia Institute of Technology, and Lawrence Livermore National Laboratory. It was officially launched at the Supercomputing Conference (SC10) in New Orleans. The benchmark's primary goal is to guide the design of future systems for applications in areas such as national security, scientific research, and business intelligence, where graph traversal and analysis are fundamental operations.

Benchmark specifications

The benchmark suite defines a scalable data generator that creates an edge list for a large, undirected graph using a Kronecker graph generator, mimicking properties found in real-world networks. The core computational kernel is a distributed breadth-first search starting from a single source vertex, which must be performed on the entire graph. The benchmark specifies multiple problem sizes, or scales, ranging from toy graphs to those with trillions of edges, allowing it to stress systems of varying capabilities. The rules are maintained by a steering committee with representatives from organizations like the National Science Foundation, Intel, and Cray.
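The Kronecker data generator described above can be sketched as an R-MAT-style recursive edge sampler. A minimal illustration follows; the initiator probabilities (A = 0.57, B = 0.19, C = 0.19, D = 0.05) are the values commonly cited for the reference specification, and the function name and structure are illustrative, not taken from the official reference code:

```python
import random

def kronecker_edges(scale, edgefactor=16, a=0.57, b=0.19, c=0.19, seed=42):
    """Sketch of a Graph500-style Kronecker (R-MAT) edge generator.

    scale: log2 of the number of vertices (the graph has 2**scale vertices).
    edgefactor: average edges per vertex (16 in the reference specification).
    a, b, c: initiator-matrix probabilities; d = 1 - a - b - c implicitly.
    """
    rng = random.Random(seed)
    n_edges = edgefactor * (1 << scale)
    edges = []
    for _ in range(n_edges):
        u, v = 0, 0
        # Recursively pick a quadrant of the adjacency matrix, one bit
        # of the endpoint indices per level.
        for bit in range(scale):
            r = rng.random()
            if r < a:                  # top-left quadrant: both bits 0
                pass
            elif r < a + b:            # top-right: set bit in v
                v |= 1 << bit
            elif r < a + b + c:        # bottom-left: set bit in u
                u |= 1 << bit
            else:                      # bottom-right: set bits in both
                u |= 1 << bit
                v |= 1 << bit
        edges.append((u, v))
    return edges
```

At scale 4 this yields 256 edges over 16 vertices; the real benchmark runs at scales large enough to produce graphs with billions to trillions of edges.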

Performance metrics

The primary performance metric is "traversed edges per second" (TEPS), which quantifies the rate at which a system explores the graph structure during the search. A validation step checks the correctness of the parent array produced by each search. The final reported performance is the harmonic mean of the TEPS rates across multiple search iterations. This metric emphasizes memory and network performance over pure FLOPS, distinguishing it from benchmarks like the High Performance Conjugate Gradients benchmark. Results are often analyzed in the context of architectural features, such as NUMA design and interconnection network efficiency.
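The search kernel and the metric can be sketched as follows, using a serial BFS as a stand-in for the distributed kernel; the function names are illustrative assumptions, not the official API. The BFS returns the parent array that the benchmark's validation step inspects, and the metric combines per-search TEPS rates with a harmonic mean, as the specification prescribes for rate-like quantities:

```python
from collections import deque
from statistics import harmonic_mean

def bfs_parents(adj, source):
    """Serial BFS over an adjacency list, returning the parent array.

    parent[source] == source; vertices never reached remain -1.
    This is the array the benchmark's validation step checks.
    """
    parent = [-1] * len(adj)
    parent[source] = source
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if parent[v] == -1:
                parent[v] = u
                q.append(v)
    return parent

def mean_teps(edges_traversed, seconds):
    """Combine per-search TEPS rates with a harmonic mean.

    edges_traversed[i] and seconds[i] describe the i-th BFS iteration;
    the harmonic mean is the appropriate average for rates.
    """
    return harmonic_mean([m / t for m, t in zip(edges_traversed, seconds)])
```

For example, two searches traversing 100 edges in 1.0 s and 2.0 s give rates of 100 and 50 TEPS, whose harmonic mean is about 66.7 TEPS, lower than the arithmetic mean of 75, reflecting the penalty the metric places on slow iterations.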

List rankings

The Graph500 list is published twice a year, coinciding with the International Supercomputing Conference (ISC) and the Supercomputing Conference (SC). The list ranks systems based on their TEPS performance, categorized by the problem scale they attempt. Notable top-ranked systems over the years have included machines from Fujitsu, such as the Fugaku supercomputer, and clusters built with NVIDIA accelerators and InfiniBand interconnects. The list provides a complementary view to the TOP500, often highlighting different architectural optimizations and revealing leaders in data-intensive computing from institutions like RIKEN and the Lawrence Berkeley National Laboratory.

Significance and impact

The Graph500 has significantly influenced the design and procurement of supercomputing systems by highlighting the importance of data movement and irregular memory access patterns. It has spurred research into new algorithms, programming models like the Message Passing Interface, and hardware architectures optimized for graph workloads. The benchmark has been adopted by major vendors and research agencies worldwide, including the United States Department of Energy and the European Commission, to evaluate systems for future workloads in domains like machine learning and network science. Its ongoing development continues to reflect the evolving needs of the big data and artificial intelligence communities.

Category:Computer benchmarks Category:Supercomputing Category:High-performance computing