| A64FX | |
|---|---|
| Name | A64FX |
| Designer | Fujitsu |
| Manufacturer | Fujitsu |
| Architecture | Armv8.2-A with SVE (512-bit vectors) |
| Cores | 52 (48 compute + 4 assistant) |
| Process | TSMC 7 nm FinFET |
| Memory | 32 GiB HBM2 |
| Release | 2019 |
| Applications | Supercomputing, HPC, AI |
The A64FX is a 64-bit Arm-architecture microprocessor designed and manufactured by Fujitsu for supercomputing, scientific computing, and AI workloads. It implements the Armv8.2-A instruction set and was the first processor to implement the Arm Scalable Vector Extension (SVE), with a design emphasis on memory bandwidth, energy efficiency, and dense floating-point throughput. It is best known as the compute engine of the Fugaku supercomputer at RIKEN and has also been adopted in commercial HPC systems.
Fujitsu developed the A64FX as the successor to the SPARC64-based processors of the K computer, moving to the Arm instruction set and to the SVE specification, which Fujitsu helped Arm Ltd. define as a lead partner. The die integrates 48 compute cores plus 4 assistant cores (reserved for OS, I/O, and daemon work), organized into four Core Memory Groups (CMGs) of 12 compute cores, one assistant core, and 8 MiB of shared L2 cache each, linked by an on-chip ring bus. SVE provides vector-length-agnostic SIMD; this implementation uses 512-bit vectors, succeeding the fixed 128-bit NEON SIMD of earlier Arm designs. The chip is fabricated on TSMC's 7 nm FinFET process and packages four stacks of HBM2 (32 GiB total) delivering roughly 1 TB/s of memory bandwidth, a figure more typical of GPU accelerators than of contemporary server CPUs. A Tofu interconnect D (TofuD) controller and PCIe Gen3 are integrated on-die.
Published benchmarks center on Fugaku, which debuted at the top of the June 2020 TOP500 list with an HPL result of 415.5 petaflops and simultaneously led the HPCG, HPL-AI, and Graph500 rankings; an A64FX prototype had earlier topped the November 2019 Green500 list. Performance metrics emphasize double-precision floating-point throughput on scientific codes, with memory-bandwidth-bound kernels (as measured by HPCG) a particular strength relative to cache-based server CPUs. Performance per watt is frequently contrasted with NVIDIA GPUs and with AMD EPYC and Intel Xeon server lines. Real-world evaluations include climate and weather modeling codes, quantum chemistry packages, and machine learning frameworks.
Software support spans Linux distributions including Red Hat Enterprise Linux and SUSE, and compilers from Fujitsu, Arm, GCC, and LLVM, all of which generate SVE code; Cray (now HPE) additionally supports the chip in its Cray Programming Environment, including compilers and MPI stacks. Linear algebra libraries include Fujitsu's tuned SSL II as well as open-source OpenBLAS and BLIS, both of which carry A64FX-optimized kernels, with Intel MKL on x86 serving as a common cross-platform comparison point. Parallel programming relies on OpenMP within a node and MPI across nodes. AI frameworks such as TensorFlow and PyTorch have been ported and optimized for the processor by Fujitsu and RIKEN, with ONNX-based workflows also supported, and performance analysis tools from national laboratories and vendors are used for profiling.
The processor ships in Fujitsu's PRIMEHPC FX1000 and FX700 systems and in HPE's Apollo 80. The flagship installation is Fugaku at the RIKEN Center for Computational Science, co-developed by RIKEN and Fujitsu; smaller A64FX systems operate at universities and regional HPC centers in Japan, Europe (including PRACE-affiliated centers), and the United States. FX1000 and Fugaku nodes communicate over the integrated Tofu interconnect D, while the FX700 uses standard InfiniBand, making it accessible to conventional cluster integrators. These centers collaborate with academic groups at institutions such as the University of Tokyo, Kyoto University, and Tokyo Institute of Technology for domain science.
Design work began under the Japanese FLAGSHIP 2020 ("post-K") program, a collaboration between Fujitsu and RIKEN building directly on the K computer project. Fujitsu disclosed the microarchitecture at Hot Chips 30 in August 2018, and production chips shipped in 2019; Fugaku began early operation in 2020 and entered full service in March 2021. System deployments and benchmark results have been presented alongside the ISC High Performance and SC conference series. Ongoing development involves SVE and A64FX tuning in compilers at Arm and Fujitsu, contributions to open-source toolchains and libraries (GCC, LLVM, OpenBLAS), and cooperative evaluations by laboratories including Los Alamos National Laboratory and Oak Ridge National Laboratory.
Category:ARM-based microprocessors