LLMpedia: The first transparent, open encyclopedia generated by LLMs

Fermi (microarchitecture)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: CUDA (Hop 5)
Expansion Funnel: Raw 91 → Dedup 0 → NER 0 → Enqueued 0
Fermi (microarchitecture)
Department of Energy, Office of Public Affairs, restored by Yann · Public domain
Name: Fermi
Caption: NVIDIA Fermi GPU die (GF100)
Designer: NVIDIA
Predecessor: Tesla
Introduced: 2010
Process: 40 nm
Cores: up to 512 CUDA cores
Successor: Kepler

Fermi is a GPU microarchitecture developed by NVIDIA and introduced in 2010 as the successor to the Tesla generation and the predecessor of Kepler. It targeted high-performance graphics and general-purpose computing across markets served by GeForce, Quadro, and Tesla products, bringing enhanced IEEE 754 double-precision support, a new cache hierarchy, and expanded parallelism. Fermi played a central role in accelerating workloads for organizations such as NASA, Lawrence Livermore National Laboratory, and research projects using CUDA and numerical libraries from the LAPACK and BLAS ecosystems.

Overview

Fermi was designed by NVIDIA engineers to meet demands from industries exemplified by Oak Ridge National Laboratory, Argonne National Laboratory, and commercial firms like IBM partners and Google's early compute clusters. Announced at an NVIDIA GPU Technology Conference keynote and launched into consumer and professional markets, Fermi emphasized compatibility with CUDA and standards promoted by groups like OpenACC and the Khronos Group. It sought to improve upon architectural lessons from predecessors used in products sold through channels such as Best Buy and Dell while addressing challenges seen in scientific installations such as CERN and Princeton University compute facilities.

Architecture and Features

Fermi introduced a reorganized streaming multiprocessor (SM) design with 32 CUDA cores per SM and up to 512 cores on the GF100 die, incorporating features sought by institutions like Lawrence Berkeley National Laboratory and companies like Microsoft Research. The microarchitecture added a per-SM 64 KB block of on-chip memory configurable between L1 cache and shared memory, a unified L2 cache (768 KB on GF100), and full IEEE 754-2008 double-precision arithmetic targeting workloads from Oak Ridge National Laboratory and Sandia National Laboratories. Fermi implemented error-correcting code (ECC) protection for register files, caches, and DRAM, addressing reliability concerns relevant to National Science Foundation-funded projects and enterprise deployments by Amazon Web Services and Hewlett-Packard. The design included a redesigned instruction set and dual warp schedulers per SM to support APIs such as DirectX 11 and OpenGL and to interoperate with compute frameworks like OpenCL.
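The configurable on-chip memory described above offers two fixed partitions of the 64 KB per SM; in CUDA the choice is made per kernel with cudaFuncSetCacheConfig (or device-wide with cudaDeviceSetCacheConfig). The sketch below is a minimal C model of the two configurations; the enum, struct, and function names are illustrative, not part of the CUDA API.

```c
#include <assert.h>

/* Fermi gives each SM 64 KB of on-chip SRAM, split between shared
 * memory and L1 cache.  Two configurations exist:
 *   prefer shared -> 48 KB shared / 16 KB L1
 *   prefer L1     -> 16 KB shared / 48 KB L1
 * In CUDA this is selected per kernel with
 *   cudaFuncSetCacheConfig(kernel, cudaFuncCachePreferShared);  (or PreferL1)
 * The names below are an illustrative model, not CUDA API. */

typedef enum { PREFER_SHARED, PREFER_L1 } cache_pref;

typedef struct {
    int shared_kb;  /* shared-memory portion, KB */
    int l1_kb;      /* L1-cache portion, KB */
} sm_memory_split;

sm_memory_split fermi_split(cache_pref pref) {
    sm_memory_split s;
    if (pref == PREFER_SHARED) {
        s.shared_kb = 48;
        s.l1_kb = 16;
    } else {
        s.shared_kb = 16;
        s.l1_kb = 48;
    }
    return s;
}
```

Kernels dominated by explicit shared-memory tiling typically prefer the 48 KB shared configuration, while pointer-chasing code with irregular access patterns benefits from the larger L1.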

Performance and Efficiency

Fermi delivered substantial single-precision throughput improvements for gaming titles developed by studios such as Valve Corporation and Ubisoft, while also providing improved double-precision performance for scientific codes used at Fermilab and JPL. Benchmarks by outlets like AnandTech and Tom's Hardware showed gains in compute-bound kernels for libraries such as cuBLAS, cuFFT, and Thrust compared to the preceding Tesla-generation architecture. However, the large GF100 die on the 40 nm process node constrained power efficiency relative to contemporary designs from AMD and NVIDIA's own Kepler successor, drawing commentary from analysts at Gartner and IHS Markit about performance-per-watt trade-offs in data centers run by Facebook and Yahoo!.
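The single- versus double-precision gap can be illustrated with peak-throughput arithmetic: each Fermi CUDA core retires one single-precision fused multiply-add (two FLOPs) per shader clock, and Tesla-branded Fermi boards run double precision at half the single-precision rate (consumer GeForce parts were capped lower). A hedged C sketch of that arithmetic; the Tesla C2050 figures used in the usage note (448 cores at a 1.15 GHz shader clock) are the published specs for that board.

```c
#include <assert.h>

/* Peak single-precision GFLOPS on Fermi:
 *   cores * shader clock (GHz) * 2 FLOPs per FMA per clock.
 * Double precision runs at half the SP rate on Tesla-branded
 * Fermi boards (GeForce parts were rate-limited below that). */

double fermi_peak_sp_gflops(int cuda_cores, double shader_clock_ghz) {
    return cuda_cores * shader_clock_ghz * 2.0;
}

double fermi_peak_dp_gflops_tesla(int cuda_cores, double shader_clock_ghz) {
    return fermi_peak_sp_gflops(cuda_cores, shader_clock_ghz) / 2.0;
}
```

For a Tesla C2050 this gives 448 × 1.15 × 2 ≈ 1030 GFLOPS single precision and ≈ 515 GFLOPS double precision, matching the board's advertised peaks.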

Implementations and Products

Fermi appeared in consumer and professional SKUs including GeForce 400 Series, GeForce 500 Series, Quadro 6000, and Tesla C2050/C2070. System integrators such as Dell, HP, and Lenovo offered Fermi-based workstations and servers used in industries represented by Pixar and Industrial Light & Magic. High-performance computing clusters at institutions like NERSC and University of Tokyo incorporated Tesla boards for parallel workloads in climate modeling and computational chemistry alongside systems from vendors such as Supermicro and Cray. OEM partners including ASUS, MSI, and EVGA released custom-cooled GeForce cards based on the GF100 and GF110 dies.

Software and Driver Support

Fermi was supported by NVIDIA's driver stacks on Windows, Linux, and Mac OS X for certain configurations, and by developer tools including the CUDA Toolkit, Nsight, and the Visual Profiler, used alongside environments like MATLAB and Python libraries in the SciPy ecosystem. Compiler and runtime evolution at NVIDIA maintained compatibility for projects at Stanford University and Berkeley Artificial Intelligence Research using early deep learning frameworks whose users later migrated to TensorFlow and PyTorch; early adopters used Fermi-class GPUs for experimental training. Community resources like Stack Overflow and forums run by Phoronix documented driver regressions and performance-tuning practices for HPC and rendering workloads.
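In toolkit terms, Fermi GPUs report CUDA compute capability 2.0 (GF100-class) or 2.1 (GF104-class), targeted at compile time with nvcc -arch=sm_20 or -arch=sm_21 and detected at run time via cudaGetDeviceProperties. The C sketch below maps the major version number to a generation name; the lookup function is an illustrative helper, not part of the CUDA API (the real runtime query is shown in the comment).

```c
#include <assert.h>
#include <string.h>

/* In a real CUDA program the version numbers come from the runtime:
 *   struct cudaDeviceProp prop;
 *   cudaGetDeviceProperties(&prop, 0);   // prop.major, prop.minor
 * The lookup below is an illustrative helper, not a CUDA API call. */
const char *nvidia_arch_name(int compute_major) {
    switch (compute_major) {
    case 1: return "Tesla";   /* compute capability 1.x */
    case 2: return "Fermi";   /* 2.0 = GF100, 2.1 = GF104-class */
    case 3: return "Kepler";  /* 3.x, Fermi's successor */
    default: return "unknown";
    }
}
```

Driver and toolkit release notes keyed deprecation schedules to these numbers: later CUDA Toolkit versions eventually dropped sm_20/sm_21 targets, ending new-toolkit support for Fermi hardware.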

Reception and Impact

Industry reviewers at PC Gamer and Wired praised Fermi's compute capabilities and feature set for professional markets, while critics at The Register and Ars Technica noted thermal and power envelopes that complicated small-form-factor integration at companies like Apple and boutique builders. Fermi accelerated adoption of GPGPU computing in academic labs at Caltech and Imperial College London, influenced standards discussions at the Khronos Group, and informed architectural choices in successors that powered cloud offerings from Google Cloud Platform and Microsoft Azure. Its emphasis on double-precision, ECC, and a richer cache hierarchy left a legacy visible in subsequent generations adopted by research consortia such as PRACE and initiatives funded by the European Commission.

Category:NVIDIA microarchitectures