| BFGS | |
|---|---|
| Name | BFGS |
| Type | Quasi-Newton method |
| Inventors | Broyden, C. G., Fletcher, R., Goldfarb, D. S., Shanno, D. F. |
| First published | 1970 |
| Field | Numerical optimization |
| Typical applications | Nonlinear optimization, unconstrained minimization, machine learning, inverse problems |
BFGS
BFGS is a quasi-Newton optimization algorithm for unconstrained nonlinear minimization that iteratively builds an approximation to the inverse Hessian matrix to produce search directions. It is widely used in numerical analysis, computational science, and engineering for problems where the exact second-derivative information required by Newton-type methods is costly or impractical to obtain. The method was formulated independently in 1970 by C. G. Broyden, R. Fletcher, D. Goldfarb, and D. F. Shanno, and it remains a standard tool in libraries and software maintained by institutions such as Netlib, Numerical Recipes, and developers at AT&T Bell Laboratories and IBM Research.
BFGS occupies a central role among quasi-Newton schemes alongside the Davidon–Fletcher–Powell family and limited-memory variants. It updates an approximate inverse Hessian using low-rank corrections derived from gradient and step differences, enabling superlinear convergence on many problems encountered by practitioners at Los Alamos National Laboratory, Lawrence Berkeley National Laboratory, and industrial groups at Microsoft Research and Google Research. Implementations appear in SciPy, MATLAB, and the R Project for Statistical Computing, and in commercial solvers from Gurobi and MathWorks.
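The low-rank correction mentioned above has a standard closed form. Writing the step as $s_k = x_{k+1} - x_k$ and the gradient difference as $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, the inverse-Hessian approximation $H_k$ is updated by the rank-two formula

```latex
H_{k+1} = \left(I - \rho_k s_k y_k^{\mathsf T}\right) H_k \left(I - \rho_k y_k s_k^{\mathsf T}\right) + \rho_k s_k s_k^{\mathsf T},
\qquad \rho_k = \frac{1}{y_k^{\mathsf T} s_k},
```

which satisfies the secant condition $H_{k+1} y_k = s_k$ and preserves symmetry and positive definiteness of $H_k$ whenever the curvature condition $y_k^{\mathsf T} s_k > 0$ holds.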
Motivation for BFGS traces to early work on optimization by John von Neumann, Richard Courant, and the development of numerical linear algebra at places like Harvard University and Princeton University. When the full Hessian evaluation and factorization required by classical methods in the tradition of Isaac Newton and Carl Friedrich Gauss were too expensive, researchers sought iterative updates that preserved desirable curvature properties. The BFGS update inherited ideas from the secant equation and earlier updates such as C. G. Broyden's family of updates and the Davidon update, and it was motivated by applications in structural optimization at General Electric and parameter estimation in contexts like work at Los Alamos National Laboratory and Sandia National Laboratories.
At each iteration, the method computes a step by applying the current inverse Hessian approximation to the negative gradient, a quasi-Newton analogue of the classical steepest-descent direction. After a line search (often under the Wolfe or strong Wolfe conditions, as treated by J. E. Dennis Jr. and R. B. Schnabel), BFGS updates the inverse Hessian approximation via a rank-two correction that enforces the secant condition. Typical line search strategies are implemented in libraries maintained by Netlib and research groups at Argonne National Laboratory and Los Alamos National Laboratory. Practical implementations often rely on robust linear algebra backends such as LAPACK and BLAS.
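The loop described above can be sketched in a few lines of NumPy. This is an illustrative toy implementation, not library code: the function name `bfgs` is hypothetical, and it uses a simple Armijo backtracking line search where production solvers would use a Wolfe-condition line search.

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=500):
    """Minimal BFGS sketch: direction from the inverse-Hessian
    approximation, backtracking line search, rank-two update."""
    n = x0.size
    H = np.eye(n)                      # inverse Hessian approximation
    x, g = x0.astype(float), grad(x0)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                     # quasi-Newton search direction
        t = 1.0                        # Armijo backtracking line search
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p):
            t *= 0.5
        s = t * p                      # step difference s_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                  # gradient difference y_k
        sy = s @ y
        if sy > 1e-12:                 # curvature condition; else skip update
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Usage: minimize the Rosenbrock function from the standard start point
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
x_star = bfgs(f, grad, np.array([-1.2, 1.0]))
```

Because the update is applied only when the curvature condition holds, `H` stays positive definite and `p` remains a descent direction, which guarantees the backtracking loop terminates.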
Theoretical analysis of BFGS builds on the foundations laid by W. C. Davidon, R. Fletcher, D. Goldfarb, and D. F. Shanno and has been extended by researchers at Stanford University, Massachusetts Institute of Technology, and University of Cambridge. Under standard assumptions (a twice continuously differentiable objective, bounded level sets, and line searches satisfying the Wolfe conditions), BFGS exhibits superlinear convergence. Global convergence results under inexact line searches or nonconvex objectives were proven in studies by authors affiliated with Cornell University, University of California, Berkeley, and University of Oxford, employing techniques related to those used in the analysis of Newton and Gauss–Newton methods.
Important variants include the limited-memory BFGS (L-BFGS) introduced to handle very large-scale problems encountered at Google, Facebook, and Amazon Web Services; damped BFGS for improved stability in nonconvex settings studied by teams at IBM Research; and symmetric-rank-one (SR1) updates explored at ETH Zurich and University of Cambridge. Extensions integrate trust-region frameworks developed at INRIA and preconditioning strategies leveraging work from Princeton University and Columbia University. Hybrid schemes combine BFGS with conjugate gradient methods used in finite-element analyses at Siemens and Schlumberger.
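The key idea behind L-BFGS is that the product $H_k g$ can be computed from only the $m$ most recent $(s, y)$ pairs via the standard two-loop recursion, without ever storing a dense matrix. The sketch below is illustrative; the function name `lbfgs_direction` is hypothetical, and it uses the common initial scaling $\gamma_k = s^{\mathsf T} y / y^{\mathsf T} y$ from the latest pair.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """L-BFGS two-loop recursion (sketch): compute the search direction
    -H_k g from stored (s, y) pairs, newest pairs last in each list."""
    q = g.copy()
    alphas = []                        # collected newest-first
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if s_list:                         # initial Hessian scaling gamma * I
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / (y @ s)
        b = rho * (y @ q)
        q += (a - b) * s
    return -q                          # q now equals H_k g
```

With memory $m$ and dimension $n$, this costs $O(mn)$ time and storage per iteration, versus $O(n^2)$ for the dense BFGS update, which is what makes the variant attractive for the very large-scale problems mentioned above.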
BFGS is implemented in widespread software: optimization modules in SciPy and TensorFlow provide interfaces used by researchers from MIT, Stanford University, and Carnegie Mellon University. In engineering, it is applied to aerodynamic shape optimization in collaborations between NASA centers and industry partners like Boeing and Rolls-Royce. In computational chemistry and materials science, groups at Argonne National Laboratory and Oak Ridge National Laboratory employ BFGS for energy minimization in codes developed at Lawrence Livermore National Laboratory and research consortia involving University of California, San Diego.
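In SciPy, for instance, BFGS is exposed through `scipy.optimize.minimize` with `method="BFGS"`, shown here on the built-in Rosenbrock test function:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])
res = minimize(rosen, x0, method="BFGS", jac=rosen_der)
# res.x approaches the minimizer (1, 1); res.hess_inv holds the final
# inverse-Hessian approximation accumulated by the BFGS updates.
```

Supplying the analytic gradient via `jac` avoids finite-difference gradient estimates; `res.hess_inv` can be inspected to see how well the quasi-Newton approximation captured the local curvature.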
Benchmarking studies have been conducted by consortia including researchers from NIST, CERN, and national laboratories, comparing BFGS and its variants on standard sets such as those curated by CUTEst and test problems from Moré, Garbow, and Hillstrom. Results show BFGS often outperforms conjugate gradient and simple gradient methods on medium-scale smooth problems, while L-BFGS is favored for very high-dimensional tasks encountered in machine learning research at Google Research and DeepMind. Performance depends on line search robustness, preconditioning, and problem conditioning; profiling is routinely done with tools from Intel and numerical libraries from AMD and NVIDIA.
Category:Optimization algorithms