| CuPy | |
|---|---|
| Name | CuPy |
| Developed by | Preferred Networks |
| Initial release | 2017 |
| Programming language | Python, C++ |
| Operating system | Linux, Windows, macOS |
| License | MIT |
CuPy
CuPy is a Python library for GPU-accelerated array computing that mirrors the NumPy API to enable high-performance numerical computing on NVIDIA GPUs. It provides an array object and a collection of routines for linear algebra, Fourier transforms, random number generation, and sparse matrices, facilitating workloads in machine learning, scientific computing, and data analysis. CuPy interoperates with CUDA toolkits and complements frameworks used in research and industry.
CuPy implements an ndarray interface closely modeled on NumPy's, making it familiar to users of SciPy, Pandas, scikit-learn, TensorFlow, and PyTorch. The project targets NVIDIA GPUs through CUDA, leveraging vendor libraries such as cuBLAS, cuFFT, and cuSPARSE. CuPy's design permits integration with ecosystems around Dask, Apache Arrow, and ONNX to support distributed and production workflows used by organizations such as Google, Amazon, Microsoft, Facebook, and OpenAI. Researchers at institutions including MIT, Stanford University, Harvard University, the University of California, Berkeley, and Princeton University have used CuPy in computational experiments.
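Because the API mirrors NumPy, existing array code can often target the GPU by swapping a single import. A minimal sketch; the NumPy fallback below is only for illustration on machines without a CUDA GPU:

```python
# CuPy mirrors the NumPy API, so the same code runs on either backend.
# Fall back to NumPy when CuPy (and hence a CUDA device) is unavailable.
try:
    import cupy as xp
except ImportError:
    import numpy as xp

a = xp.arange(6, dtype=xp.float32).reshape(2, 3)
b = xp.ones((3, 2), dtype=xp.float32)
c = a @ b                  # matrix multiply; runs on the GPU if xp is cupy
print(c.shape)             # (2, 2)
print(float(c.sum()))      # 30.0
```

With CuPy installed, `xp.arange` allocates device memory and `a @ b` dispatches to cuBLAS; the code itself is unchanged.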
CuPy originated at Preferred Networks as the GPU array backend of the Chainer deep learning framework and was first released as an independent NumPy-compatible library in 2017. Development has involved contributors from corporate research labs at NVIDIA, cloud providers such as Google Cloud Platform and Amazon Web Services, and academic groups at the University of Tokyo and Kyoto University. Over successive releases CuPy adopted interoperability mechanisms shared with projects like Numba and Cython, and evolved in parallel with releases of CUDA and the NVIDIA CUDA-X libraries. Major milestones include sparse-matrix support modeled on scipy.sparse, random-number primitives consistent with Random123 and cuRAND, and extended kernel-injection features influenced by Thrust and CUB.
CuPy's core architecture centers on a GPU-backed ndarray implemented in Cython and C++ with Python bindings, interfacing with the CUDA Driver and Runtime APIs. Memory management uses a pool allocator, comparable to strategies in TensorFlow and MXNet, to reduce fragmentation and allocation overhead, while stream and event handling follow the conventions of CUDA streams and events. The project exposes just-in-time compilation of device kernels through NVRTC and integrates BLAS and LAPACK functionality through cuBLAS and cuSOLVER. CuPy also supports interoperability layers such as DLPack and Numba to share buffers with PyTorch, TensorFlow, and JAX without copies.
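The pool allocator and stream handling described above are part of CuPy's public API. A guarded sketch, using CuPy's documented `get_default_memory_pool` and `cuda.Stream` interfaces; the GPU branch executes only where CuPy and a CUDA device are actually present:

```python
# Demonstrate CuPy's memory pool and stream APIs, guarded so the
# script also runs (as a no-op) on machines without CuPy/CUDA.
try:
    import cupy
    have_gpu = cupy.cuda.runtime.getDeviceCount() > 0
except Exception:
    cupy = None
    have_gpu = False

if have_gpu:
    pool = cupy.get_default_memory_pool()   # pool allocator caches device memory
    with cupy.cuda.Stream() as stream:      # enqueue work on a non-default stream
        x = cupy.random.rand(1024)
        y = cupy.fft.fft(x)                 # async kernel launches on this stream
        stream.synchronize()                # block until queued kernels finish
    print("pool bytes held:", pool.total_bytes())
else:
    print("CuPy/CUDA not available; skipping GPU demo")
```

Freed arrays return their device memory to the pool rather than to the driver, which is what avoids the allocation overhead noted above.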
CuPy offers array creation, indexing, broadcasting, and ufuncs compatible with NumPy standards, plus GPU-specific features like asynchronous execution controlled by CUDA Streams and memory pooling aligned with CUPTI performance tools. It includes linear algebra routines mapping to cuBLAS and cuSOLVER, FFTs via cuFFT, sparse linear algebra through cuSPARSE, and random number generation backed by cuRAND. Additional functionality comprises just-in-time kernel compilation, raw CUDA kernel submission, interoperability with Dask.distributed for parallel arrays, and I/O helpers that complement formats used by HDF5, Parquet, and Apache Arrow.
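The routine families listed above follow NumPy naming, so the same calls dispatch to cuSOLVER, cuFFT, and cuRAND when CuPy backs them. A sketch with a NumPy fallback for machines without a GPU:

```python
# The same NumPy-style calls map to GPU libraries under CuPy:
# linalg -> cuSOLVER/cuBLAS, fft -> cuFFT, random -> cuRAND.
try:
    import cupy as xp
except ImportError:
    import numpy as xp

m = xp.eye(3) * 2.0
inv = xp.linalg.inv(m)                 # inverse of 2*I is 0.5*I
spec = xp.fft.fft(xp.ones(8))          # FFT of ones: DC bin = 8, rest = 0
r = xp.random.default_rng(0).standard_normal(4)  # Generator-style RNG API
print(float(inv[0, 0]))                # 0.5
print(float(spec[0].real))             # 8.0
print(r.shape)                         # (4,)
```

Note that CuPy's and NumPy's generators do not promise bit-identical streams for the same seed, so only shapes, not values, should be relied on across backends.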
CuPy's performance advantages stem from offloading compute-bound operations to NVIDIA GPUs such as the Tesla and A100 lines, running on the CUDA Toolkit and NVIDIA driver stack. Benchmarks comparing CuPy with NumPy, and with GPU frameworks such as PyTorch and TensorFlow, typically show order-of-magnitude speedups over CPU baselines for matrix multiplication and FFT on large inputs. Performance tuning often relies on vendor tools like Nsight Systems and Nsight Compute, and comparisons reference CPU libraries such as MKL, OpenBLAS, and Eigen to contextualize CPU versus GPU trade-offs.
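A fair GPU benchmark must synchronize before stopping the clock, because CuPy kernel launches return immediately. A minimal timing sketch; the NumPy fallback is illustrative only, and absolute times depend entirely on the hardware:

```python
import time

try:
    import cupy as xp
    def sync():
        xp.cuda.Device().synchronize()   # GPU work is asynchronous; wait for it
except ImportError:
    import numpy as xp                   # CPU fallback for illustration
    def sync():
        pass                             # NumPy calls return only when done

n = 512
a = xp.random.rand(n, n).astype(xp.float32)
_ = a @ a; sync()                        # warm-up (cuBLAS handle / kernel init)
t0 = time.perf_counter()
_ = a @ a
sync()                                   # include kernel completion in the timing
elapsed = time.perf_counter() - t0
print(f"{n}x{n} float32 matmul: {elapsed:.4f} s")
```

Omitting the warm-up run or the final synchronize is the most common source of misleadingly fast GPU numbers.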
CuPy integrates with machine learning frameworks and scientific stacks including PyTorch, TensorFlow, MXNet, JAX, scikit-learn, SciPy, Pandas, Dask, and ONNX Runtime. It interoperates with data formats and platforms such as HDF5, Parquet, Apache Arrow, Kubernetes, and cloud services like Google Cloud Platform, Amazon Web Services, and Microsoft Azure. Tooling support includes profilers and debuggers like Nsight Systems, gdb, and Valgrind, while packaging and distribution leverage Conda and pip, with build tooling informed by CMake and Bazel.
CuPy is used in domains requiring accelerated numerics: deep learning research at labs such as DeepMind and OpenAI; computational physics at CERN; bioinformatics pipelines at the Broad Institute; imaging and signal processing groups at Johns Hopkins University; and quantitative finance teams at firms such as Goldman Sachs and Jane Street. Use cases include large-scale linear algebra, FFT-based simulations, stochastic simulations using GPU random number generation, sparse solvers for graph analytics, and accelerated preprocessing in data engineering stacks built on Apache Spark and Dask.
Category:Python (programming language) libraries