ArviZ — LLMpedia

ArviZ
Name	ArviZ
Programming language	Python
Operating system	Cross-platform
License	Apache License 2.0

Contents

Overview
Features
Architecture and Design
Usage and API
Development and Community
Applications and Case Studies
History and Releases

ArviZ is an open-source Python library for exploratory analysis of Bayesian models. It provides tools for posterior analysis, prior and posterior predictive checks, convergence diagnostics, model comparison, and visualization, and it works with the output of several probabilistic programming frameworks while building on the scientific Python ecosystem.

Overview

ArviZ reads the output of probabilistic programming systems such as PyMC, Stan (through CmdStanPy, CmdStan, and PyStan), Pyro, NumPyro, emcee, TensorFlow Probability, Bean Machine, and JAGS (through PyJAGS). Julia users of Turing.jl are served by the companion package ArviZ.jl. ArviZ builds on NumPy, SciPy, xarray, and pandas, and draws plots with matplotlib or Bokeh. It emphasizes reproducible workflows, for example by saving results to netCDF files, and is commonly used from Jupyter Notebook, JupyterLab, Google Colab, Visual Studio Code, PyCharm, and Binder.

Features

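ArviZ includes functions for posterior summarization, convergence diagnostics, and model comparison. Convergence diagnostics include the Gelman–Rubin potential scale reduction factor (R-hat), computed by default in the rank-normalized split form, along with bulk and tail effective sample size (ESS) and Monte Carlo standard error. Model comparison uses Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO) and the widely applicable information criterion (WAIC), with stacking and pseudo-Bayesian model averaging weights. Visualization routines cover trace plots, pair plots, energy plots, forest and ridge plots, autocorrelation plots, rank plots, and posterior predictive checks, rendered with matplotlib or Bokeh. These diagnostics and summaries build on work by Andrew Gelman and Donald Rubin (R-hat), Sumio Watanabe (WAIC), and Aki Vehtari and collaborators (PSIS-LOO and rank-normalized R-hat).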
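To show what the R-hat diagnostic measures, here is a minimal pure-Python sketch of the classic split R-hat for a single scalar parameter. It is an illustration, not ArviZ's code: `az.rhat` defaults to the rank-normalized variant (`method="rank"`), and this sketch corresponds roughly to its older `method="split"` option. The function name `split_rhat` and the simulated chains are hypothetical.

```python
import random
from statistics import mean, variance


def split_rhat(chains):
    """Classic (non-rank-normalized) split R-hat for one scalar parameter.

    chains: list of equal-length sequences of draws, one per chain.
    Hypothetical helper for illustration; not ArviZ's implementation.
    """
    half = len(chains[0]) // 2
    if half < 2:
        raise ValueError("need at least 4 draws per chain")
    # Split each chain into a first and second half so that drift within
    # a chain shows up as disagreement between (split) chains.
    splits = []
    for draws in chains:
        splits.append(list(draws[:half]))
        splits.append(list(draws[-half:]))  # middle draw dropped if length is odd
    n = half  # draws per split chain
    # B: between-chain variance (scaled by n); W: mean within-chain variance
    b = n * variance([mean(s) for s in splits])
    w = mean(variance(s) for s in splits)
    # Pooled estimate of the marginal posterior variance
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


rng = random.Random(0)
# Four chains sampling the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Two chains stuck around different values: R-hat should be well above 1.
stuck = [[rng.gauss(0, 1) for _ in range(1000)],
         [rng.gauss(5, 1) for _ in range(1000)]]
print(round(split_rhat(mixed), 3))
print(round(split_rhat(stuck), 3))
```

Values near 1 indicate that the split chains agree; common practice treats values noticeably above 1 (for example above 1.01 for the rank-normalized version) as a sign that sampling has not converged.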

Architecture and Design

ArviZ's central data structure is InferenceData, a container of named groups (for example posterior, sample_stats, log_likelihood, posterior_predictive, and observed_data), each stored as an xarray Dataset whose sampled variables carry chain and draw dimensions. Converter functions such as from_pystan, from_cmdstanpy, from_numpyro, from_pyro, from_emcee, and from_dict translate backend-specific output into this format, and InferenceData objects can be written to and read from netCDF files. Analysis functions accept InferenceData, or objects convertible to it, and return xarray or pandas objects rather than backend-specific types. Plotting functions share a common interface and dispatch to a matplotlib or Bokeh backend. The project manages issues, pull requests, and continuous integration on GitHub, following contribution and governance practices similar to those of NumPy, SciPy, and pandas.
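The converter idea can be illustrated without ArviZ itself. The sketch below, which assumes plain Python dicts and lists in place of xarray Datasets, reshapes flat sampler records into a per-variable `[chain][draw]` layout and groups the results under InferenceData-style group names. The helpers `to_group` and `chain_draw_shape` and the `toy_idata` dict are hypothetical; ArviZ's real converters (for example `az.from_dict`) produce xarray-backed InferenceData objects instead.

```python
def to_group(records):
    """Reshape flat sampler output into {variable: [[chain 0 draws], [chain 1 draws], ...]}.

    records: iterable of (chain_index, {variable_name: value}) in sampling order.
    Hypothetical helper for illustration; not part of ArviZ.
    """
    group = {}
    for chain, values in records:
        for name, value in values.items():
            chains = group.setdefault(name, [])
            # Grow the outer list so chain indices can arrive in any order
            while len(chains) <= chain:
                chains.append([])
            chains[chain].append(value)
    # Require a rectangular (chain, draw) layout, as an xarray Dataset would
    for name, chains in group.items():
        if len({len(c) for c in chains}) != 1:
            raise ValueError(f"chains of {name!r} have different lengths")
    return group


def chain_draw_shape(group, name):
    """Return (n_chains, n_draws) for one variable in a group."""
    chains = group[name]
    return len(chains), len(chains[0])


# Interleaved draws from two chains, as a sampler might emit them.
records = [
    (0, {"mu": 0.1, "sigma": 1.0}),
    (1, {"mu": -0.2, "sigma": 0.9}),
    (0, {"mu": 0.3, "sigma": 1.1}),
    (1, {"mu": 0.0, "sigma": 1.2}),
]
# A toy InferenceData-like container: named groups of {variable: [chain][draw]}.
toy_idata = {
    "posterior": to_group(records),
    "sample_stats": to_group([(0, {"diverging": False}), (1, {"diverging": False}),
                              (0, {"diverging": False}), (1, {"diverging": True})]),
}
print(chain_draw_shape(toy_idata["posterior"], "mu"))  # (2, 2)
```

Keeping every group in the same chain and draw layout is what lets one diagnostic or plotting function work on output from any supported sampler.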

Usage and API

A typical workflow converts model output into an InferenceData object (recent PyMC versions return one directly from pm.sample), computes diagnostics with functions such as az.rhat, az.ess, az.loo, and az.waic, and creates trace, pair, and posterior predictive plots with az.plot_trace, az.plot_pair, and az.plot_ppc. The API follows xarray's labeled-dimension data handling: diagnostic functions such as az.rhat and az.ess return xarray Datasets, az.loo and az.waic return pandas Series-like results, and az.summary returns a pandas DataFrame with posterior means, standard deviations, highest density intervals, and diagnostics for each variable. Stan results can be loaded with az.from_cmdstanpy, az.from_cmdstan (for CmdStan CSV files), or az.from_pystan, and such analyses are commonly carried out in Jupyter notebooks.
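WAIC, one of the model-comparison quantities mentioned above, can be computed from a matrix of pointwise log-likelihood values. The pure-Python sketch below is an illustrative re-derivation, not ArviZ's code: it assumes `log_lik[s][i]` holds log p(y_i | θ_s) for posterior draw s and observation i, and returns the expected log pointwise predictive density (elpd) on the log scale, which is the scale `az.waic` reports by default. The function name `waic` here is hypothetical, and `az.loo` uses Pareto-smoothed importance sampling rather than this formula.

```python
import math
from statistics import mean, variance


def waic(log_lik):
    """WAIC from a pointwise log-likelihood matrix.

    log_lik[s][i] is log p(y_i | theta_s) for posterior draw s and observation i.
    Returns (elpd_waic, p_waic) on the log scale, where higher elpd is better.
    Hypothetical helper for illustration; not ArviZ's implementation.
    """
    if len(log_lik) < 2:
        raise ValueError("need at least two posterior draws")
    elpd, p_waic = 0.0, 0.0
    for i in range(len(log_lik[0])):
        column = [row[i] for row in log_lik]
        # lppd_i: log of the posterior-mean likelihood, via log-sum-exp for stability
        m = max(column)
        lppd_i = m + math.log(mean(math.exp(x - m) for x in column))
        # p_waic_i: posterior variance of the pointwise log-likelihood
        p_i = variance(column)
        elpd += lppd_i - p_i
        p_waic += p_i
    return elpd, p_waic


# Three posterior draws (rows) evaluated on three observations (columns).
log_lik = [[-1.2, -0.8, -2.0],
           [-1.0, -0.9, -1.7],
           [-1.1, -0.7, -1.9]]
elpd_waic, p_waic = waic(log_lik)
print(elpd_waic, p_waic)
```

The penalty p_waic acts as an effective number of parameters: observations whose log-likelihood varies strongly across posterior draws contribute more to it.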

Development and Community

Development occurs on platforms such as GitHub, with collaboration among contributors from academia and industry, including users affiliated with Columbia University, University of Oxford, Harvard University, Stanford University, Massachusetts Institute of Technology, Google, Uber, Microsoft, and research groups at institutions like Imperial College London and University College London. The community engages via channels such as Discourse, GitHub Issues, Slack, Gitter, and conference presentations at venues like PyCon, SciPy, NeurIPS, AISTATS, ISBA, JSM, and ICML. Documentation and tutorials are provided through ReadTheDocs-style sites and workshops at summer schools hosted by organizations such as The Carpentries.

Applications and Case Studies

ArviZ has been used in applied research at institutions such as NASA, NOAA, the World Health Organization, the Centers for Disease Control and Prevention, and Google DeepMind, and at companies such as Facebook, Amazon, Airbnb, and Spotify. Case studies include hierarchical modeling in ecology with collaborators from the University of Cambridge, epidemiological modeling with teams at Imperial College London and the London School of Hygiene & Tropical Medicine, and neuroimaging analysis in projects affiliated with the UCL Institute of Neurology and the MIT McGovern Institute. It has supported analyses published in journals such as the Journal of the Royal Statistical Society, Statistics and Computing, Nature Methods, and the Journal of Machine Learning Research, and in conference proceedings from NeurIPS and ICML.

History and Releases

The project originated among contributors to the PyMC and Stan ecosystems who were active in the Bayesian workflow community, with influences from Bayesian workflow research by Andrew Gelman and Aki Vehtari. Releases follow semantic versioning and are distributed through PyPI and conda-forge, with continuous integration and test suites modeled after practices used in NumPy and SciPy. Successive releases have added converters for NumPyro and TensorFlow Probability and an interactive Bokeh plotting backend alongside matplotlib.

Category:Bayesian statistics