nbformat — LLMpedia

nbformat
Name	nbformat
Title	nbformat
Developer	Project Jupyter contributors
Initial release	2014
Repository	GitHub
Programming language	Python
License	BSD license
Website	Project Jupyter documentation

Contents

History
Purpose and scope
File format specification
API and usage
Versioning and compatibility
Implementation and integration
Security and validation

nbformat is a Python library and data model for representing the interactive computational documents used by Jupyter Notebook and related projects such as JupyterLab, IPython, Binder, Google Colaboratory, Kaggle, and nteract. It defines a JSON schema for notebook documents, so that notebooks produced with kernels such as the IPython kernel, IRkernel, and IJulia can be exchanged among tools including Papermill, nbconvert, Voilà, and BinderHub.

History

nbformat originated alongside the IPython notebook interface and the early IPython Notebook format, as contributors from Fernando Pérez's team and the growing Project Jupyter community worked to formalize a stable document model. Key milestones include the split of the IPython project into IPython and Project Jupyter, the introduction of numbered notebook format versions to manage evolution of the JSON schema, and coordinated development through GitHub repositories and issue trackers under NumFOCUS-backed projects. Influential collaborators have included members affiliated with MIT, UC Berkeley, and Cal Poly, as well as corporate contributors from Microsoft, Google, IBM, and Anaconda, Inc.

Purpose and scope

The primary purpose of nbformat is to provide a canonical, versioned JSON representation for interactive documents, ensuring interoperability among tools such as JupyterLab, nteract, Hydrogen, and Polynote, and export utilities like nbconvert. The scope covers the schema for notebook metadata, cell types, outputs, execution counts, and attachments, supporting workflows used in research groups at institutions like Stanford University, Harvard University, and ETH Zurich, and at organizations including NASA, CERN, and the Wikimedia Foundation. nbformat targets portability between execution environments (for example, from Kaggle kernels to Google Colaboratory sessions) and compatibility with reproducibility tools such as Binder and ReproZip.

File format specification

nbformat defines a JSON-based document with the top-level fields "nbformat", "nbformat_minor", "metadata", and "cells", following a schema that has evolved across versions through community proposals on GitHub. Cells are typed (code, markdown, raw) and carry properties such as "source" and, for code cells, "outputs" and "execution_count". Outputs can hold rich representations as MIME bundles keyed by type, such as "text/plain", "image/png", "image/svg+xml", and "application/json", which front ends display with renderers including MathJax and PixieDust. The document model permits metadata blocks for kernelspecs, which identify the kernel, and for language info used by tools developed at Anaconda, Inc. and by libraries such as Matplotlib (example integrations). The format also supports attachments (embedded binary data), which are base64-encoded and intended for compatibility with editors including JupyterLab and viewers like nbviewer.
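
As a rough illustration of the structure described above, the following sketch builds a small version 4 notebook as a plain dictionary using only the Python standard library. The kernelspec values, the attachment name pixel.png, and the placeholder bytes are illustrative assumptions, not taken from any real notebook. Minor version 4 is chosen here on the assumption that later minor versions additionally require a per-cell "id" field.

```python
import base64
import json

# Placeholder bytes standing in for an image file; this is not a real PNG.
fake_png = base64.b64encode(b"not-a-real-image").decode("ascii")

notebook = {
    # Top-level fields named in the format description.
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        # Illustrative kernelspec and language info.
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "language_info": {"name": "python"},
    },
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Demo\n\n![pixel](attachment:pixel.png)",
            # Attachments map a file name to a MIME bundle of base64 data.
            "attachments": {"pixel.png": {"image/png": fake_png}},
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "1 + 1",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    # MIME bundle: one representation per MIME type.
                    "data": {"text/plain": "2"},
                }
            ],
        },
    ],
}

# Serialize to the on-disk JSON form and parse it back.
text = json.dumps(notebook, indent=1)
parsed = json.loads(text)
cell_types = [cell["cell_type"] for cell in parsed["cells"]]
mime_types = sorted(parsed["cells"][1]["outputs"][0]["data"])
```

A file written this way is only as correct as the dictionary that produced it; checking it against the official schema is a separate step.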

API and usage

nbformat exposes Python APIs for reading, writing, validating, and upgrading notebook documents. Typical usage parses JSON into in-memory node structures, manipulates the cell list, adjusts kernelspec metadata, and serializes the result back to disk for consumers such as nbconvert or Voilà. Developers at institutions such as UC Berkeley and companies like Microsoft frequently use these APIs within CI pipelines, automated testing with pytest, and conversion workflows orchestrated by Airflow. CLI utilities and library functions allow integration into editors like Visual Studio Code and plugins for Atom.
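
The usage pattern described above might look like the following minimal sketch, which assumes the nbformat package is installed. It builds a notebook in memory with the nbformat.v4 helper functions, sets kernelspec metadata, validates it, and round-trips it through a JSON string. The cell contents and kernelspec values are made up for illustration.

```python
import nbformat
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

# Build an in-memory notebook node and manipulate its cell list.
nb = new_notebook()
nb.cells.append(new_markdown_cell("# Demo"))
nb.cells.append(new_code_cell("print('hello')"))

# Adjust kernelspec metadata (illustrative values).
nb.metadata["kernelspec"] = {"name": "python3", "display_name": "Python 3"}

# Raises nbformat.ValidationError if the node violates the schema.
nbformat.validate(nb)

# Serialize to JSON text, then parse it back as a version 4 notebook.
text = nbformat.writes(nb)
roundtrip = nbformat.reads(text, as_version=4)
```

For files on disk, nbformat.read and nbformat.write accept a path or file object in place of the string-based reads and writes used here.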

Versioning and compatibility

nbformat maintains explicit major and minor version numbers to handle schema changes while preserving backward compatibility for clients including Jupyter Notebook and JupyterLab. The project documents upgrade rules and provides programmatic converters that migrate notebooks carrying an older "nbformat" number to the current schema, much as ecosystems built around Semantic Versioning coordinate their releases. Compatibility concerns drive collaboration between kernel authors (e.g., Julia Computing for IJulia) and tool vendors (e.g., Google, Microsoft) to ensure notebooks authored in one environment render correctly in another.
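
A hedged sketch of such a migration follows, assuming the nbformat package is installed. It passes a hand-written version 3 document to nbformat.reads with as_version=4, which converts it on load. The version 3 layout shown (cells nested under "worksheets", code held in "input") is reproduced from memory of the older format and should be treated as an assumption.

```python
import json

import nbformat

# A hand-written notebook in the older major version 3 layout (assumed shape).
old_text = json.dumps(
    {
        "nbformat": 3,
        "nbformat_minor": 0,
        "metadata": {"name": ""},
        "worksheets": [
            {
                "cells": [
                    {
                        "cell_type": "code",
                        "language": "python",
                        "collapsed": False,
                        "input": "1 + 1",
                        "outputs": [],
                        "metadata": {},
                    }
                ]
            }
        ],
    }
)

# Ask for major version 4; the reader upgrades older documents on load.
upgraded = nbformat.reads(old_text, as_version=4)

major = upgraded.nbformat
first_cell = upgraded.cells[0]
```

After conversion the cells sit in a flat top-level list and the code is stored under "source", matching the current schema.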

Implementation and integration

Implemented primarily in Python, nbformat integrates with testing, packaging, and distribution systems used by open-source projects hosted on GitHub and CI services like Travis CI and GitHub Actions. It is a dependency of higher-level systems such as nbconvert, Papermill, JupyterHub, and JupyterLab, and is used by third-party platforms including Deepnote and Databricks. Language kernels such as IRkernel, IJulia, and xeus-python emit outputs that front ends store in nbformat-compatible structures, while viewers like nbviewer parse nbformat JSON to render notebooks in web browsers.

Security and validation

Security considerations for nbformat include validation against malicious or malformed JSON, safe handling of embedded attachments and image payloads, and awareness of the risks of executable content when notebooks contain active code cells. Validators within nbformat help detect schema violations; runtime mitigations are recommended by communities at Project Jupyter and organizations such as NumFOCUS and The Linux Foundation for deployment in multi-tenant services like BinderHub and JupyterHub. Best practices propagated by research groups at MIT and Harvard advise using isolated execution environments (Docker containers or conda virtual environments) and scanning notebooks before executing them on shared infrastructure.
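
The schema check mentioned above can be sketched as follows, assuming the nbformat package is installed. The helper name is_valid_notebook is hypothetical and not part of nbformat. The check covers structure only and says nothing about whether the code in a notebook is safe to run.

```python
import nbformat
from nbformat.v4 import new_code_cell, new_notebook


def is_valid_notebook(nb) -> bool:
    """Return True if nb passes nbformat's schema validation (hypothetical helper)."""
    try:
        nbformat.validate(nb)
    except nbformat.ValidationError:
        return False
    return True


# A well-formed notebook built with the library's own helpers.
good = new_notebook(cells=[new_code_cell("print('ok')")])

# A malformed notebook: the code cell lacks its required "outputs" field.
broken = new_notebook(cells=[new_code_cell("x = 1")])
del broken.cells[0]["outputs"]

good_ok = is_valid_notebook(good)
broken_ok = is_valid_notebook(broken)
```

A service accepting uploads could run a check of this kind before handing a notebook to an isolated execution environment.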

Category:JSON-based file formats