| ADE20K | |
|---|---|
| Name | ADE20K |
| Type | Image dataset |
| Domain | Computer vision |
| Released | 2017 |
| Creators | MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) |
| Size | ~20k training images; 150 semantic classes (SceneParsing benchmark); ~2,000+ object and part categories (full index) |
| License | Non-commercial research use (dataset terms of use) |
ADE20K
ADE20K is a densely annotated image dataset widely used to benchmark scene parsing and semantic segmentation in computer vision research. Developed at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and introduced in a CVPR 2017 paper from the CSAIL Computer Vision Group, it provides pixel-level annotations of scenes, objects, and object parts across diverse environments. A 150-class subset forms the MIT SceneParsing benchmark (SceneParse150), which has anchored scene-parsing challenges and serves as a standard evaluation target for segmentation models from academic and industrial labs, including Facebook AI Research, Google Research, and Microsoft Research.
ADE20K was released to address limitations of prior datasets such as PASCAL VOC and MS COCO by offering exhaustive object and part annotations across both indoor and outdoor scenes. Its images are drawn largely from the SUN and Places databases, giving it scene diversity comparable to those collections. The dataset has been used in challenges co-located with venues such as CVPR, and loaders for it are available in the PyTorch and TensorFlow ecosystems; it has influenced segmentation architectures from U-Net derivatives to the DeepLab family.
The dataset contains roughly 20,000 training images, with separate validation and test splits, annotated for scenes, objects, and object parts under a hierarchical label vocabulary. The full index spans thousands of object categories, while the SceneParsing benchmark distills these to the 150 most frequent object and stuff classes. Images cover urban, rural, and interior environments, and annotation statistics are routinely reported alongside model evaluations at venues such as NeurIPS and ICCV.
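In the SceneParsing release (the ADEChallengeData2016 download), annotations ship as single-channel PNGs whose pixel values index the classes, with 0 marking unlabeled pixels. The snippet below is a minimal sketch of inspecting one image/mask pair under that assumed layout; the specific file name follows the distribution's usual naming pattern.

```python
# Minimal sketch: inspect one image/mask pair from the SceneParsing
# (ADEChallengeData2016) release. Assumed convention: masks are
# single-channel PNGs with 0 = unlabeled and 1..150 = class index.
import numpy as np
from PIL import Image

img = Image.open("ADEChallengeData2016/images/training/ADE_train_00000001.jpg")
mask = np.array(Image.open(
    "ADEChallengeData2016/annotations/training/ADE_train_00000001.png"))

print(img.size)                # (width, height)
print(mask.shape, mask.dtype)  # (height, width), uint8
print(np.unique(mask))         # class indices present in this scene
```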
Annotation followed a protocol emphasizing exhaustive, instance-aware, pixel-level labeling with taxonomy curation by the MIT team. Notably, the images were labeled by a single expert annotator, an approach the authors argue yields more consistent labels than crowd-sourced pipelines such as Amazon Mechanical Turk; annotation consistency was measured by re-labeling a subset of images and comparing the results. A hierarchical vocabulary captures object-part relations, so that, for example, a door can be annotated as a part of a building.
ADE20K supports semantic segmentation, instance segmentation, panoptic segmentation, and part segmentation. These tasks are evaluated with metrics such as mean Intersection over Union (mIoU) and pixel accuracy for semantic segmentation, and average precision (AP) for instance-level evaluation. Leaderboards are maintained in conjunction with workshops at conferences such as CVPR and ECCV, and standard evaluation scripts used in ADE20K experiments are compatible with toolkits such as the COCO API and openly hosted repositories on GitHub.
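As a concrete reference for the headline metrics, the following sketch computes mIoU and pixel accuracy from integer label maps via a confusion matrix. The `ignore_index=255` convention is an assumption mirroring common segmentation codebases (ADE20K's raw masks instead use 0 for unlabeled), not part of any official ADE20K script.

```python
# Sketch: mean IoU and pixel accuracy over (H, W) integer label maps,
# with values 0..num_classes-1 and ignore_index marking ignored pixels.
import numpy as np

def miou_and_pixel_acc(pred, gt, num_classes, ignore_index=255):
    pred = np.asarray(pred, dtype=np.int64)
    gt = np.asarray(gt, dtype=np.int64)
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    # Confusion matrix via one bincount over joint (gt, pred) indices.
    cm = np.bincount(gt * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    iou = inter / np.maximum(union, 1)
    miou = iou[union > 0].mean()        # average only over classes that occur
    pixel_acc = inter.sum() / max(cm.sum(), 1)
    return miou, pixel_acc
```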
Baseline models evaluated on ADE20K include encoder-decoder networks such as U-Net, atrous-convolution models such as DeepLabv3+, and multi-scale feature approaches exemplified by PSPNet, whose pyramid pooling module was introduced together with strong ADE20K results. Top-performing methods typically pair strong backbones such as ResNet, and more recently attention-based architectures, with these segmentation heads, and reported mIoU on the SceneParsing validation set has improved steadily over time. Model zoos maintained by institutions including Facebook AI Research and Microsoft Research provide pretrained weights evaluated on ADE20K.
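To make the multi-scale idea concrete, here is a hedged PyTorch sketch of a PSPNet-style pyramid pooling module. The bin sizes (1, 2, 3, 6) follow the published design; the channel reduction and layer choices are simplifications rather than the reference implementation.

```python
# Sketch of a PSPNet-style pyramid pooling module: pool the feature map
# at several grid sizes, project each pooled map with a 1x1 conv,
# upsample back, and concatenate with the input features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, out_ch, 1, bias=False),
                          nn.BatchNorm2d(out_ch),
                          nn.ReLU(inplace=True))
            for b in bins])

    def forward(self, x):
        h, w = x.shape[2:]
        pyramids = [F.interpolate(stage(x), size=(h, w),
                                  mode="bilinear", align_corners=False)
                    for stage in self.stages]
        # Output has 2 * in_ch channels: input plus four reduced pyramids.
        return torch.cat([x] + pyramids, dim=1)
```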
ADE20K images and annotations are distributed under terms that permit academic and noncommercial research with attribution; users agree to the dataset's terms of use when downloading, a model broadly similar to the arrangements of the ImageNet and COCO projects. Usage in benchmarks and publications follows the data-use norms of conference communities such as CVPR and NeurIPS, and many codebases hosted on GitHub include scripts to load ADE20K in PyTorch and TensorFlow toolchains.
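A minimal sketch of such a loader is shown below, assuming the ADEChallengeData2016 directory layout (`images/<split>/*.jpg` paired with `annotations/<split>/*.png`); real training scripts add resizing, cropping, augmentation, and normalization. The label shift (0 becomes an ignored 255) is a common convention, not part of the distribution itself.

```python
# Hedged sketch of a PyTorch Dataset for the SceneParsing split,
# assuming the ADEChallengeData2016 directory layout.
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class ADE20KSceneParsing(Dataset):
    def __init__(self, root, split="training"):
        self.img_dir = os.path.join(root, "images", split)
        self.ann_dir = os.path.join(root, "annotations", split)
        self.names = sorted(os.path.splitext(f)[0]
                            for f in os.listdir(self.img_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        name = self.names[i]
        img = Image.open(os.path.join(self.img_dir, name + ".jpg")).convert("RGB")
        mask = np.array(Image.open(os.path.join(self.ann_dir, name + ".png")))
        img = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 255.0
        # Shift labels so classes are 0..149 and unlabeled (0) becomes 255.
        target = torch.from_numpy(mask.astype(np.int64)) - 1
        target[target == -1] = 255
        return img, target
```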
ADE20K has influenced research in scene understanding, robotics, and augmented reality, informing perception modules in academic labs such as MIT CSAIL and the Stanford AI Lab as well as industrial research groups. Applications span semantic mapping and autonomous-navigation prototypes, and the dataset has enabled advances in transfer learning and domain adaptation showcased at venues such as ICLR and NeurIPS; it is also a common evaluation target in work on efficient segmentation architectures.
Category:Computer vision datasets