| COCO (dataset) | |
|---|---|
| Name | COCO |
| Full name | Common Objects in Context |
| Created | 2014 |
| Creators | Microsoft Research with academic collaborators |
| Domain | Computer vision, Machine learning |
| License | Annotations: Creative Commons Attribution 4.0; images retain their original Flickr licenses |
COCO (Common Objects in Context) is a large-scale image dataset designed for object detection, segmentation, and captioning, introduced in 2014 by researchers at Microsoft Research together with academic collaborators. The dataset emphasizes everyday scenes in which objects appear in their natural context rather than as isolated, centered subjects. It has become a standard benchmark for detection and segmentation research at industrial labs such as Google and Facebook AI Research and in work presented at conferences such as CVPR, ECCV, and NeurIPS.
COCO was created to address limitations of earlier resources such as ImageNet, PASCAL VOC, and Caltech 101, which focus on image-level labels, relatively iconic views of objects, or a small number of annotated instances per image. In contrast, COCO provides high-quality instance-level annotations for objects in cluttered, non-iconic scenes, supporting precise localization and contextual reasoning in detection, segmentation, and captioning research.
COCO contains over 300,000 images of everyday scenes, collected largely from Flickr, of which more than 200,000 are annotated. The detection task covers 80 object categories, including people, animals, vehicles, and common household items, with roughly 1.5 million annotated object instances across the dataset; the images also carry keypoint and caption annotations, supporting instance segmentation, keypoint detection, and image captioning. Images are divided into training, validation, and test splits, with test-set annotations withheld and scored through an evaluation server, following the convention established by the PASCAL VOC and ImageNet challenges.
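As a concrete illustration of how the released annotations are typically accessed, the following is a minimal sketch using the official pycocotools API; the annotation file path is a hypothetical local download of the 2017 validation annotations.

```python
# Minimal sketch: browsing COCO instance annotations with pycocotools.
# Assumes pycocotools is installed and the 2017 validation annotation file
# has been downloaded to the (hypothetical) path below.
from pycocotools.coco import COCO

ann_file = "annotations/instances_val2017.json"  # hypothetical local path
coco = COCO(ann_file)

# List the 80 object categories used by the detection task.
cats = coco.loadCats(coco.getCatIds())
print(sorted(c["name"] for c in cats))

# Fetch all images containing at least one person, then load the
# instance annotations (boxes, polygons, areas) for the first one.
person_id = coco.getCatIds(catNms=["person"])[0]
img_ids = coco.getImgIds(catIds=[person_id])
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0]))
print(len(anns), "annotated instances in image", img_ids[0])
```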
Annotations were collected by crowdsourcing on Amazon Mechanical Turk through a staged pipeline: workers first labeled which categories appear in an image, then marked each individual instance, and finally drew a polygonal segmentation mask for every instance, with verification passes to control quality. Bounding boxes are derived from the segmentation masks, and a separate keypoint task labels 17 body landmarks for each sufficiently visible person. Caption annotations consist of five independently written sentences per image, collected under instructions asking workers to describe the salient objects and actions in the scene.
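The sketch below illustrates the shape of a single COCO-style instance annotation, written as a Python dict mirroring the JSON format; all numeric values are invented for illustration.

```python
# Illustrative sketch of one COCO-style instance annotation.
# All numeric values are made up for illustration only.
example_annotation = {
    "id": 1,                      # unique annotation id
    "image_id": 42,               # id of the image this instance belongs to
    "category_id": 1,             # 1 = "person" in the COCO category list
    "bbox": [100.0, 50.0, 80.0, 200.0],   # [x, y, width, height] in pixels
    "area": 9500.0,               # mask area in pixels
    "iscrowd": 0,                 # 1 marks crowd regions stored as RLE masks
    # Segmentation as one or more polygons, each a flat [x1, y1, x2, y2, ...] list.
    "segmentation": [[110.0, 60.0, 170.0, 60.0, 170.0, 240.0, 110.0, 240.0]],
    # The keypoint task adds 17 (x, y, visibility) triples, flattened; visibility
    # is 0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible.
    "num_keypoints": 2,
    "keypoints": [130, 70, 2, 150, 70, 2] + [0, 0, 0] * 15,
}
```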
COCO popularized an evaluation protocol in which Average Precision (AP) is averaged over ten Intersection-over-Union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05, a stricter variant of the single-threshold (IoU ≥ 0.5) metric used in the PASCAL VOC challenge. AP is additionally reported at fixed thresholds (AP50, AP75) and broken down by object size (small, medium, large), alongside Average Recall (AR). Challenge tracks for object detection, instance segmentation, and keypoint estimation have been run at workshops co-located with conferences such as ECCV and ICCV, and detectors such as Faster R-CNN and Mask R-CNN are routinely compared under these metrics on the COCO leaderboards.
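The reference evaluation code in pycocotools computes these numbers directly from ground-truth and result files in COCO format; the following sketch assumes hypothetical local paths for both files.

```python
# Minimal sketch: computing COCO detection metrics with the reference
# evaluation code. Assumes pycocotools is installed and that the
# ground-truth and detection-result files below exist (hypothetical paths).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")   # ground-truth annotations
coco_dt = coco_gt.loadRes("my_model_detections.json")  # detections: list of dicts with
                                                       # image_id, category_id, bbox, score

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")  # "segm" or "keypoints" for other tasks
evaluator.evaluate()     # match detections to ground truth per image and category
evaluator.accumulate()   # build precision-recall curves over the 10 IoU thresholds
evaluator.summarize()    # print AP, AP50, AP75, AP by size, and AR
print("AP@[0.50:0.95] =", evaluator.stats[0])
```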
COCO has been instrumental in advancing perception methods used in robotics and autonomous driving, and models pretrained or benchmarked on it underpin many deployed detection and segmentation systems. Its images and captions have also seeded follow-on tasks such as image captioning and visual question answering, and the dataset is a staple of computer-vision coursework and of reproducible baselines reported in publications at venues such as CVPR, ECCV, and NeurIPS.
A rich ecosystem of tools and extensions supports COCO-style tasks. The official COCO API (pycocotools) provides annotation loaders and the reference evaluation code, and dataset wrappers are available in PyTorch's torchvision and in TensorFlow Datasets. Derived datasets and extensions, such as COCO-Stuff for stuff segmentation, LVIS for large-vocabulary instance segmentation built on COCO images, and referring-expression and visual-question-answering corpora based on COCO images and captions, support transfer-learning and multimodal research. Leaderboards on Papers With Code and dataset tooling from Hugging Face make COCO-based experiments straightforward to reproduce and compare.
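For example, torchvision exposes COCO as an ordinary dataset object; the sketch below assumes the 2017 validation images and annotations have been downloaded to the hypothetical paths shown.

```python
# Sketch: iterating over COCO with torchvision's built-in dataset wrapper.
# Assumes torchvision and pycocotools are installed and the 2017 validation
# images/annotations sit at the (hypothetical) paths below.
import torchvision
from torchvision import transforms

dataset = torchvision.datasets.CocoDetection(
    root="val2017",                                  # directory of images
    annFile="annotations/instances_val2017.json",    # COCO annotation file
    transform=transforms.ToTensor(),                 # applied to the image only
)

image, targets = dataset[0]          # targets is a list of annotation dicts
print(image.shape, len(targets), "instances")
if targets:
    print(targets[0]["bbox"], targets[0]["category_id"])
```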
Category:Datasets for computer vision