| TRECVID | |
|---|---|
| Name | TREC Video Retrieval Evaluation |
| Formation | 2001 |
| Purpose | Video retrieval research evaluation |
| Headquarters | Gaithersburg, Maryland, U.S. (NIST) |
| Region served | International |
| Parent organization | National Institute of Standards and Technology |
TRECVID is an annual evaluation campaign for video retrieval research that provides standardized tasks, datasets, and metrics to advance automatic analysis of audio-visual content. Founded and coordinated by the National Institute of Standards and Technology (NIST) in partnership with research groups at institutions such as Columbia University, University of Amsterdam, Carnegie Mellon University, University of Oxford, and INRIA, the program brings together industry laboratories and academic teams to benchmark techniques across retrieval, detection, and understanding. Participants have included groups from Google Research, Microsoft Research, IBM Research, Facebook AI Research, Amazon Web Services, Stanford University, Massachusetts Institute of Technology, Princeton University, University of California, Berkeley, University of Southern California, University of Maryland, University of Toronto, University of Cambridge, University of Edinburgh, ETH Zurich, Imperial College London, Peking University, Tsinghua University, Beihang University, KAIST, Seoul National University, The University of Tokyo, Osaka University, Australian National University, CSIRO, University of Melbourne, University of Sydney, McGill University, University of British Columbia, University of Waterloo, University of Michigan, Northwestern University, Cornell University, Yale University, Duke University, University of Pennsylvania, New York University, Columbia Engineering, Google DeepMind, OpenAI, Alibaba, Baidu Research, SenseTime, Megvii, Tencent AI Lab, and Huawei Noah's Ark Lab.
TRECVID establishes repeatable evaluation protocols, curated corpora, and objective scoring to compare systems on video search, copy detection, event detection, and semantic indexing. The campaign operates at the intersection of work by organizations such as ImageNet, YouTube, Vatican Library, British Broadcasting Corporation, Reuters, Associated Press, CNN, NHK, Nexra, Getty Images, National Geospatial-Intelligence Agency, European Space Agency, NASA, European Organization for Nuclear Research, Los Alamos National Laboratory, Sandia National Laboratories, Lawrence Livermore National Laboratory, SRI International, Siemens, Panasonic, Sony Corporation, Hitachi, and Toshiba Research.
Launched in 2001 as the video retrieval track of the Text REtrieval Conference (TREC) coordinated by the National Institute of Standards and Technology, and influenced by earlier evaluation campaigns such as the NIST Speech Recognition Evaluations and the Message Understanding Conference (MUC), the program became an independent evaluation in 2003 and has evolved through collaborations with organizations and venues including MPEG, IEEE, ACM, CVPR, ICCV, ECCV, ICASSP, ACL, EMNLP, SIGIR, AAAI, IJCAI, NeurIPS, ICLR, KDD, the ACL Anthology, LREC, the KDD Cup, and ImageCLEF. Coordinating committees have drawn experts from Johns Hopkins University, SRI International, MITRE Corporation, NIST, Cranfield University, Rensselaer Polytechnic Institute, Rutherford Appleton Laboratory, Max Planck Institute for Informatics, Fraunhofer Society, Delft University of Technology, Ghent University, KU Leuven, University of Groningen, and University of Amsterdam. Governance relies on task organizers, data custodians, and external assessors to ensure reproducibility and fairness.
Core tasks have included semantic indexing, known-item search, ad hoc video search, instance search, video summarization, video copy detection, surveillance event detection, and multimedia hyperlinking. Evaluation metrics span mean Average Precision (mAP), precision at K, normalized Discounted Cumulative Gain (nDCG), Equal Error Rate (EER), F1 score, Intersection over Union (IoU) for temporal localization, and durational overlap. These metrics are standard across communities represented by conferences, journals, and organizations such as CVPR, ICCV, ECCV, NeurIPS, ICML, SIGIR, ICASSP, AAAI, IJCAI, ACM Multimedia, IEEE Transactions on Pattern Analysis and Machine Intelligence, and IEEE Signal Processing Magazine. Task definitions and scoring protocols have been influenced by benchmarks like PASCAL VOC, COCO, the ImageNet Large Scale Visual Recognition Challenge, YouTube-8M, ActivityNet, the AVA dataset, Kinetics, Charades, UCF101, and HMDB51.
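As a concrete illustration of how two of these metrics are commonly computed, the minimal sketch below implements average precision over a ranked result list and temporal IoU between two time segments. The function names and toy data are illustrative and are not taken from the official TRECVID scoring tools.

```python
"""Minimal sketches of two TRECVID-style metrics: average precision for a
ranked result list and temporal IoU for localization. Names and toy data
are illustrative, not drawn from any official scoring software."""

def average_precision(ranked_ids, relevant_ids):
    """AP over a ranked list: mean of precision values at each relevant hit."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """mAP: average of per-topic AP values; `runs` maps topic -> (ranked, relevant)."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in runs.values()]
    return sum(aps) / len(aps) if aps else 0.0

def temporal_iou(seg_a, seg_b):
    """IoU of two (start, end) time segments, used for temporal localization."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

if __name__ == "__main__":
    # One hypothetical topic: system ranking on the left, ground-truth relevant IDs on the right.
    runs = {"topic1": (["v3", "v1", "v7", "v2"], {"v1", "v2"})}
    print(mean_average_precision(runs))              # 0.5 for this toy run
    print(temporal_iou((10.0, 20.0), (15.0, 30.0)))  # 0.25
```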
TRECVID provides curated test collections drawn from broadcast news, surveillance feeds, web videos, and user-generated content, with annotations for shot boundaries, semantic labels, temporal segments, and instance identities. Data sources and related datasets frequently used by participants include BBC Asian Network Archives, IREX, TREC Spoken Document Retrieval (SDR), the Broadcast News Corpus, Topic Detection and Tracking (TDT), LDC, ELRA, Open Images, YouTube-8M, AVDB, TRECVID MED, Moments in Time, the Charades dataset, AVA, Kinetics-700, ActivityNet, the VIRAT Video Dataset, CUAVE, CLEF, ImageNet Video, IJB-A, FaceNet evaluations, and many institutional repositories. Annotation tools and reproducible scripts are shared through platforms such as GitHub, Bitbucket, Zenodo, and Figshare, contributed by teams including MPI-SWS, and built on toolkits such as OpenCV, FFmpeg, GStreamer, the Kaldi Speech Recognition Toolkit, HTK, and Caffe.
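To illustrate the kind of annotation the shot-boundary task targets, the sketch below detects candidate cuts by comparing colour histograms of consecutive frames with OpenCV. The threshold and video path are assumptions for the example; actual TRECVID shot-boundary systems are considerably more robust and also handle gradual transitions.

```python
"""A minimal shot-boundary detection sketch using OpenCV colour-histogram
correlation between consecutive frames. Threshold and input file are
illustrative placeholders, not parameters of any official tool."""
import cv2

def detect_shot_boundaries(video_path, threshold=0.5):
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, frame_idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Hue/saturation histogram is less sensitive to brightness changes than raw RGB.
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation near 1 means similar frames; a sharp drop suggests a hard cut.
            similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < threshold:
                boundaries.append(frame_idx)
        prev_hist, frame_idx = hist, frame_idx + 1
    cap.release()
    return boundaries

if __name__ == "__main__":
    print(detect_shot_boundaries("example.mp4"))  # hypothetical input file
```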
Participation spans universities, corporate labs, government research centers, and startups, fostering cross-pollination among groups such as Google Research, Microsoft Research, IBM Research, Facebook AI Research, DeepMind, OpenAI, Amazon Web Services, Baidu Research, SenseTime, Tencent AI Lab, Alibaba DAMO Academy, ZTE, NTT Data, Hitachi, Panasonic R&D, Sony Research, Samsung Research, LG Electronics Research, Qualcomm Research, Intel Labs, and NVIDIA Research. Impact is evidenced by the adoption of TRECVID protocols in academic publications at CVPR, ICCV, ECCV, NeurIPS, ICML, SIGIR, and AAAI, in industrial product features on platforms from YouTube, Vimeo, Netflix, Hulu, Roku, Spotify, Apple Inc., and Google Play, and in contributions to standards bodies like MPEG and ISO. TRECVID has influenced evaluation culture in initiatives such as ImageCLEF, VATIC, the VOT Challenge, the DAVIS Challenge, DETRAC, PETS, and i-LIDS.
Over two decades, top-performing systems have moved from handcrafted features and bag-of-words models toward deep convolutional and transformer architectures incorporating multimodal fusion. Landmark participant contributions include approaches from the Oxford Visual Geometry Group, DeepMind, Google Brain, Facebook AI Research, Microsoft Research Asia, the Stanford Vision Lab, MIT CSAIL, the CMU Vision Group, Berkeley AI Research, the ETH Zurich Computer Vision Lab, Inria Paris, the Singapore-MIT Alliance for Research and Technology, the Tsinghua Computer Vision Group, the Peking University Visual Computing Center, and industrial teams at IBM Watson, Amazon Rekognition, and Clarifai. Scores on metrics such as mAP and nDCG have steadily increased, while tasks like temporal localization and instance search remain challenging; advances reported at CVPR, ICCV, ECCV, NeurIPS, ACL, and ICASSP reflect the integration of architectures like ResNet, VGG, EfficientNet, Inception, YOLO, SSD, Faster R-CNN, Mask R-CNN, the Transformer, BERT, the Vision Transformer, the Swin Transformer, and DistilBERT, and of training regimes using datasets such as ImageNet, MSCOCO, Kinetics, and YouTube-8M. Evaluation outcomes have guided real-world deployments in media monitoring, content moderation, digital archives, and forensic analysis by organizations like Interpol, the FBI, Europol, BBC Archives, the Library of Congress, the National Archives and Records Administration, and UNESCO.
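To make the multimodal-fusion idea concrete, the sketch below combines per-video visual and text similarity scores with a weighted sum before ranking, a common late-fusion baseline for ad hoc video search. The embeddings, fusion weights, and video identifiers are purely illustrative and do not correspond to any specific TRECVID system.

```python
"""A minimal late-fusion sketch for ad hoc video search: each video is scored
by a weighted sum of per-modality cosine similarities to the query embedding.
All vectors, weights, and IDs below are illustrative assumptions."""
import numpy as np

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def fuse_and_rank(query_vec, visual_embs, text_embs, w_visual=0.7, w_text=0.3):
    """Rank videos by a weighted sum of visual and text similarity scores."""
    scores = {}
    for vid in visual_embs:
        scores[vid] = (w_visual * cosine(query_vec, visual_embs[vid])
                       + w_text * cosine(query_vec, text_embs[vid]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Random stand-ins for real query and per-video embeddings.
    rng = np.random.default_rng(0)
    dim = 8
    query = rng.normal(size=dim)
    videos = ["vidA", "vidB", "vidC"]
    visual = {v: rng.normal(size=dim) for v in videos}
    text = {v: rng.normal(size=dim) for v in videos}
    for vid, score in fuse_and_rank(query, visual, text):
        print(vid, round(score, 3))
```

In practice the relative weights would be tuned on a validation set, and stronger systems learn the fusion jointly rather than as a fixed weighted sum.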
Category:Benchmarking initiatives