LLMpedia: The first transparent, open encyclopedia generated by LLMs

CLEF

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: TREC Hop 4
Expansion Funnel: Raw 224 → Dedup 0 → NER 0 → Enqueued 0
CLEF
Name: CLEF
Formation: 2000
Type: Research initiative

CLEF (Conference and Labs of the Evaluation Forum, originally the Cross-Language Evaluation Forum) is an international research initiative focused on multilingual information retrieval, evaluation campaigns, and shared tasks. It brings together researchers from universities, companies, and libraries to develop and benchmark systems for cross-lingual search, multilingual text processing, and multimedia retrieval. Participants have come from institutions across computer science, linguistics, and information science.

Overview

CLEF organizes evaluation campaigns that coordinate tasks, datasets, and metrics for multilingual retrieval challenges. Notable participants have included teams from the University of Cambridge, Massachusetts Institute of Technology, Stanford University, University of Oxford, Carnegie Mellon University, University of Edinburgh, University of Tokyo, Tsinghua University, ETH Zurich, University of Toronto, University of California, Berkeley, University of California, Los Angeles, Princeton University, Columbia University, Yale University, Harvard University, Imperial College London, University of Manchester, University of Pennsylvania, University of Michigan, Peking University, University of Illinois Urbana-Champaign, University of Sydney, Australian National University, National University of Singapore, Nanyang Technological University, University of Waterloo, McGill University, University of British Columbia, University of Copenhagen, University of Amsterdam, Vrije Universiteit Amsterdam, KU Leuven, Universität Freiburg, Max Planck Society, Fraunhofer Society, INRIA, CERN, European Commission, Google, Microsoft, Facebook, Amazon, IBM, Baidu, Alibaba Group, Yahoo!, Apple Inc., Intel, NVIDIA, SAP, Siemens, Hitachi, NEC Corporation, Ricoh, Universität Zürich, Tokyo Institute of Technology, Seoul National University, KAIST, Indian Institute of Technology Bombay, Indian Institute of Science, Universidad de Buenos Aires, Universidade de São Paulo, Universidad Nacional Autónoma de México, University of Cape Town, Tel Aviv University, Weizmann Institute of Science, King's College London, and University College London.

History

Early campaigns built on prior evaluation work such as initiatives at TREC and NIST, and on projects associated with European Union research frameworks and the Joint Research Centre. CLEF evolved through workshops, conferences, and special sessions held alongside events like SIGIR, ACL, EMNLP, ECIR, CIKM, IJCAI, AAAI, NeurIPS, ICML, WWW, KDD, WSDM, COLING, LREC, NAACL, and EACL, and through collaborations with organizations such as ACM, IEEE, the Association for Computational Linguistics, and the European Language Resources Association (ELRA). Major milestones included integration of cross-lingual tasks influenced by research at Microsoft Research, Google Research, and Facebook AI Research, and adaptation to advances from resources like OntoNotes, WordNet, FrameNet, BabelNet, Wikipedia, Common Crawl, Europarl, OpenSubtitles, Project Gutenberg, and standards from ISO committees.

Structure and Components

CLEF campaigns comprise task definitions, test collections, query sets, relevance judgments, and evaluation metrics, with coordination among program committees, task organizers, data providers, and sponsors. Working groups have featured members from the Bibliothèque nationale de France, British Library, Library of Congress, Deutsche Nationalbibliothek, National Diet Library, National Library of China, National Library of Spain, National Library of Australia, Wellcome Trust, European Research Council, and Horizon 2020, along with infrastructure partners such as Zenodo, GitHub, OpenAIRE, ELIXIR, and Dataverse. Data modalities addressed include text, image, audio, and video drawn from corpora like TRECVID, ImageNet, COCO, Flickr, YouTube, LibriSpeech, Common Voice, and Europarl, and multilingual resources from UNESCO, the World Health Organization, and the International Monetary Fund.
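Relevance judgments of the kind listed above are commonly distributed as whitespace-separated "qrels" files in the long-standing TREC convention (topic ID, iteration, document ID, relevance grade). A minimal sketch of parsing that layout; the topic and document IDs below are invented for illustration:

```python
def parse_qrels(lines):
    """Map topic ID -> {document ID: graded relevance} from qrels lines
    laid out as: <topic> <iteration> <doc-id> <relevance>."""
    qrels = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        topic, _iteration, doc_id, rel = line.split()
        qrels.setdefault(topic, {})[doc_id] = int(rel)
    return qrels

# Hypothetical judgments for two topics.
sample = [
    "151 0 LA010189-0001 1",
    "151 0 LA010189-0002 0",
    "152 0 LA010290-0013 2",
]
judged = parse_qrels(sample)
print(sorted(judged))  # ['151', '152']
```

Keyed by topic, the judgments can then be joined against a system's ranked run to score each query.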

Applications and Use Cases

Outputs have informed the development of search engines, digital libraries, machine translation, question answering, and recommendation systems used by organizations including the Wikimedia Foundation, Elsevier, Springer Nature, Thomson Reuters, LexisNexis, ProQuest, Zotero, EndNote, EBSCO Information Services, JSTOR, Scopus, Clarivate, arXiv, bioRxiv, medRxiv, PubMed Central, ClinicalTrials.gov, European Medicines Agency, World Bank, International Labour Organization, United Nations, UNICEF, the Red Cross, Amnesty International, Greenpeace, World Wide Fund for Nature, NASA, European Space Agency, NOAA, US Geological Survey, Smithsonian Institution, Metropolitan Museum of Art, and the British Museum.

Evaluation and Performance

Evaluation protocols have used measures like precision, recall, mean average precision (MAP), normalized discounted cumulative gain (nDCG), and task-specific metrics, with statistical analyses drawing on methods and tooling from the R Project, Python, scikit-learn, TensorFlow, PyTorch, Keras, spaCy, NLTK, Hugging Face, the Allen Institute for AI, the Stanford NLP Group, Google Brain, DeepMind, OpenAI, Papers with Code, and arXiv, and on benchmark suites that mirror challenges in datasets such as GLUE, SuperGLUE, SQuAD, MS MARCO, XNLI, WMT, BUCC, CLUE, and Hugging Face Datasets. Evaluation findings have influenced industrial deployments at Google Search, Bing, DuckDuckGo, Yandex, Baidu Search, and Sogou, and enterprise search in Elastic, Apache Lucene, Solr, Algolia, and Swiftype.
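Two of the measures named above can be sketched directly. This is a minimal illustration of the standard definitions, not CLEF's official evaluation code (in practice tools such as trec_eval compute these from qrels and run files):

```python
import math

def average_precision(ranked_ids, relevant):
    """AP for one query: mean of precision at each rank holding a relevant doc."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def ndcg(gains, k):
    """nDCG@k with a log2 rank discount; `gains` are graded relevance
    values in the system's ranked order for one query."""
    dcg = sum(g / math.log2(r + 1) for r, g in enumerate(gains[:k], start=1))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(r + 1) for r, g in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg else 0.0

# One query: the system ranked d1, d2, d3; assessors judged d1 and d3 relevant.
print(average_precision(["d1", "d2", "d3"], {"d1", "d3"}))  # ≈ 0.833
print(ndcg([3, 1, 2, 0], k=3))
```

MAP is then the mean of per-query AP over all topics; nDCG rewards placing highly graded documents near the top of the ranking.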

CLEF interfaces with initiatives and standards including TREC, NIST OpenMT, WMT, the Linguistic Data Consortium (LDC), ELRA, ISO/TC 37, ISO/IEC JTC 1, W3C, Dublin Core, Schema.org, RDF, OWL, SPARQL, Linked Open Data, the FAIR principles, OpenAIRE, Creative Commons, Open Data Commons, and projects like ODIE, CORD-19, AI2 Science Questions, and the Text Retrieval Conference.

Category:Information retrieval