| NeurIPS Reproducibility Challenge | |
|---|---|
| Name | NeurIPS Reproducibility Challenge |
| Established | 2018 |
| Discipline | Machine learning |
| Venue | NeurIPS |
# NeurIPS Reproducibility Challenge
The NeurIPS Reproducibility Challenge is an initiative associated with the Conference on Neural Information Processing Systems that organizes community-led efforts to reproduce results from published machine learning papers. It coordinates volunteers, student participants, and senior researchers to verify experimental claims and archive replication artifacts, drawing contributors from academia, industry, and affiliated conferences.
The Challenge mobilizes contributors from institutions such as Stanford University, Massachusetts Institute of Technology, University of California, Berkeley, Carnegie Mellon University, University of Oxford, University of Cambridge, Harvard University, Princeton University, California Institute of Technology, ETH Zurich and Tsinghua University, alongside industry labs such as Google Research, Microsoft Research, Facebook AI Research, DeepMind, OpenAI, Amazon Web Services, IBM Research, Intel Labs, NVIDIA, Huawei, Baidu Research and Tencent AI Lab. It works with venues and organizations including NeurIPS, ICML, ICLR, ACL, the AAAI Conference on Artificial Intelligence, KDD, CVPR, ECCV, ICPR, SIGGRAPH, the Workshop on Reproducibility in Machine Learning and arXiv. The effort typically produces artifacts stored on platforms such as GitHub, Zenodo, OSF, Code Ocean and Figshare, shared via community channels such as Slack, Discord, Twitter, Reddit and LinkedIn.
The Challenge originated in response to reproducibility concerns highlighted by reports and initiatives from groups including Nature, Science, AAAS, the National Science Foundation, the European Research Council, the Wellcome Trust, the Bill & Melinda Gates Foundation, OpenAI, the Mozilla Foundation and The Alan Turing Institute. Early organizational leadership involved academics affiliated with the University of Toronto, McGill University, University College London, Rice University, Yale University, Columbia University, Cornell University, Brown University, the University of Washington and Johns Hopkins University. Coordination models drew on precedents from projects such as ReScience C, the Open Science Framework and the Center for Open Science, and on initiatives by the International Committee of Medical Journal Editors and the Public Library of Science.
Organizational structures often mirror the governance of bodies such as the Association for Computing Machinery, the Institute of Electrical and Electronics Engineers, the Royal Society, the American Mathematical Society and the Society for Industrial and Applied Mathematics, incorporating student organizers, faculty mentors, and program chairs, and collaborating with editorial boards from venues such as the Journal of Machine Learning Research, Transactions on Machine Learning Research and Communications of the ACM.
Participants select target papers published at venues like NeurIPS, ICML, ICLR, CVPR, ACL, KDD or posted on arXiv and attempt to replicate experiments. Submissions include code repositories on platforms such as GitHub, documentation deposited on Zenodo or Figshare, and reproducibility reports submitted to workshops affiliated with NeurIPS, ICLR workshops or specialized tracks in JMLR or Transactions on Machine Learning Research.
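As an illustration, a submission of this kind bundles a paper reference, a code location, an archived artifact, and a report. The sketch below is a hypothetical metadata record with a minimal completeness check; the field names, URL, and DOI are illustrative placeholders, not a schema prescribed by the Challenge:

```python
# Hypothetical metadata record for a reproducibility submission.
# Every identifier below is a placeholder, not a real paper or repository.
submission = {
    "target_paper": "arXiv:0000.00000",                      # placeholder paper ID
    "venue": "NeurIPS",
    "code_repository": "https://github.com/example/repro",   # hypothetical URL
    "archived_artifact": "10.5281/zenodo.0000000",           # hypothetical DOI
    "report": "report.pdf",
    "claims_tested": ["main result", "ablation study"],
}

def is_complete(sub):
    """A submission is complete only if every required field is present and non-empty."""
    required = ("target_paper", "code_repository", "report")
    return all(sub.get(field) for field in required)
```

A check like `is_complete` is the sort of gate an organizer's intake script might run before a report enters review.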
Evaluations are performed by organizers, peer reviewers, and external mentors drawn from Google Research, DeepMind, Facebook AI Research, Microsoft Research and academic groups at Stanford University, MIT, UC Berkeley, Carnegie Mellon University and ETH Zurich. The process often uses continuous integration systems like Travis CI, GitHub Actions and CircleCI for automated checks, and containerization tools such as Docker and Kubernetes to standardize environments. Assessment criteria reference reproducibility frameworks from the Center for Open Science, ReproZip, Binder and Code Ocean, and editorial policies from Nature and Science.
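The tolerance-based comparison such automated checks perform can be sketched in Python: reported metrics from the paper are compared against re-run metrics, and the reproduction passes only if every value falls within a relative tolerance. The metric name, values, and 5% tolerance below are illustrative assumptions, not criteria used by the Challenge:

```python
# Sketch of an automated reproducibility check of the kind a CI job
# (e.g. a GitHub Actions step) might run after re-executing an experiment.
def check_reproduction(reported, reproduced, rel_tol=0.05):
    """Compare a paper's reported metrics against re-run metrics.

    Returns (passed, diffs): passed is True only if every reported metric
    is present in the re-run and within rel_tol relative error.
    """
    diffs = {}
    for name, paper_value in reported.items():
        rerun_value = reproduced.get(name)
        if rerun_value is None:
            diffs[name] = "missing from re-run"
            continue
        rel_err = abs(rerun_value - paper_value) / max(abs(paper_value), 1e-12)
        if rel_err > rel_tol:
            diffs[name] = f"relative error {rel_err:.3f} exceeds tolerance {rel_tol}"
    return (len(diffs) == 0, diffs)

# Hypothetical numbers: the paper reports 94.2% accuracy, the re-run reaches 93.8%,
# which is within 5% relative error, so the check passes.
ok, diffs = check_reproduction({"accuracy": 94.2}, {"accuracy": 93.8})
```

Running the same check inside a pinned Docker image is what makes the pass/fail outcome meaningful across machines, since the environment is held fixed while only the metrics vary.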
Outcomes have included verified reproductions, partial reproductions, and documented failures, influencing authors at institutions such as Stanford University, the University of Toronto, the University of Washington and ETH Zurich to update code and publish errata. The Challenge has informed policy discussions at conferences like NeurIPS, ICML and ICLR and influenced repositories maintained by arXiv moderators and organizations such as the ACL Anthology and Papers with Code. It has contributed to the development of best-practice guidelines used by journals such as Nature, Science, PLoS ONE and JMLR and by funding agencies including the National Science Foundation, the European Research Council and UK Research and Innovation.
The project has helped train students and reproducibility advocates from programs at Stanford University, UC Berkeley, Carnegie Mellon University, the University of Oxford, the University of Cambridge and ETH Zurich, and has fostered collaborations with industry teams at Google Research, DeepMind, OpenAI and Microsoft Research.
Critiques include resource constraints noted by contributors from Stanford University, MIT, UC Berkeley and Harvard University, and methodological debates involving groups at Carnegie Mellon University, University College London and ETH Zurich. Limitations include hardware-access disparities highlighted by organizations such as NVIDIA, Intel, Amazon Web Services and Google Cloud Platform, and concerns about incentives raised by panels at NeurIPS and ICML. Other criticisms reference reproducibility discussions in reports by Nature, Science, AAAS, the Royal Society and the Center for Open Science.
Further debate involves attribution and citation norms governed by Committee on Publication Ethics and formatting expectations from ACM and IEEE, as well as legal and licensing complexities involving contributors from Stanford University, MIT, Harvard University and corporate labs including Facebook AI Research and Google Research.
Representative reproductions have targeted widely cited works from authors affiliated with Stanford University, MIT, University of Toronto, Carnegie Mellon University, Google Research, DeepMind, OpenAI, Facebook AI Research and Microsoft Research. Case studies often examine landmark papers presented at NeurIPS and ICML and assess codebases hosted on GitHub with artifacts archived on Zenodo or Figshare. Educational case studies have been integrated into coursework at Stanford University, MIT, UC Berkeley, Carnegie Mellon University and University of Cambridge, and featured in panels with representatives from DeepMind, OpenAI, Google Research and Microsoft Research.