LLMpedia: The first transparent, open encyclopedia generated by LLMs

Center for AI Safety

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Anthropic (company), hop 5
Expansion funnel: 117 raw extracted → 0 after dedup → 0 after NER → 0 enqueued
Center for AI Safety
Name: Center for AI Safety
Founded: 2022
Type: Nonprofit
Focus: Artificial intelligence safety, AI alignment, risk mitigation
Methods: Research, advocacy, policy advising

The Center for AI Safety is a U.S.-based nonprofit organization focused on reducing catastrophic risks from advanced artificial intelligence. It engages in research, policy engagement, and public communication to influence the development pathways and safety practices of powerful machine learning systems. The organization works with academic institutions, industry labs, and governmental bodies to promote standards, norms, and technical safeguards.

Overview

The organization operates at the intersection of technical research and public policy, drawing on communities around industry labs (OpenAI, DeepMind, Anthropic, Google Research, Microsoft Research, and the Google DeepMind and OpenAI safety teams); universities (Stanford University, Massachusetts Institute of Technology, Carnegie Mellon University, University of California, Berkeley, University of Oxford, University of Cambridge, Harvard University, Princeton University, Yale University, Columbia University, New York University, University of Toronto, ETH Zurich, University of Washington, California Institute of Technology, Peking University, and Tsinghua University); research institutes (the Allen Institute for AI, the Center for Security and Emerging Technology, the Montreal Institute for Learning Algorithms, the Future of Humanity Institute, the Machine Intelligence Research Institute, and The Alan Turing Institute); venture firms (Sequoia Capital and Andreessen Horowitz); and governmental and international bodies (the National Institute of Standards and Technology, the European Commission, the United States Department of Defense, the Office of the Director of National Intelligence, the National Security Commission on Artificial Intelligence, and the World Economic Forum). It publishes technical reports, policy briefs, and public statements aimed at mitigating risks from model scale, capability acceleration, and misuse.

History and founding

Founded in 2022, the institution emerged amid debates involving OpenAI, DeepMind, Anthropic, Google, Microsoft, Sam Altman, Demis Hassabis, Dario Amodei, Ilya Sutskever, Geoffrey Hinton, Yoshua Bengio, Yann LeCun, and other leaders who had voiced concern about advanced systems. Its founding drew attention from investors and philanthropists associated with Elon Musk, Peter Thiel, Reid Hoffman, Marc Benioff, Toby Ord, Stuart Russell, and Nick Bostrom. Early activity intersected with policy initiatives from the United States Congress, the European Parliament, the UK Parliament, the G7, the G20, the United Nations, the Organisation for Economic Co-operation and Development, and NATO, as well as regulatory proposals such as those discussed by European Commission task forces and the National Institute of Standards and Technology.

Mission and goals

The center's stated mission is to reduce the risk of advanced systems causing large-scale harm through safety research, governance proposals, and industry norms, priorities it shares with the Future of Humanity Institute, the Machine Intelligence Research Institute, the Center for Security and Emerging Technology, The Brookings Institution, the RAND Corporation, the Berkman Klein Center for Internet & Society, OpenAI, DeepMind, and Anthropic. Its goals include improving the robustness of large-scale models developed by teams at Google Research, Microsoft Research, Meta AI, NVIDIA, Amazon Web Services, IBM Research, Apple Inc., Baidu Research, Huawei, SenseTime, ByteDance, and academic labs at Stanford University, the Massachusetts Institute of Technology, the University of Cambridge, and the University of Oxford.

Research and publications

The organization's publications cover technical topics comparable to work at the Stanford Human-Centered AI Institute, the MIT Computer Science and Artificial Intelligence Laboratory, UC Berkeley Artificial Intelligence Research (BAIR), DeepMind Safety Research, the OpenAI Safety Team, and Anthropic, including model evaluation, adversarial robustness, interpretability, alignment techniques, and verification for transformer architectures such as those underlying GPT-4, PaLM, LLaMA, BERT, RoBERTa, T5, Megatron-LM, Chinchilla, Sparrow, and others. The center also issues reports and open letters, akin to those published by the Future of Life Institute, the Center for Strategic and International Studies, the Council on Foreign Relations, Human Rights Watch, Amnesty International, and the Electronic Frontier Foundation, to influence debate about disclosure, red-teaming, and model release practices.
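As a minimal illustration of the adversarial-robustness work named above, the sketch below implements the fast gradient sign method (FGSM), a standard baseline attack used in robustness evaluations. It assumes PyTorch; the toy linear classifier, input dimensions, and perturbation budget are illustrative placeholders, not details drawn from the center's publications.

```python
# Minimal FGSM sketch (illustrative only; assumes PyTorch).
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in classifier and input; placeholders, not any real model.
model = torch.nn.Linear(16, 4)
x = torch.randn(1, 16, requires_grad=True)
label = torch.tensor([2])

# Differentiate the loss with respect to the input.
loss = F.cross_entropy(model(x), label)
loss.backward()

# FGSM step: nudge the input along the sign of its gradient, within a
# small budget epsilon, to try to increase the loss and flip the prediction.
epsilon = 0.1
x_adv = (x + epsilon * x.grad.sign()).detach()

print("clean prediction:", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

Robustness evaluations of the kind described above typically report how often such bounded perturbations change a model's output.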

Policy engagement and advocacy

The organization engages with policymakers and regulatory bodies such as the European Commission, the United States Congress, the UK Parliament, the G7, the G20, the United Nations, the World Health Organization, the National Institute of Standards and Technology, the Federal Trade Commission, the Office of the Director of National Intelligence, and the National Security Commission on Artificial Intelligence, in ways comparable to the engagement of the Electronic Privacy Information Center, Access Now, the Tech Policy Lab, OpenAI, DeepMind, and Anthropic. Activities include testimony, briefing notes, and collaboration on standards with the International Organization for Standardization, the Institute of Electrical and Electronics Engineers, the World Economic Forum, the Organisation for Economic Co-operation and Development, and national laboratories and advisory committees.

Partnerships and collaborations

The center has collaborated with academic groups and industry labs including OpenAI, DeepMind, Anthropic, Google Research, Microsoft Research, Meta AI, Stanford University, the Massachusetts Institute of Technology, the University of Oxford, the University of Cambridge, Carnegie Mellon University, the Allen Institute for AI, the Montreal Institute for Learning Algorithms, The Alan Turing Institute, the Future of Humanity Institute, the Center for Security and Emerging Technology, the RAND Corporation, the Brookings Institution, Human Rights Watch, and the Electronic Frontier Foundation. Funding and project partnerships have involved technology firms and philanthropic organizations tied to figures such as Elon Musk, Sam Altman, Reid Hoffman, and Peter Thiel, and to organizations such as Founders Fund, the Open Philanthropy Project, Good Ventures, and other donors active in AI governance.

Criticism and controversies

Critiques of the organization mirror debates in the broader AI safety community and have been voiced by stakeholders from OpenAI, DeepMind, Anthropic, Google, Meta, NVIDIA, Amazon, Apple Inc., the Chinese Academy of Sciences, and Tsinghua University, and by civil society groups including the Electronic Frontier Foundation, Human Rights Watch, Amnesty International, Access Now, and Transparency International. Controversies have centered on perceived alignment with industry funders, the transparency of collaborations with labs such as OpenAI and DeepMind, tensions with academic norms at Stanford University and MIT, and disputes over the framing of catastrophic risk versus nearer-term harms. These disputes have played out in venues such as NeurIPS, the International Conference on Machine Learning, the AAAI Conference on Artificial Intelligence, ICLR, the Association for Computing Machinery, the Institute of Electrical and Electronics Engineers, the European Parliament, and national legislatures. Critics include scholars and practitioners such as Yann LeCun, Gary Marcus, Noam Chomsky, Geoffrey Hinton, Stuart Russell, and Nick Bostrom, who have publicized differing views on risk prioritization and governance.

Category:Artificial intelligence safety organizations