GPT-3 — LLMpedia

GPT-3
Name	GPT-3
Developer	OpenAI
Release date	2020
Type	Autoregressive language model
Parameters	175 billion (reported)
Written in	Python, CUDA

Contents

Background
Architecture and Training
Capabilities and Applications
Limitations and Risks
Reception and Impact

GPT-3 is a large autoregressive language model developed by OpenAI that demonstrated dramatic advances in natural language generation, few-shot learning, and zero-shot transfer. It attracted broad attention across technology, media, academia, and public policy due to capabilities that intersected with work by institutions such as Microsoft, Stanford University, MIT, Harvard University, and Google. GPT-3 catalyzed discussion among corporations like Amazon (company), IBM, Facebook, and regulatory bodies including the Federal Trade Commission, European Commission, and UK Parliament.

Background

GPT-3 emerged from research traditions including projects at OpenAI, the lineage of models following work by teams connected to DeepMind, researchers associated with University of Toronto, and precedents set by architectures from Google Research and Facebook AI Research. Its announcement in 2020 followed high-profile releases and demonstrations that drew scrutiny from media outlets such as The New York Times, The Guardian, Wired (magazine), and The Verge. The model was positioned amid debates involving thinkers and organizations represented by Elon Musk, Sam Altman, Y Combinator, and academic voices from Oxford University, Cambridge University, and Carnegie Mellon University about societal effects and governance of advanced AI. Funding, partnerships, and commercialization discussions connected GPT-3 to entities like Microsoft Azure, venture firms in Silicon Valley, and standards conversations involving IEEE and United Nations forums.

Architecture and Training

GPT-3 used a transformer-based architecture rooted in the original transformer design by researchers at Google Research and leveraged techniques explored by groups at University of California, Berkeley and University of Washington. The reported scale—approximately 175 billion parameters—represented an order of magnitude increase relative to earlier models from teams at OpenAI and labs associated with Stanford University and Allen Institute for AI. Training relied on large text corpora assembled from sources similar to datasets used by projects at Common Crawl, collections curated by institutions such as Wikipedia, and published literature indexed in resources connected to arXiv and libraries like Library of Congress. Compute resources and optimization were comparable to efforts by NVIDIA, supercomputing centers at Lawrence Livermore National Laboratory, and cloud platforms run by Microsoft and Amazon Web Services. The approach reflected precedents in scaling laws discussed by researchers affiliated with Google DeepMind and statistical modeling methods taught at Massachusetts Institute of Technology.

Capabilities and Applications

GPT-3 demonstrated proficiency in tasks showcased in demonstrations by startups and labs associated with Y Combinator, OpenAI Startup Fund, and corporate partners such as Microsoft. Applications spanned content generation for organizations like BuzzFeed, drafting and summarization tasks used by legal teams at firms interacting with institutions like New York Bar Association, coding assistance relevant to developers using GitHub Copilot (a project involving GitHub and Microsoft), conversational agents deployed in products by companies including Salesforce and Zendesk, and creative writing projects in collaboration with authors represented by publishers like Penguin Random House and HarperCollins. Research groups at Stanford University, Harvard University, and MIT explored GPT-3 for scientific summarization tied to archives such as PubMed and arXiv, while entrepreneurs in ecosystems like Silicon Valley and cities such as San Francisco, New York City, and London experimented with business models around automated drafting, translation, and tutoring integrated with platforms like Stripe and Slack.

Limitations and Risks

Researchers from Allen Institute for AI, Carnegie Mellon University, and University of Oxford highlighted limitations including hallucination, sensitivity to prompt phrasing, and brittle factuality when compared to curated knowledge bases such as Wikidata and curated systems from IBM Watson. Risks raised by policy scholars at Harvard Kennedy School, Brookings Institution, and Chatham House included misinformation, bias mirroring datasets associated with media outlets like BBC and CNN, intellectual property concerns involving publishers like Elsevier and Springer Nature, and economic displacement discussed in reports by OECD and World Economic Forum. Security researchers from labs at MIT and ETH Zurich demonstrated adversarial prompting risks, while ethicists linked to University of Cambridge and Princeton University debated governance frameworks and proposals involving the European Union and national regulators such as the US Congress.

Reception and Impact

GPT-3 provoked broad reaction across technology press, academia, and policy communities. Coverage by outlets including The New Yorker, Financial Times, and Bloomberg traced influence on startups incubated at Y Combinator and on product integrations by Microsoft and GitHub. Scholarly citations and follow-on research from groups at Stanford University, DeepMind, OpenAI, and Facebook AI Research advanced language modeling, inspiring subsequent architectures and prompting competition from corporations like Google and NVIDIA. Debates in forums such as UNESCO, World Economic Forum, and national science agencies influenced discussions about oversight, safety, and access, connecting GPT-3’s legacy to initiatives led by OECD, IEEE, and multilateral dialogues involving G7 and G20.

Category:Artificial intelligence