
GPT-3

Name: GPT-3
Developer: OpenAI
Release: June 2020
Type: Large language model
License: Proprietary

Generative Pre-trained Transformer 3 (GPT-3) is a large language model developed by OpenAI, introduced in a May 2020 paper and made available through an API in June 2020. It represented a significant leap in artificial intelligence capabilities due to its unprecedented scale of 175 billion parameters. The model demonstrated remarkable proficiency in generating human-like text across a vast array of tasks without task-specific training.

Overview

The model is the third iteration in the Generative Pre-trained Transformer series, building upon the foundations of GPT-2. Its training corpus combined a filtered version of Common Crawl, an expanded WebText dataset, two book corpora, and English Wikipedia. This extensive pre-training enabled few-shot learning, in which the model performs a new task from just a few examples provided in the prompt, as sketched below. The release was accompanied by a private API, which allowed developers and companies such as Microsoft to integrate its capabilities into various products.
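Few-shot prompting requires no weight updates: worked examples are simply packed into the context window, and the model continues the pattern within a single forward pass. The Python sketch below builds such a prompt, mirroring the English-to-French translation format used illustratively in the GPT-3 paper; it constructs the prompt only, since the exact client call to a completion endpoint varies and is omitted here.

```python
# Minimal sketch of few-shot prompting: input -> output demonstrations are
# packed into the context, and the model is asked to continue the pattern.
# No weights are updated; the "learning" happens entirely in-context.

EXAMPLES = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe en peluche"),
]

def build_few_shot_prompt(query: str) -> str:
    """Format the demonstrations followed by the unanswered query."""
    lines = ["Translate English to French:"]
    for english, french in EXAMPLES:
        lines.append(f"{english} => {french}")
    lines.append(f"{query} =>")  # the model completes the text after '=>'
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_few_shot_prompt("cheese"))
```

Given this prompt, a completion endpoint would be expected to return "fromage", extending the demonstrated pattern; zero-shot and one-shot prompting differ only in the number of examples included.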

Development and release

The research and engineering effort behind the model was carried out by a large team at OpenAI that included Ilya Sutskever and Dario Amodei. Training was conducted on a supercomputing cluster built in partnership with Microsoft, utilizing thousands of GPUs over several months. Following the paper's publication, access was gradually expanded through the OpenAI API beta, with Microsoft securing an exclusive license to the underlying technology in September 2020. The work was detailed in the paper "Language Models are Few-Shot Learners" (Brown et al., 2020), presented at the Conference on Neural Information Processing Systems.

Architecture and capabilities

Architecturally, it is a Transformer-based model with a decoder-only structure: 96 layers of multi-head self-attention, a hidden size of 12,288, and a 2,048-token context window. Its 175 billion parameters, more than a hundredfold increase over GPT-2's 1.5 billion, allowed for sophisticated pattern recognition across languages, code, and structured data. Capabilities included coherent long-form text generation, translation between languages such as English and French, writing functional code in Python and JavaScript, and answering complex questions. Its few-shot performance on benchmarks such as SuperGLUE and LAMBADA approached, and in some cases matched, fine-tuned state-of-the-art systems.
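The following is a minimal sketch of the causal self-attention that defines the decoder-only structure, written in plain NumPy with toy dimensions rather than GPT-3's actual configuration; multi-head splitting, residual connections, layer normalization, and the feed-forward sublayer are omitted for brevity.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: each position attends only to
    itself and earlier positions, the constraint that makes the model
    decoder-only (autoregressive)."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv                 # query/key/value projections
    scores = q @ k.T / np.sqrt(d)                    # scaled dot-product attention
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf                         # hide future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # mix values by attention

rng = np.random.default_rng(0)
T, d = 8, 16                                         # toy sizes; GPT-3 uses d = 12,288
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)                                     # (8, 16): one vector per position
```

As a back-of-envelope check on the headline figure, each Transformer layer contributes roughly 12·d² weights (4·d² in the attention projections and 8·d² in the feed-forward block), and 12 × 96 × 12,288² ≈ 1.74 × 10¹¹, consistent with the quoted 175 billion parameters.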

Applications and impact

The model's API spurred a wave of innovation, leading to applications in creative writing aids, advanced chatbots, educational tools, and programming assistants such as GitHub Copilot, which was built on Codex, a GPT-3 descendant fine-tuned on code. It powered widely discussed experiments, including an op-ed published by The Guardian, and influenced products across Silicon Valley. Its success catalyzed an industry-wide race toward larger models, directly influencing subsequent projects like Google's LaMDA and Anthropic's Claude. The technology also raised significant questions about the future of software development and content creation.

Ethical concerns and criticism

Critics, including researchers at the Allen Institute for AI and the Partnership on AI, highlighted risks of misinformation, bias, and harmful content. Studies showed the model could reproduce toxic language and stereotypes present in its training data, much of which was scraped from web sources such as Reddit-linked pages. Concerns about the environmental cost of the massive computation required for training were raised alongside warnings about job displacement. These debates influenced policy discussions in bodies such as the European Union and informed the development of more aligned successors, including efforts by Anthropic and initiatives at Stanford University.

Category:Large language models
Category:Artificial intelligence