Stable Diffusion — LLMpedia

Stable Diffusion
Name	Stable Diffusion
Developer	Stability AI, CompVis at Ludwig Maximilian University of Munich, Runway
Released	August 22, 2022
Programming language	Python
Operating system	Cross-platform
Genre	Deep learning, Generative artificial intelligence, Text-to-image model
License	CreativeML Open RAIL-M

Contents

Overview
Technical details
Capabilities and applications
Ethical and societal impact
Development and release history

Stable Diffusion is a latent diffusion model for generating detailed images from textual descriptions. Developed through a collaboration between Stability AI, the CompVis research group at Ludwig Maximilian University of Munich, and Runway, it was publicly released in August 2022. The model is notable for its openly released code and weights and for its ability to run on consumer-grade hardware, which significantly broadened access to advanced AI image synthesis.

Overview

The model builds on earlier diffusion research in computer vision and deep learning, and it arrived alongside other text-to-image systems such as DALL-E and Midjourney. Unlike those systems, which were offered only as hosted services, it can run locally on a consumer GPU with a modest amount of VRAM. Its release under the CreativeML Open RAIL-M license sparked widespread experimentation and integration into digital art and content creation workflows, and put competitive pressure on proprietary offerings from companies such as OpenAI and Google.
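One reason the model fits on consumer GPUs is that denoising runs on a compressed latent rather than on full-resolution pixels. The sketch below is back-of-the-envelope arithmetic only. It assumes an 8x spatial downsampling factor and 4 latent channels, the figures commonly cited for the version 1 models but not stated in this article, and the helper names are hypothetical.

```python
def latent_shape(height, width, downsample=8, channels=4):
    """Shape (channels, h, w) of the latent for an image of the given size.

    The defaults (8x downsampling, 4 channels) are assumptions taken from
    commonly cited v1 figures, not values given in this article.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsampling factor")
    return (channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, channels=4):
    """How many RGB pixel values there are per latent value."""
    pixel_values = 3 * height * width  # three color channels per pixel
    c, h, w = latent_shape(height, width, downsample, channels)
    return pixel_values / (c * h * w)


shape = latent_shape(512, 512)
ratio = compression_ratio(512, 512)
print(shape, ratio)
```

Under these assumptions a 512x512 RGB image (786,432 values) maps to a 4x64x64 latent (16,384 values), so each denoising step works on roughly 48 times fewer values than it would in pixel space.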

Technical details

The system is a latent variable model within the diffusion model framework, a technique developed in research from Stanford University and the University of California, Berkeley. During training, an autoencoder compresses each image into a lower-dimensional latent space, and a U-Net learns to predict the noise that has been added to those latents. To generate a new image, the U-Net iteratively denoises a latent initialized with Gaussian noise, guided by text embeddings from a CLIP text encoder, and the autoencoder's decoder then converts the final latent back into pixels. Training drew on subsets of LAION-5B, a dataset of image-text pairs assembled by the nonprofit LAION, and ran on a cluster of NVIDIA A100 GPUs. Optional optimizations, such as the memory-efficient attention implementations in the xFormers library, further reduce memory use on NVIDIA hardware.
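The iterative denoising loop can be illustrated with a toy example. The sketch below is not the model's actual code. It assumes a linear noise schedule and a deterministic DDIM-style update, and it combines a prompt-conditioned and an unconditioned noise estimate with a guidance scale, which is one common way text guidance is applied. In place of the U-Net it uses a hypothetical oracle that already knows the clean target, so that the loop is runnable without any trained weights.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_eps(x_t, alpha_bar, target):
    """Toy stand-in for the U-Net: the noise that would turn `target` into x_t."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - a * g) / s for x, g in zip(x_t, target)]


def sample(prompt_target, guidance_scale=1.0, num_steps=1000, seed=0):
    """Denoise a random latent toward `prompt_target` with guided DDIM-style steps."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in prompt_target]  # start from Gaussian noise
    null_target = [0.0] * len(prompt_target)  # what the "empty prompt" points at
    for t in reversed(range(num_steps)):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps_cond = oracle_eps(x, ab_t, prompt_target)
        eps_uncond = oracle_eps(x, ab_t, null_target)
        # Guidance: push the estimate away from the unconditioned prediction.
        eps = [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
        # Estimate the clean latent, then re-noise it to the previous step's level.
        x0_pred = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                   for xi, e in zip(x, eps)]
        x = [math.sqrt(ab_prev) * p + math.sqrt(1.0 - ab_prev) * e
             for p, e in zip(x0_pred, eps)]
    return x


latent = sample([1.0, -2.0, 0.5], guidance_scale=1.0)
```

With a guidance scale of 1 the loop converges to the prompt's target. Larger scales extrapolate past it in this toy setting, which loosely mirrors how stronger guidance trades diversity for prompt adherence. In the real system the resulting latent would still need to be decoded to pixels by the autoencoder.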

Capabilities and applications

Primary functions include text-to-image generation, image inpainting, and image-to-image translation, enabling tasks from photorealistic rendering to artistic stylization. It powers Stability AI's DreamStudio service, has been integrated into commercial tools such as Canva, and is available in Adobe Photoshop through plugins. The technology is used for rapid prototyping in fields ranging from game development to architectural visualization, and has spawned communities on GitHub and Hugging Face dedicated to fine-tuning the model, including lightweight LoRA adapters.
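Image-to-image translation and inpainting are typically implemented as small variations on the text-to-image loop. The sketch below illustrates two common ideas under stated assumptions: image-to-image starts from a partially noised copy of the input rather than from pure noise, with a strength setting deciding how many denoising steps run, and inpainting blends generated content into the masked region while keeping the known content elsewhere. All function names are hypothetical, and the values are plain lists standing in for latents.

```python
import math


def steps_to_run(num_steps, strength):
    """Number of denoising steps for image-to-image at a strength in [0, 1].

    Strength 0 leaves the input untouched; strength 1 behaves like
    text-to-image, running every step from (almost) pure noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_steps * strength), num_steps)


def add_noise(x0, noise, alpha_bar):
    """Closed-form forward diffusion: sqrt(ab) * x0 + sqrt(1 - ab) * noise."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * n for x, n in zip(x0, noise)]


def blend_masked(mask, generated, known):
    """Inpainting blend: `generated` where mask is 1, `known` where mask is 0."""
    return [m * g + (1.0 - m) * k for m, g, k in zip(mask, generated, known)]


# With 50 sampler steps and strength 0.75, only the last 37 steps are run.
print(steps_to_run(50, 0.75))
# Only the first element is masked, so only it is replaced.
print(blend_masked([1.0, 0.0], [9.0, 9.0], [1.0, 2.0]))
```

In an inpainting loop of this kind, the blend would usually be applied at every denoising step, with the known region re-noised to the current step's noise level using something like `add_noise`.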

Ethical and societal impact

The public release ignited intense debate over copyright infringement, because the training data included billions of images scraped from the public Internet without explicit consent from their creators. Legal challenges include lawsuits that Getty Images filed against Stability AI in 2023, and the treatment of training data has featured in discussions around the European Union's AI Act. Researchers, including some at the MIT Media Lab and the Partnership on AI, have raised concerns about deepfake creation, algorithmic bias that perpetuates stereotypes, and the generation of NSFW content. These issues highlight tensions between open innovation and responsible deployment in the era of foundation models.

Development and release history

The foundational research on latent diffusion was conducted primarily at CompVis, the group headed by Björn Ommer, and development of Stable Diffusion was led by Robin Rombach and Patrick Esser, with computational resources and funding provided by Stability AI, founded by Emad Mostaque. The first public release, version 1.4, came in August 2022, followed by iterative updates including version 2.0 in November 2022, which introduced an OpenCLIP text encoder. The model's development was influenced by earlier work on Denoising Diffusion Probabilistic Models and Latent Diffusion Models published on arXiv. Its open release strategy catalyzed a rapid ecosystem of third-party interfaces and forks, distinguishing its trajectory from the closed approaches of DALL-E 2 and Imagen.

Category:2022 software Category:Artificial intelligence art Category:Deep learning Category:Free and open-source software Category:Generative artificial intelligence Category:Stability AI