| LLaMA | |
|---|---|
| Name | LLaMA |
| Developer | Meta AI |
| Release date | February 2023 |
| Type | Large language model |
| License | Non-commercial research license |
LLaMA (Large Language Model Meta AI) is a family of foundation models for natural language processing developed by researchers at Meta AI. Released in February 2023, the models were designed to be more efficient and accessible than many contemporary systems, demonstrating that smaller models trained on larger numbers of tokens could achieve competitive performance. The release significantly influenced the open-source AI community and spurred widespread research into efficient model training and deployment.
The LLaMA project was initiated by a team at Meta AI whose researchers included Guillaume Lample and Timothée Lacroix. It represented a strategic shift in the artificial intelligence landscape, challenging the prevailing trend of ever-larger parameter counts by emphasizing the quality and scale of the training data. By making the model weights available under a non-commercial license to academic and other approved researchers on a case-by-case basis, Meta aimed to broaden access to state-of-the-art language model technology. This approach contrasted with the closed models of competitors such as OpenAI and Google DeepMind and fostered a new wave of open research in the field.
Development of the initial models was conducted by a team at Meta AI, with key contributions from scientists in its Menlo Park and Paris offices. The project was announced in a research paper posted to arXiv in February 2023, coinciding with the release of the model weights. The release strategy was carefully managed, requiring members of the research community to request access via a form intended to limit misuse. This period saw intense activity from academic and open-source groups who worked to host and fine-tune the models. The subsequent leak of the weights in March 2023, circulated via a BitTorrent link posted to 4chan and mirrored on platforms such as GitHub, was controversial but led to an explosion of derivative projects and community adaptations.
The LLaMA models are based on the Transformer architecture, using a decoder-only design similar to that of GPT-3 with several modifications. The family includes four variants parameterized by size: LLaMA-7B, LLaMA-13B, LLaMA-33B, and the largest, LLaMA-65B. All models use RMSNorm pre-normalization, the SwiGLU activation function, and rotary positional embeddings, architectural choices informed by prior work on models such as GPT-3, PaLM, and GPTNeo. This design prioritizes training stability and inference efficiency. The tokenizer is a byte-pair encoding (BPE) model implemented with SentencePiece, allowing it to process a wide vocabulary.
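The two normalization and activation choices named above can be illustrated with a minimal sketch. This is not Meta's implementation; it is a pure-Python illustration of the RMSNorm and SwiGLU formulas (the function names and toy weight matrices are invented for the example).

```python
import math

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm scales each element by the reciprocal root-mean-square of
    # the vector, then applies a learned per-dimension gain. Unlike
    # LayerNorm, it does not subtract the mean.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def silu(v):
    # SiLU / Swish activation: v * sigmoid(v).
    return v / (1.0 + math.exp(-v))

def swiglu(x, W, V):
    # SwiGLU gating: elementwise product of a SiLU-activated linear
    # projection of x with a second, ungated linear projection of x.
    def matvec(M, vec):
        return [sum(m * v for m, v in zip(row, vec)) for row in M]
    a = matvec(W, x)
    b = matvec(V, x)
    return [silu(ai) * bi for ai, bi in zip(a, b)]

# Toy usage with a 2-dimensional input and 1-dimensional projections.
normed = rms_norm([3.0, 4.0], [1.0, 1.0])
gated = swiglu([1.0, 2.0], W=[[1.0, 0.0]], V=[[0.0, 1.0]])
```

In a real transformer block these operate on large tensors via optimized matrix kernels; the sketch only shows the arithmetic each formula performs.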
The models were trained on a massive corpus of publicly available text totaling over 1.4 trillion tokens, with sources including Common Crawl, C4, GitHub, Wikipedia, books, and arXiv. Training was performed on clusters of NVIDIA A100 GPUs using Meta's FairScale library for distributed optimization. Despite their smaller size relative to contemporaries such as GPT-3, Chinchilla, and PaLM, the LLaMA models demonstrated strong performance on benchmarks such as MMLU, GSM8K, and HumanEval, with LLaMA-13B outperforming the far larger GPT-3 on most benchmarks. Their efficiency made them particularly suitable for research on tasks such as mathematical reasoning and code generation, and they became a popular base for fine-tuning projects such as Alpaca and Vicuna.
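The scale of such a training run can be roughed out with the common scaling-laws rule of thumb of about 6 FLOPs per parameter per training token (forward plus backward pass). This estimate comes from the scaling-laws literature, not from the LLaMA paper itself, and ignores attention and other overheads.

```python
def train_flops(params, tokens):
    # Rule-of-thumb total training compute: ~6 FLOPs per parameter per
    # token (forward + backward pass), a standard approximation from
    # the scaling-laws literature. Not a figure reported by Meta.
    return 6 * params * tokens

# LLaMA-65B trained on ~1.4 trillion tokens.
flops_65b = train_flops(65e9, 1.4e12)
print(f"{flops_65b:.2e}")  # 5.46e+23
```

Even as a rough estimate, this puts the largest LLaMA run within an order of magnitude of the largest dense models of its era despite the smaller parameter count, which is the trade-off the data-centric approach makes.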
The release of LLaMA was met with widespread acclaim from the academic and open-source AI communities, though it also sparked debates about AI ethics and responsible model licensing. It catalyzed the development of numerous derivative models and toolkits, influencing projects across the ecosystem, including at Hugging Face. The success of the approach validated data-centric scaling and pressured other industry players to reconsider their release strategies. Its legacy is evident in the rapid advancement and proliferation of accessible, capable language models that followed throughout 2023 and 2024.
Category:Large language models Category:Meta Platforms Category:2023 software