LLMpedia: The first transparent, open encyclopedia generated by LLMs

RAG

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Leuna Werke Hop 4
Expansion Funnel: Raw 63 → Dedup 0 → NER 0 → Enqueued 0
1. Extracted: 63
2. After dedup: 0 (None)
3. After NER: 0
4. Enqueued: 0
RAG
Name: RAG
Abbreviation: RAG
Field: Artificial intelligence; Information retrieval; Natural language processing
Introduced: 2020s
Developers: Research labs; Technology companies; Open-source communities
Related: Transformers; Vector databases; Dense retrieval; Sparse retrieval; Knowledge-augmented models

Retrieval-augmented generation (RAG) is an approach that combines document retrieval with sequence generation to produce responses grounded in external corpora. It couples information retrieval systems with large pretrained sequence models to improve the factuality, context-awareness, and grounding of generated text. RAG techniques are used by researchers and engineers at organizations such as Google, OpenAI, Meta, and Microsoft, and by academic groups at institutions including the Massachusetts Institute of Technology, Stanford University, and Carnegie Mellon University.

Definition and Overview

RAG interleaves components from retrieval systems such as BM25 and dense vector search with generative models such as GPT-3 and encoder–decoder architectures built on the Transformer. It aims to address hallucination in large language models, such as those deployed by companies like Anthropic and hosted on platforms like Hugging Face, by conditioning generation on passages retrieved from knowledge sources including Wikipedia, enterprise document stores such as SharePoint, and specialized corpora such as PubMed. Typical deployments combine retrieval backends like Elasticsearch, vector stores like FAISS or Milvus, and neural generators trained with objectives drawn from research published at venues such as NeurIPS, ICML, and ACL.
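As a concrete illustration of conditioning generation on retrieved passages, the sketch below wires a toy word-overlap retriever to a stub generator. The function names (`retrieve`, `generate`) and the overlap scoring are illustrative stand-ins for BM25 or dense search and for an LLM call, not any particular library's API.

```python
# Minimal retrieve-then-generate sketch (toy, illustrative only).
# A real system would use BM25/dense retrieval and an LLM generator.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query (stand-in for BM25/dense search)."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_terms & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(query: str, passages: list[str]) -> str:
    """Stand-in generator: a real system would send this grounded prompt to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer to '{query}' grounded in:\n{context}"

corpus = [
    "RAG combines retrieval with generation.",
    "BM25 is a sparse lexical scoring function.",
    "FAISS supports dense vector search.",
]
print(generate("What does RAG combine?",
               retrieve("What does RAG combine?", corpus)))
```

The key property the sketch preserves is that the generator only ever sees the query plus the retrieved context, which is what grounds the output in the external corpus.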

History and Development

Early antecedents include hybrid systems that integrated retrieval with statistical language models, developed by groups at Google Research and Microsoft Research. The term gained prominence after 2020 publications by teams at Facebook AI Research and collaborators that demonstrated tighter coupling between retrieval and sequence-to-sequence models. Milestones include demonstrations of open-domain question answering using retrieved passages alongside models influenced by work at OpenAI and decoder models from University of Washington research. Progress accelerated alongside advances in dense embedding methods inspired by Siamese networks and contrastive learning, popularized in papers from Stanford, Berkeley, and industry labs.

Architecture and Components

Core components include a retriever, an index, and a generator. Retrievers use lexical techniques like BM25 or dense embeddings computed by encoders derived from BERT or RoBERTa; indexes rely on technologies like FAISS, Annoy, Milvus, or Elasticsearch. Generators are autoregressive or encoder–decoder models based on the Transformer architecture, implemented in frameworks such as PyTorch and TensorFlow. Supporting components often include passage selectors, rerankers trained on datasets like Natural Questions and SQuAD, and training data drawn from corpora like Common Crawl or institutional datasets from the World Health Organization and the National Institutes of Health for domain adaptation.
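The retriever/index split can be illustrated with a minimal dense index. The `DenseIndex` class below is a hypothetical brute-force stand-in for what FAISS or Milvus provide at scale: it stores L2-normalized vectors and ranks documents by inner product (equivalently, cosine similarity). The embeddings here are assumed toy inputs, not outputs of a real BERT encoder.

```python
import numpy as np

# Toy dense index: brute-force inner-product search over L2-normalized
# vectors, a stand-in for FAISS/Milvus (embeddings are assumed inputs).

class DenseIndex:
    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.ids: list[str] = []

    def add(self, doc_id: str, vec: np.ndarray) -> None:
        vec = vec / np.linalg.norm(vec)          # normalize so dot product = cosine
        self.vectors = np.vstack([self.vectors, vec.astype(np.float32)])
        self.ids.append(doc_id)

    def search(self, query: np.ndarray, k: int = 1) -> list[tuple[str, float]]:
        q = query / np.linalg.norm(query)
        scores = self.vectors @ q                # inner product against all docs
        top = np.argsort(-scores)[:k]
        return [(self.ids[i], float(scores[i])) for i in top]

index = DenseIndex(dim=3)
index.add("doc_a", np.array([1.0, 0.0, 0.0]))
index.add("doc_b", np.array([0.0, 1.0, 0.0]))
print(index.search(np.array([0.9, 0.1, 0.0]), k=1))  # doc_a ranks first
```

Production indexes replace this exhaustive scan with approximate nearest-neighbor structures (IVF, HNSW), but the normalize-then-inner-product contract is the same.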

Retrieval Techniques

Retrieval strategies span sparse lexical retrieval, dense semantic retrieval, and hybrid approaches. Sparse methods use inverted indexes and scoring functions like BM25; dense methods learn vector spaces using contrastive losses informed by training collections such as MS MARCO and TREC; hybrid systems combine lexical signals with vector similarity to leverage resources like Wikipedia and enterprise repositories like Confluence. Reranking employs cross-encoders or models trained on relevance judgments modeled after datasets from CLEF and NIST to refine top-k candidates before generation.
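The sparse scoring function mentioned above can be sketched directly. The implementation below follows the common Okapi BM25 formula with typical default parameters (k1 = 1.5, b = 0.75) over a whitespace-tokenized toy corpus; `bm25_scores` is an illustrative name, not a library API.

```python
import math
from collections import Counter

# Okapi BM25 scoring sketch over a tiny whitespace-tokenized corpus.
# k1 and b defaults follow common practice; real systems use inverted indexes.

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N      # average document length
    df = Counter()                                  # document frequency per term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = ["sparse lexical retrieval with bm25", "dense vector search over embeddings"]
print(bm25_scores("bm25 retrieval", docs))  # first doc scores higher
```

A hybrid system would min-max normalize these lexical scores and sum them with dense cosine similarities before reranking the top-k candidates.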

Integration with Large Language Models

Integration patterns include retrieve-then-generate, retrieve-and-refine, and end-to-end differentiable training in which retriever and generator are jointly optimized. Implementations often adapt pretrained models such as GPT-2, GPT-J, and encoder models from Google Research or EleutherAI, and fine-tune them on instruction datasets modeled on formats from OpenAI documentation. Production systems embed retrieval within prompting pipelines on services such as Microsoft Azure and Amazon Web Services to support chat interfaces and knowledge assistants.
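A retrieve-then-generate prompting pipeline typically ends with prompt assembly. The sketch below builds a chat-style message list that grounds the model in numbered passages; the role/content dictionary shape mirrors common chat APIs, and `build_rag_prompt` is a hypothetical helper (the actual LLM call is omitted).

```python
# Sketch of embedding retrieval results in a chat-style prompt.
# The message format mirrors common chat APIs; the LLM call itself is omitted.

def build_rag_prompt(question: str, passages: list[str]) -> list[dict]:
    # Number the passages so the model can cite them by index.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    system = (
        "Answer using only the numbered passages below. "
        "Cite passage numbers, and say 'not found' if the answer is absent.\n\n"
        + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

msgs = build_rag_prompt("Who popularized RAG?",
                        ["RAG was popularized by 2020 work at Facebook AI Research."])
print(msgs[0]["role"], "->", msgs[1]["role"])
```

Constraining the model to the numbered context and asking it to admit "not found" is a common prompt-level mitigation for hallucination in such pipelines.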

Applications

Applications span open-domain question answering, enterprise knowledge assistants, legal and medical document summarization, and personalized recommendation. Prominent use-cases appear in products from Google, Microsoft, and startups leveraging Hugging Face repositories, in scholarly tools at arXiv, and in clinical support systems referencing PubMed and ClinicalTrials.gov. RAG supports downstream tasks including fact-checking against corpora like FactCheck.org and literature review automation for researchers at institutions such as Harvard University and Johns Hopkins University.

Evaluation and Metrics

Evaluation leverages retrieval metrics like recall@k and mean reciprocal rank (MRR) as used in TREC benchmarks, and generation metrics including ROUGE, BLEU, and factuality-focused measures such as FEQA and hallucination rate analyses reported at conferences like ACL and EMNLP. Human evaluation protocols reference standards from CHI and task-specific relevance assessments modeled on datasets like MS MARCO and Natural Questions.
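The retrieval metrics named above are straightforward to compute from ranked result lists. The sketch below implements recall@k and mean reciprocal rank (MRR) over toy relevance judgments; the function names are illustrative, not a benchmark library's API.

```python
# Recall@k and mean reciprocal rank (MRR) over ranked retrieval results.

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def mrr(queries: list[tuple[list[str], set[str]]]) -> float:
    """Average, over queries, of 1/rank of the first relevant document (0 if none)."""
    total = 0.0
    for ranked, relevant in queries:
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

ranked = ["d3", "d1", "d7"]
print(recall_at_k(ranked, {"d1", "d9"}, k=2))  # 0.5
print(mrr([(ranked, {"d1"})]))                 # 0.5
```

Generation-side metrics such as ROUGE or FEQA operate on the output text instead and are typically computed with dedicated evaluation libraries.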

Ethical and Security Considerations

Ethical concerns include provenance, bias, and privacy when grounding outputs in sources like Wikipedia or proprietary corpora from companies such as Bloomberg L.P. and Thomson Reuters. Security risks include data leakage from indexed documents and adversarial retrieval attacks explored in literature from Stanford University and MIT. Mitigations draw on audit trails, source attribution, red-teaming methods promoted by OpenAI, and governance frameworks from bodies such as IEEE and ISO.

Category:Artificial intelligence Category:Information retrieval Category:Natural language processing