| RoBERTa (Facebook) | |
|---|---|
| Name | RoBERTa |
| Developer | Facebook AI Research |
| Released | 2019 |
| Based on | Transformer |
| License | MIT (via fairseq) |
RoBERTa (Robustly Optimized BERT Pretraining Approach) is a self-supervised language model introduced by Facebook AI Research in 2019 as an optimized replication of BERT's pretraining procedure. It improved performance on many natural language understanding benchmarks by adjusting the pretraining recipe and scaling up data and compute, and it influenced subsequent research at institutions such as Google Research, OpenAI, Microsoft Research, Stanford University, and MIT.
RoBERTa emerged from work at Facebook AI Research that revisited the pretraining methods of BERT, developed at Google AI, and built on the Transformer architecture introduced by Vaswani et al. at Google Brain. The project reported gains on benchmarks such as GLUE and SuperGLUE and on leaderboards maintained by the Allen Institute for AI. RoBERTa's release catalyzed follow-up efforts at labs including Carnegie Mellon University, the University of Toronto, and ETH Zurich, and the model saw widespread adoption in industry stacks such as those maintained by Hugging Face, Amazon Web Services, and IBM Research.
RoBERTa preserves the core Transformer encoder architecture introduced by Vaswani et al. and popularized by BERT at Google Research, stacking multi-head self-attention layers comparable to those in contemporaneous architectures from OpenAI and DeepMind. Its base and large variants mirror BERT's configurations (12 and 24 encoder layers, respectively) and circulate through the PyTorch and TensorFlow toolchains; training used GPU clusters similar to infrastructure at NVIDIA and Google Cloud Platform, with engineering practices drawn from optimization work at Facebook, Microsoft Research, and Intel.
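A minimal PyTorch sketch of one such encoder block is shown below. The dimensions follow RoBERTa's base configuration (hidden size 768, 12 attention heads, feed-forward size 3072); the block is illustrative and not fairseq's actual implementation.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One Transformer encoder block of the kind RoBERTa stacks:
    multi-head self-attention plus a position-wise feed-forward network,
    each wrapped in a residual connection and layer normalization."""

    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff),
                                nn.GELU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        # Self-attention sublayer (post-norm ordering, as in BERT/RoBERTa).
        a, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.drop(a))
        # Position-wise feed-forward sublayer.
        return self.norm2(x + self.drop(self.ff(x)))

x = torch.randn(2, 16, 768)        # (batch, sequence, hidden)
print(EncoderBlock()(x).shape)     # torch.Size([2, 16, 768])
```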
RoBERTa's pretraining corpus combined BookCorpus and English Wikipedia with web-scale collections derived from Common Crawl, similar in spirit to corpora used by teams at OpenAI and Google Research. Its objective refined the masked language modeling approach introduced by researchers at Google AI and dropped BERT's next-sentence-prediction task, a choice that contrasted with objectives explored at Facebook AI Research and Salesforce Research. The team also altered the training schedule, informed by findings from groups at Stanford University, Carnegie Mellon University, and the University of Washington, to use dynamic masking, larger batches, and longer sequences, building on byte-level Byte Pair Encoding tokenization of the kind popularized by GPT-2 and implemented in libraries from Hugging Face.
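As a rough illustration of dynamic masking, the sketch below reapplies BERT's 80/10/10 masking recipe each time a sequence is drawn, so every epoch sees a different mask pattern, rather than masking once during preprocessing. The function name, the `-100` ignore-index convention, and the example vocabulary size are illustrative assumptions borrowed from common PyTorch practice, not fairseq's exact code.

```python
import random

def dynamic_mask(token_ids, mask_id, vocab_size, mask_prob=0.15):
    """Re-mask a sequence on the fly: BERT's 80/10/10 recipe applied anew
    each time the sequence is batched (RoBERTa's "dynamic" masking),
    instead of once during preprocessing as in the original BERT setup."""
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)   # -100: position ignored by the loss
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels[i] = tok                               # predict original
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_id                       # 80%: <mask>
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # 10%: random token
            # remaining 10%: leave the token unchanged
    return inputs, labels

# Each call produces a different mask pattern for the same sequence.
print(dynamic_mask([5, 17, 42, 99, 7], mask_id=4, vocab_size=50265))
```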
RoBERTa reported improved results on the GLUE and SuperGLUE leaderboards and on the SQuAD and RACE tasks, with comparisons against models from Google Research, OpenAI, Microsoft Research, and DeepMind. Independent evaluations by groups at the Allen Institute for AI and the University of California, Berkeley investigated robustness under adversarial setups studied by researchers at NYU and Columbia University, and the model influenced evaluation protocols adopted at conferences such as NeurIPS, ICML, and ACL.
Following its release, researchers and engineers produced variants in ecosystems built around Hugging Face Transformers, fairseq, and public GitHub repositories. Extensions included distilled versions such as Hugging Face's DistilRoBERTa, multilingual adaptations such as XLM-R from Facebook AI, and domain-specific fine-tunings used in biomedical projects at institutions like Johns Hopkins University and the Broad Institute. Implementations interface with PyTorch and TensorFlow and with acceleration libraries from NVIDIA and Intel.
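For example, a pretrained checkpoint can be loaded through the Hugging Face `transformers` library. The snippet below is a minimal fill-mask sketch using the public `roberta-base` checkpoint; the top-5 ranking logic is illustrative, not a prescribed API pattern.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# RoBERTa's mask token is written "<mask>" rather than BERT's "[MASK]".
text = "RoBERTa was released by Facebook AI Research in <mask>."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and rank vocabulary entries for it.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
top = logits[0, mask_pos].topk(5).indices.tolist()
print([tokenizer.decode([t]).strip() for t in top])
```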
RoBERTa has been applied in systems from companies such as Facebook, Microsoft, Amazon, and Google for tasks including question answering, summarization, and information retrieval, areas also studied at Stanford University and Columbia University. It has figured in academic work at institutions such as Harvard University, Yale University, and Princeton University investigating the ethical and societal implications raised by panels at AAAI and ACM. Policy discussions at the European Commission, UK Research and Innovation, and UNESCO have referenced advances in models of this class when addressing the transparency, fairness, and compute considerations highlighted by groups at OpenAI and the Partnership on AI.
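As one concrete example of the question-answering use mentioned above, the snippet below applies a community RoBERTa checkpoint fine-tuned on SQuAD 2.0 (`deepset/roberta-base-squad2`, a publicly available model on the Hugging Face Hub) through the `pipeline` API; the checkpoint choice is illustrative, not one endorsed by the original authors.

```python
from transformers import pipeline

# "deepset/roberta-base-squad2" is a public RoBERTa checkpoint
# fine-tuned for extractive question answering on SQuAD 2.0.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
result = qa(
    question="Who released RoBERTa?",
    context="RoBERTa is a language model released by Facebook AI Research in 2019.",
)
print(result["answer"], round(result["score"], 3))
```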
Category:Language models