LLMpedia: The first transparent, open encyclopedia generated by LLMs

ORB-SLAM

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: CMU Robotics Institute (Hop 4)
Expansion Funnel: Raw 74 → Dedup 0 → NER 0 → Enqueued 0
ORB-SLAM
Name: ORB-SLAM
Developers: Raúl Mur-Artal, J. M. M. Montiel, Juan D. Tardós
Initial release: 2015
Programming language: C++
License: GPLv3
Platform: Linux, Windows

ORB-SLAM is a real-time visual simultaneous localization and mapping (SLAM) system for monocular, stereo, and RGB-D cameras, notable for combining robust feature tracking, keyframe-based mapping, and loop closure. It integrates feature detection, pose graph optimization, and place recognition to produce consistent metric maps for robotics, augmented reality, and autonomous navigation. The system has influenced research across the computer vision, robotics, and photogrammetry communities.

Overview

ORB-SLAM was introduced in 2015 by researchers at the Universidad de Zaragoza, building on prior systems such as PTAM and on earlier feature descriptors such as SIFT and SURF. It uses ORB features, which combine the FAST keypoint detector with a rotation-aware variant of the BRIEF descriptor, enabling efficient binary matching on hardware ranging from desktop GPUs to embedded ARM processors. The design connects sparse bundle adjustment research with the appearance-based place recognition line of work that produced bag-of-words recognizers such as DBoW2.

System Architecture

The architecture follows a keyframe-based pipeline in the tradition of PTAM, the parallel tracking-and-mapping system developed at the University of Oxford. Processing is split into three parallel threads, for tracking, local mapping, and loop closing, which operate on a shared map. The map representation is a sparse point cloud of landmarks linked to keyframes through covisibility relations. Backend optimization is built on the g2o graph-optimization library, which supplies the pose graph and sparse bundle adjustment machinery.
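The three-thread split can be pictured with a minimal sketch. The code below is illustrative scaffolding only, assuming hypothetical Map and Keyframe types; the real system's classes, queues, and synchronization are far richer.

```cpp
// Illustrative three-thread layout; Map, Keyframe, and all logic are
// placeholders, not ORB-SLAM's actual classes.
#include <atomic>
#include <chrono>
#include <mutex>
#include <queue>
#include <thread>

struct Keyframe { int id; };

struct Map {
    std::mutex mtx;                    // guards shared map state
    std::queue<Keyframe> newKeyframes; // tracking -> local-mapping handoff
};

std::atomic<bool> running{true};

void trackingThread(Map& map) {
    int next = 0;
    while (running) {
        // ...estimate the current camera pose; when a frame is promoted
        // to a keyframe, hand it to the local-mapping thread:
        {
            std::lock_guard<std::mutex> lock(map.mtx);
            map.newKeyframes.push(Keyframe{next++});
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(33)); // ~30 Hz
    }
}

void localMappingThread(Map& map) {
    while (running) {
        std::unique_lock<std::mutex> lock(map.mtx);
        if (!map.newKeyframes.empty()) {
            Keyframe kf = map.newKeyframes.front();
            map.newKeyframes.pop();
            lock.unlock();
            // ...triangulate new points and run local bundle adjustment.
            (void)kf;
        } else {
            lock.unlock();
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
    }
}

void loopClosingThread(Map& map) {
    (void)map; // a real implementation queries place recognition here
    while (running) {
        // ...on a detected loop, optimize the pose graph.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
}

int main() {
    Map map;
    std::thread t1(trackingThread, std::ref(map));
    std::thread t2(localMappingThread, std::ref(map));
    std::thread t3(loopClosingThread, std::ref(map));
    std::this_thread::sleep_for(std::chrono::seconds(1)); // run briefly
    running = false;
    t1.join(); t2.join(); t3.join();
}
```

The essential design point is that tracking never blocks on expensive optimization: local mapping and loop closing consume keyframes asynchronously.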

Feature Extraction and Matching

Feature extraction relies on ORB descriptors, which pair FAST corner detection, ranked by a Harris response measure, with a rotation-aware binary descriptor derived from BRIEF. The extractor runs over a scale pyramid and assigns an orientation to each keypoint using an intensity centroid. Place recognition for relocalization and loop detection uses the DBoW2 binary bag-of-words vocabulary, a descendant of the vocabulary-tree indexing schemes developed for large-scale image retrieval. Candidate matches are filtered with cross-checking and Lowe-style ratio tests.
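As a concrete illustration, the sketch below extracts ORB features and applies a Lowe-style ratio test using OpenCV's cv::ORB and cv::BFMatcher. The parameter values and file names are placeholders, and ORB-SLAM's own extractor (with grid-based keypoint distribution) differs in detail.

```cpp
// ORB extraction and ratio-test matching with OpenCV; parameter values
// and file names are illustrative, not ORB-SLAM's configuration.
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::Mat img1 = cv::imread("frame1.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("frame2.png", cv::IMREAD_GRAYSCALE);
    if (img1.empty() || img2.empty()) return 1;

    // 1000 features over an 8-level pyramid with scale factor 1.2.
    cv::Ptr<cv::ORB> orb = cv::ORB::create(1000, 1.2f, 8);

    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    // Binary descriptors are compared with Hamming distance.
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(desc1, desc2, knn, 2);

    // Lowe-style ratio test: keep a match only if it is clearly better
    // than the second-best candidate.
    std::vector<cv::DMatch> good;
    for (const auto& m : knn)
        if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance)
            good.push_back(m[0]);
    return 0;
}
```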

Tracking and Pose Estimation

Tracking estimates the camera pose for each frame using a constant-velocity motion model and, when needed, PnP solvers such as EPnP, which was developed at EPFL. The pose is then refined by motion-only bundle adjustment against the local map. When tracking is lost, relocalization queries the bag-of-words database for candidate keyframes and verifies them with RANSAC-based PnP. Monocular scale drift and scale ambiguity are addressed by optimizing similarity transformations (Sim(3)) at loop closures, following earlier monocular SLAM research on scale drift correction.
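A generic version of the RANSAC-based PnP step can be sketched with OpenCV's cv::solvePnPRansac. The intrinsics and correspondences below are fabricated for illustration (the 2D points are exact projections of the 3D points under the identity pose); ORB-SLAM's internal EPnP-plus-RANSAC pipeline is its own implementation.

```cpp
// RANSAC-based PnP with OpenCV; intrinsics and correspondences are
// fabricated so that the identity pose is the correct answer.
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

int main() {
    // 3D map points and their 2D projections under fx = fy = 500,
    // cx = 320, cy = 240, with the camera at the origin.
    std::vector<cv::Point3f> mapPoints = {
        {0, 0, 5}, {1, 0, 5}, {0, 1, 6}, {1, 1, 6}, {-1, 0, 7}, {0, -1, 7}};
    std::vector<cv::Point2f> observations = {
        {320.0f, 240.0f}, {420.0f, 240.0f}, {320.0f, 323.33f},
        {403.33f, 323.33f}, {248.57f, 240.0f}, {320.0f, 168.57f}};

    cv::Mat K = (cv::Mat_<double>(3, 3) << 500, 0, 320, 0, 500, 240, 0, 0, 1);

    cv::Mat rvec, tvec;       // rotation (axis-angle) and translation
    std::vector<int> inliers; // observations consistent with the pose
    bool ok = cv::solvePnPRansac(mapPoints, observations, K, cv::noArray(),
                                 rvec, tvec,
                                 false,  // no initial guess
                                 100,    // RANSAC iterations
                                 2.0f,   // reprojection threshold (pixels)
                                 0.99,   // confidence
                                 inliers);
    // On success, inlier matches would seed motion-only bundle adjustment.
    return ok ? 0 : 1;
}
```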

Mapping and Loop Closure

Mapping constructs a covisibility graph that links keyframes observing common map points and runs local bundle adjustment over each new keyframe's covisibility neighborhood, reserving more global optimization for loop events. Loop closure combines bag-of-words detection with pose graph optimization over a sparsified "essential graph", which corrects accumulated drift efficiently. The optimization backend employs the sparse linear solvers and Levenberg–Marquardt routines provided by g2o. Map maintenance culls redundant keyframes and discards recently created map points that fail re-observation tests, keeping the map compact for long-term operation.
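The covisibility bookkeeping can be illustrated with a toy data structure: keyframes as nodes and edge weights counting shared map-point observations. The types and threshold handling below are hypothetical; the ORB-SLAM paper describes adding an edge when two keyframes share at least 15 observations.

```cpp
// Toy covisibility graph: nodes are keyframes, edge weights count shared
// map-point observations. Types and thresholds here are illustrative.
#include <map>
#include <set>

struct KeyframeNode {
    int id;
    std::set<int> mapPointIds;    // landmarks observed by this keyframe
    std::map<int, int> covisible; // neighbor keyframe id -> shared count
};

// Connect two keyframes when they share enough observations; the
// ORB-SLAM paper uses a threshold of 15 common map points.
void updateCovisibility(KeyframeNode& a, KeyframeNode& b, int minShared) {
    int shared = 0;
    for (int p : a.mapPointIds)
        if (b.mapPointIds.count(p)) ++shared;
    if (shared >= minShared) {
        a.covisible[b.id] = shared;
        b.covisible[a.id] = shared;
    }
}

int main() {
    KeyframeNode a{0, {1, 2, 3, 4}, {}};
    KeyframeNode b{1, {3, 4, 5, 6}, {}};
    updateCovisibility(a, b, /*minShared=*/2); // a and b share points 3, 4
    return a.covisible.count(1) ? 0 : 1;
}
```

Local bundle adjustment then optimizes exactly the keyframes and points reachable through strong covisibility edges, which keeps each optimization small.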

Performance and Evaluation

Evaluations commonly use public benchmarks such as the TUM RGB-D dataset, the KITTI Vision Benchmark Suite, the EuRoC MAV dataset, and sequences from the Oxford Robotics Institute. Performance comparisons typically include other visual SLAM and odometry methods such as LSD-SLAM and DSO, as well as learned alternatives. Standard metrics are absolute trajectory error (ATE) and relative pose error (RPE). Implementation efficiency is frequently profiled on desktop hardware from Intel and NVIDIA and on embedded boards from Qualcomm and the Raspberry Pi Foundation.
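Absolute trajectory error reduces to an RMSE over corresponding, pre-aligned positions. The sketch below assumes the estimated trajectory has already been aligned to ground truth (e.g., by a similarity alignment for monocular results) and that the two trajectories are in timestamp correspondence.

```cpp
// ATE as an RMSE over corresponding positions; assumes the estimate has
// already been aligned to ground truth and timestamps are matched.
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { double x, y, z; };

double ateRmse(const std::vector<Vec3>& est, const std::vector<Vec3>& gt) {
    if (est.empty() || est.size() != gt.size()) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < est.size(); ++i) {
        double dx = est[i].x - gt[i].x;
        double dy = est[i].y - gt[i].y;
        double dz = est[i].z - gt[i].z;
        sum += dx * dx + dy * dy + dz * dz;
    }
    return std::sqrt(sum / est.size());
}

int main() {
    std::vector<Vec3> est = {{0, 0, 0}, {1.0, 0.1, 0}, {2.1, 0.1, 0}};
    std::vector<Vec3> gt  = {{0, 0, 0}, {1.0, 0.0, 0}, {2.0, 0.0, 0}};
    std::printf("ATE RMSE: %.3f m\n", ateRmse(est, gt));
}
```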

Applications and Extensions

ORB-SLAM has been applied on mobile robotics platforms such as those from Clearpath Robotics, in augmented reality research, and in autonomous driving stacks. Extensions incorporate semantic segmentation modules and deep-learned pose or depth predictors. Stereo and RGB-D variants are widely used, and the system integrates with ROS and the Gazebo simulation environment, as sketched below. It continues to inform visual SLAM research at institutions such as MIT, Caltech, and the University of Toronto.
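A ROS 1 integration is typically a thin wrapper that forwards camera images to the tracker. The sketch below is modeled loosely on the ros_mono example shipped with the ORB-SLAM2 repository; the topic name and the vocabulary/settings file paths are placeholders.

```cpp
// ROS 1 wrapper feeding monocular frames to ORB-SLAM2, loosely following
// the ros_mono example in the ORB-SLAM2 repository; topic and file paths
// are placeholders.
#include <ros/ros.h>
#include <sensor_msgs/Image.h>
#include <cv_bridge/cv_bridge.h>
#include <image_transport/image_transport.h>
#include "System.h" // ORB_SLAM2::System from the ORB-SLAM2 source tree

ORB_SLAM2::System* slam = nullptr;

void imageCallback(const sensor_msgs::ImageConstPtr& msg) {
    // Convert the ROS image to cv::Mat and hand it to the tracker.
    cv_bridge::CvImageConstPtr cv = cv_bridge::toCvShare(msg);
    slam->TrackMonocular(cv->image, msg->header.stamp.toSec());
}

int main(int argc, char** argv) {
    ros::init(argc, argv, "orb_slam2_mono");

    // Vocabulary and settings files as in the upstream examples.
    ORB_SLAM2::System SLAM("ORBvoc.txt", "camera.yaml",
                           ORB_SLAM2::System::MONOCULAR, true);
    slam = &SLAM;

    ros::NodeHandle nh;
    image_transport::ImageTransport it(nh);
    image_transport::Subscriber sub =
        it.subscribe("/camera/image_raw", 1, imageCallback);

    ros::spin();
    SLAM.Shutdown(); // stop threads; trajectories can be saved here
    return 0;
}
```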

Category:Computer vision