Introduction
Chapter 1 of Introduction to Visual SLAM: From Theory to Practice introduces the classical Visual SLAM framework and the mathematical formulation of the SLAM problem. This post gives an overview of that classical framework; you can find more details about each step in upcoming posts.
Structure
The classical Visual SLAM framework consists of three stages: preprocessing, core, and postprocessing. The core stage consists of a frontend, a backend, and loop closing, which work together to estimate the robot's trajectory and a 3D map of the environment. Figure 1 shows the structure of the classical Visual SLAM framework.
Step 1. Sensor Data Acquisition
The first step in the Visual SLAM framework is the acquisition and preprocessing of sensor data, typically images from cameras mounted on the robot or vehicle that capture the surrounding environment. The preprocessed sensor data is then fed into the subsequent stages of the framework.
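As a concrete illustration, the sketch below grabs frames from a webcam with OpenCV and converts them to grayscale before they would be handed to the frontend. The device index, the grayscale preprocessing, and the OpenCV dependency are assumptions about a typical monocular setup, not something prescribed by the book.

```cpp
// Minimal sensor-acquisition sketch (assumes OpenCV is installed).
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);            // open the default camera (index 0 is an assumption)
    if (!cap.isOpened()) return -1;

    cv::Mat frame, gray;
    while (cap.read(frame)) {
        // Typical preprocessing: convert the color frame to grayscale.
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        // ... hand the preprocessed frame to the visual odometry frontend ...
        cv::imshow("frame", gray);
        if (cv::waitKey(30) == 27) break;   // press ESC to quit
    }
    return 0;
}
```

In a real system this loop would also attach timestamps and, for stereo or RGB-D setups, synchronize the additional image streams.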
Step 2. Frontend: Visual Odometry
The visual odometry component tracks the robot's motion by comparing the current image with previous ones. The frontend typically involves feature detection and description, feature matching, and motion estimation using methods such as optical flow, structure from motion, or bundle adjustment. Its final goal is to generate a rough local map and an initial estimate of the camera trajectory.
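To make this concrete, here is a minimal two-view visual odometry sketch with OpenCV: detect ORB features, match them, and recover the relative rotation and translation from the essential matrix. The file names, the intrinsic values, and the RANSAC defaults are placeholders; a real frontend would also handle keyframes, outlier rejection, and scale.

```cpp
// Minimal two-view visual odometry sketch (assumes OpenCV is installed).
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    // Two consecutive frames; the file names are placeholders.
    cv::Mat img1 = cv::imread("frame1.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("frame2.png", cv::IMREAD_GRAYSCALE);

    // 1. Detect ORB features and compute binary descriptors in both frames.
    cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    // 2. Match descriptors with cross-checked brute-force Hamming matching.
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    std::vector<cv::Point2f> pts1, pts2;
    for (const auto &m : matches) {
        pts1.push_back(kp1[m.queryIdx].pt);
        pts2.push_back(kp2[m.trainIdx].pt);
    }

    // 3. Estimate the relative motion from the essential matrix.
    //    The intrinsics below are placeholder values; use your own calibration.
    cv::Mat K = (cv::Mat_<double>(3, 3) << 520.9, 0, 325.1,
                                           0, 521.0, 249.7,
                                           0, 0, 1);
    cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC);
    cv::Mat R, t;
    cv::recoverPose(E, pts1, pts2, K, R, t);   // rotation and (unit-scale) translation

    std::cout << "R =\n" << R << "\nt =\n" << t << std::endl;
    return 0;
}
```

Note that with a single camera the recovered translation has no absolute scale, which is one of the reasons the later stages are needed.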
Step 3. Backend: Filtering/Optimization
The backend component takes the camera poses estimated by the frontend and the constraints from loop closing as input, and optimizes the robot's trajectory and the 3D map of the environment. It involves data association and state estimation, done either with filtering approaches such as the extended Kalman filter and the particle filter, or with nonlinear optimization approaches such as pose graph optimization and bundle adjustment.
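The sketch below is not a full SLAM backend, just a scalar Kalman filter illustrating the predict/update cycle that filtering-based backends run; the odometry increments, measurements, and noise values are made-up numbers. A real backend estimates full 6-DoF poses and landmark positions, typically with libraries such as g2o or Ceres.

```cpp
// Toy 1-D Kalman filter: predict with odometry, correct with a measurement.
#include <iostream>

int main() {
    // State: 1-D robot position x with variance P.
    double x = 0.0, P = 1.0;
    const double Q = 0.1;   // motion (process) noise variance, made-up value
    const double R = 0.5;   // measurement noise variance, made-up value

    const double controls[]     = {1.0, 1.0, 1.0};   // odometry increments from the frontend
    const double measurements[] = {1.2, 1.9, 3.1};   // noisy position observations

    for (int k = 0; k < 3; ++k) {
        // Predict: propagate the state with the motion model; uncertainty grows.
        x += controls[k];
        P += Q;

        // Update: correct the prediction with the measurement; uncertainty shrinks.
        const double K = P / (P + R);    // Kalman gain
        x += K * (measurements[k] - x);
        P *= (1.0 - K);

        std::cout << "step " << k << ": x = " << x << ", P = " << P << "\n";
    }
    return 0;
}
```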
Step 4. Loop Closing
In the loop closing stage, the SLAM system detects that the robot has returned to a previously visited place and corrects the drift that accumulates in the trajectory and the map because no absolute positioning information is available. The system identifies loop closures by matching features from different parts of the trajectory and adjusts the robot's poses and the map accordingly.
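Production systems usually detect loop candidates with a bag-of-words place-recognition database (e.g. DBoW2/DBoW3), which scales to thousands of keyframes. The hypothetical helper below only sketches the underlying idea with plain OpenCV descriptor matching: if the current keyframe shares enough good ORB matches with an earlier keyframe, it is flagged as a loop-closure candidate. File names and thresholds are illustrative, not tuned values.

```cpp
// Hypothetical loop-candidate check via direct descriptor matching (assumes OpenCV).
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

// Do two keyframes look like the same place? descA and descB are ORB descriptor matrices.
bool isLoopCandidate(const cv::Mat &descA, const cv::Mat &descB) {
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);   // cross-checked Hamming matching
    std::vector<cv::DMatch> matches;
    matcher.match(descA, descB, matches);

    int good = 0;
    for (const auto &m : matches)
        if (m.distance < 50) ++good;   // keep only close descriptor matches
    return good > 100;                 // illustrative threshold, not a tuned value
}

int main() {
    // Placeholder keyframe images from different parts of the trajectory.
    cv::Mat img1 = cv::imread("keyframe_a.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("keyframe_b.png", cv::IMREAD_GRAYSCALE);

    cv::Ptr<cv::ORB> orb = cv::ORB::create(1000);
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    if (isLoopCandidate(desc1, desc2))
        std::cout << "loop closure candidate found\n";   // the backend would then add a loop constraint
    return 0;
}
```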
Step 5. Reconstruction
The final step in the Visual SLAM framework is to reconstruct the 3D map of the environment. The map is built by integrating the estimated trajectory and the 3D landmarks. The reconstruction can be performed using different methods such as triangulation, stereo reconstruction, or dense reconstruction.
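For the simplest case, the sketch below triangulates matched pixel coordinates from two views into 3D landmarks with OpenCV, given the intrinsics and the relative pose estimated earlier. All inputs are assumed to come from the previous steps; dense reconstruction would instead estimate depth for (nearly) every pixel.

```cpp
// Two-view triangulation sketch (assumes OpenCV is installed).
#include <opencv2/opencv.hpp>
#include <vector>

// Triangulate matched pixel coordinates from two views into 3D landmarks.
// K is the camera intrinsic matrix; (R, t) is the pose of view 2 relative to view 1.
// All matrices are assumed to be CV_32F and to come from the frontend.
std::vector<cv::Point3f> triangulate(const std::vector<cv::Point2f> &pts1,
                                     const std::vector<cv::Point2f> &pts2,
                                     const cv::Mat &K,
                                     const cv::Mat &R, const cv::Mat &t) {
    // Projection matrices of the two views, expressed in the first camera frame.
    cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_32F);
    cv::Mat Rt;
    cv::hconcat(R, t, Rt);            // [R | t]
    cv::Mat P2 = K * Rt;

    cv::Mat pts4d;                    // 4xN homogeneous landmark coordinates
    cv::triangulatePoints(P1, P2, pts1, pts2, pts4d);

    std::vector<cv::Point3f> landmarks;
    for (int i = 0; i < pts4d.cols; ++i) {
        cv::Mat x = pts4d.col(i);
        x /= x.at<float>(3, 0);       // de-homogenize
        landmarks.emplace_back(x.at<float>(0, 0), x.at<float>(1, 0), x.at<float>(2, 0));
    }
    return landmarks;
}
```

In a real pipeline, pts1 and pts2 would be the inlier matches from Step 2 and (R, t) the recovered relative pose, and the resulting landmarks would be refined by the backend before being merged into the map.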
Conclusion
In conclusion, the classical Visual SLAM framework is an essential tool for robotics, computer vision, and autonomous driving applications. By understanding the components of the SLAM framework, we can develop more robust and efficient SLAM systems. In upcoming posts, we will dive deeper into each component and discuss the techniques and algorithms used in Visual SLAM.
Reference
Gao, Xiang, and Tao Zhang. Introduction to Visual SLAM: From Theory to Practice. Springer, 2021.