Introduction
The first chapter of Introduction to Visual SLAM: From Theory to Practice delves into the mathematical formulation of the SLAM problem. In SLAM, the robot estimates its own trajectory while building a map of the environment it is in. To interpret the measurements it receives, the robot also needs to estimate where its sensors were located when each measurement was taken.
How to formulate the mathematical model?
In the field of SLAM, image sensors are commonly used to collect data from the environment at different time points. To process this data, continuous time is first converted into discrete time stamps from 1 to $\mathit{k}$, after which the locations and map at these moments can be determined. The positions at different time stamps are typically expressed as a vector $\mathbf{x}$, with $\mathbf{x_{1}}$ to $\mathbf{x_{k}}$ representing the position at each time stamp. The map, on the other hand, is made up of several landmarks, with each image including a subset of these landmarks and recording their observations. In total, there are $\mathit{N}$ landmarks in the map, which are denoted as $\mathbf{y_{1}}$ to $\mathbf{y_{N}}$.
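The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the world is taken to be 2D, each position is stored as (x, y, heading), each landmark is a 2D point, and the names `Pose`, `Landmark`, `num_steps`, `num_landmarks`, and `observed_at` are hypothetical and do not come from the book.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    """Position x_k at one discrete time stamp (assumed 2D: x, y, heading)."""
    x: float
    y: float
    theta: float


@dataclass
class Landmark:
    """Map point y_j (assumed here to be a 2D point)."""
    x: float
    y: float


num_steps = 3      # plays the role of k, the last time stamp
num_landmarks = 4  # plays the role of N

# Unknowns to estimate: one pose per time stamp, one point per landmark.
poses = [Pose(0.0, 0.0, 0.0) for _ in range(num_steps)]        # x_1 ... x_k
landmarks = [Landmark(0.0, 0.0) for _ in range(num_landmarks)]  # y_1 ... y_N

# Each image sees only a subset of the landmarks.
# Keys are time stamps, values are 1-based landmark indices (made-up data).
observed_at = {1: [1, 2], 2: [2, 3], 3: [3, 4]}

for step, indices in observed_at.items():
    print(f"time stamp {step}: observes landmarks {indices}")
```

The zero-initialized poses and landmarks are placeholders for the quantities that SLAM has to estimate; only the structure matters here.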
To express and solve the SLAM problem mathematically, the variables involved must first be formulated properly so that estimation and optimization techniques can be applied to them. The SLAM problem can then be broken down into two questions, one about how the robot moves and one about how it observes the environment:
- How does the position of the system, represented by the variable $\mathbf{x}$, change from time step $\mathit{k}-1$ to $\mathit{k}$?
- How is the sensor observation of each landmark $\mathbf{y_{j}}$ from position $\mathbf{x_{k}}$ described mathematically?
Motion Equation
In SLAM, the core task in motion estimation is to determine the current position $\mathbf{x_{k}}$ under the assumption that the previous position $\mathbf{x_{k-1}}$ is already known. The change between the two can be written as:
$$\Delta\mathbf{x_{k}}=\mathbf{x_{k}}-\mathbf{x_{k-1}}$$
The book presents a universal and abstract model for motion estimation using the motion equation, $f(\cdot)$. The next question is what drives the change from one time step to the next. The answer is the input command and random noise. Because the noise is random, the model becomes a stochastic model. The final version of the motion equation is as follows:
$$\mathbf{x_{k}}=f(\mathbf{x_{k-1}}, \mathbf{u_{k}}, \mathbf{w_{k}})$$
$$(\mathbf{u_{k}}, \mathbf{w_{k}})=(\text{input}, \text{noise})$$
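To make the abstract $f(\cdot)$ concrete, here is a minimal sketch of one possible instance: a planar robot whose state is (x, y, heading), whose input is a commanded change in each component, and whose noise is additive zero-mean Gaussian. The function name `motion_model`, the `noise_std` parameter, and the choice of Gaussian noise are assumptions made for illustration; real systems often need a more involved model.

```python
import math
import random


def motion_model(x_prev, u, noise_std=0.0, rng=random):
    """One assumed instance of x_k = f(x_{k-1}, u_k, w_k).

    x_prev: (x, y, theta) at time step k-1
    u:      (dx, dy, dtheta), the commanded change for this step
    The noise w_k is drawn per component from a zero-mean Gaussian
    with standard deviation noise_std.
    """
    w = tuple(rng.gauss(0.0, noise_std) for _ in range(3))
    return tuple(xp + ui + wi for xp, ui, wi in zip(x_prev, u, w))


x0 = (0.0, 0.0, 0.0)
u1 = (1.0, 0.0, math.pi / 2)

print(motion_model(x0, u1))                  # noise-free prediction
print(motion_model(x0, u1, noise_std=0.05))  # one noisy realization
```

Calling the function twice with the same nonzero `noise_std` generally gives different results, which is what makes the model stochastic rather than deterministic.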
Observation Equation
A universal and abstract model, $h(\cdot)$, is also used for the observation equation in SLAM. So, what exactly is the observation equation? It describes how the observed data $\mathbf{z_{k,j}}$ is generated when the robot at position $\mathbf{x_k}$ sees the landmark point $\mathbf{y_{j}}$. Due to the presence of random noise, the observation equation is also a stochastic model. The final version of the observation equation can be expressed as:
$$\mathbf{z_{k,j}}=h(\mathbf{y_{j}}, \mathbf{x_{k}}, \mathbf{v_{k,j}})$$
where $\mathbf{v_{k,j}}$ represents the noise in this observation.
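As with the motion equation, a concrete instance of $h(\cdot)$ depends on the sensor. The sketch below assumes a 2D range-bearing sensor that reports the distance to a landmark and its angle relative to the robot's heading, with additive Gaussian noise on both. The function name `observation_model`, the `noise_std` parameter, and the decision to measure the bearing relative to the heading are illustrative assumptions, not the only possible formulation.

```python
import math
import random


def observation_model(y, x, noise_std=(0.0, 0.0), rng=random):
    """One assumed instance of z_{k,j} = h(y_j, x_k, v_{k,j}).

    y: landmark position (y1, y2)
    x: robot pose (x1, x2, theta)
    Returns (range, bearing), with the bearing relative to the heading.
    noise_std holds the standard deviations of the range and bearing noise.
    """
    dx, dy = y[0] - x[0], y[1] - x[1]
    r = math.hypot(dx, dy) + rng.gauss(0.0, noise_std[0])
    phi = math.atan2(dy, dx) - x[2] + rng.gauss(0.0, noise_std[1])
    # Wrap the bearing into [-pi, pi] so equivalent angles compare equal.
    phi = math.atan2(math.sin(phi), math.cos(phi))
    return r, phi


landmark = (3.0, 4.0)
pose = (0.0, 0.0, 0.0)

print(observation_model(landmark, pose))                          # noise-free
print(observation_model(landmark, pose, noise_std=(0.05, 0.01)))  # noisy
```

A camera would use a different $h(\cdot)$ (a projection onto the image plane), but the role of the three arguments stays the same.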
Summary
In conclusion, the entire SLAM process can be reduced to an estimation problem: recover $\mathbf{x}$ (localization) and $\mathbf{y}$ (mapping) from the motion inputs $\mathbf{u}$ and the observations $\mathbf{z}$ obtained from the sensor, where the motion is affected by the noise $\mathbf{w}$ and the observations by the noise $\mathbf{v}$.
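The split between what is given and what must be estimated can be illustrated with a toy simulation. The sketch below assumes a 1D world, additive Gaussian noise, and a sensor that measures the signed offset from the robot to each landmark; all numbers and variable names are made up for illustration. It only generates data and does not solve the estimation problem.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Ground truth, hidden from the estimator: 1D positions x_k and landmarks y_j.
true_x = [0.0]
landmarks = [2.0, 5.0]
commands = [1.0, 1.0, 1.0]  # u_k: commanded displacement at each step

inputs, observations = [], []
for k, u in enumerate(commands, start=1):
    # Motion equation (assumed additive): x_k = x_{k-1} + u_k + w_k
    true_x.append(true_x[-1] + u + rng.gauss(0.0, 0.1))
    inputs.append(u)
    for j, y in enumerate(landmarks, start=1):
        # Observation equation (assumed additive): z_{k,j} = y_j - x_k + v_{k,j}
        z = y - true_x[-1] + rng.gauss(0.0, 0.05)
        observations.append((k, j, z))

# Given to SLAM: inputs (u) and observations (z).
# To be estimated: true_x (localization) and landmarks (mapping).
print("u:", inputs)
for k, j, z in observations:
    print(f"z[{k},{j}] = {z:.3f}")
```

An estimator would see only the printed values and would have to work backwards to the hidden positions and landmarks.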
Reference