A Look at the Latest Visual SLAM Technology Framework Options in Robotics
What is visual simultaneous localization and mapping (SLAM)? In short, visual SLAM technologies use visual information to help robots navigate and understand their surroundings. Consider autonomous mobile robots (AMRs) and automated guided vehicles (AGVs), which have become increasingly popular in recent years. These robots rely heavily on SLAM technologies to avoid collisions and keep the plant floor safe.
Robot Navigation
Visual SLAM does not encompass any specific set of algorithms or software but instead is the process of determining the position and orientation of a sensor with respect to its surroundings while simultaneously mapping the environment around that sensor. A visual SLAM process begins with extracting landmarks or features from a point cloud of data generated by a camera (often a 3D camera) and then confirming feature locations by matching observations from different sensors or viewpoints. SLAM navigation systems then update the position of a vehicle such as an AMR using GPS, odometry, and/or inertial navigation system (INS) data before estimating the position of future landmarks based on the mobile platform’s current position.
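As a rough sketch of that last step, the snippet below projects already-mapped 3D landmarks through the robot’s current pose estimate to predict where they should appear in the next camera image. It assumes a pinhole camera model, and the pose, landmark, and intrinsic values are made up purely for illustration; a real SLAM system wraps this prediction in a full estimation and matching pipeline.

```python
import numpy as np

def predict_landmark_pixels(landmarks_world, pose_world_to_cam, fx, fy, cx, cy):
    """Predict pixel locations of known 3D landmarks given the current pose.

    landmarks_world: (N, 3) landmark positions in the world frame
    pose_world_to_cam: 4x4 homogeneous transform from world to camera frame
    fx, fy, cx, cy: pinhole intrinsics (illustrative values)
    """
    n = landmarks_world.shape[0]
    homogeneous = np.hstack([landmarks_world, np.ones((n, 1))])   # (N, 4)
    in_cam = (pose_world_to_cam @ homogeneous.T).T[:, :3]         # points in camera frame
    z = in_cam[:, 2]
    u = fx * in_cam[:, 0] / z + cx
    v = fy * in_cam[:, 1] / z + cy
    return np.stack([u, v], axis=1), z > 0   # predicted pixels + visibility mask

# Illustrative use: a landmark 2 m straight ahead should project near the
# image center for these made-up intrinsics.
pixels, visible = predict_landmark_pixels(
    np.array([[0.0, 0.0, 2.0]]), np.eye(4), fx=600, fy=600, cx=320, cy=240)
print(pixels, visible)   # ~[[320. 240.]] [ True]
```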
Visual SLAM technologies have overtaken 2D lidar systems as a primary means of navigation for next-generation robotics. Visual SLAM systems use different types of sensors and cameras, including wide-angle and spherical cameras, as well as 3D cameras based on time-of-flight, stereo vision, and depth technologies. Robot manufacturers and systems integrators developing visual SLAM systems have options when it comes to cameras, of course, as many A3 member companies offer cameras suitable for use in such systems. Still, images alone aren’t enough. SLAM systems require algorithms that can help with navigation and decision-making based on the images acquired by the cameras.
Visual SLAM Frameworks
A recent paper written by Samsung researchers for the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) benchmarked and compared three standout, general-purpose SLAM approaches for real-world robot applications: ORB-SLAM3, OpenVSLAM, and RTABMap.
Designed to work with a variety of sensors, ORB-SLAM3 builds on the initial ORB-SLAM solution for monocular, stereo, and RGB-D cameras. This technique uses the ORB (Oriented FAST and Rotated BRIEF) algorithm to provide short- and medium-term tracking. ORB-SLAM3 adds new features, including visual-inertial SLAM, which fuses visual and inertial sensor data and yields more robust tracking in situations with fewer point features. It also offers multi-map and fisheye camera support, which its predecessors did not.
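To make the feature side of this concrete, the snippet below uses OpenCV’s ORB implementation (the same Oriented FAST and Rotated BRIEF descriptor that gives the family its name) to detect keypoints in two frames and match them by Hamming distance. This is only the raw building block, and the image file names are placeholders; ORB-SLAM3 layers tracking, local mapping, and loop closing on top of it.

```python
import cv2

# Load two consecutive frames (placeholder file names) as grayscale images.
frame_a = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
frame_b = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB keypoints and compute binary descriptors in each frame.
orb = cv2.ORB_create(nfeatures=1000)
kp_a, des_a = orb.detectAndCompute(frame_a, None)
kp_b, des_b = orb.detectAndCompute(frame_b, None)

# ORB descriptors are binary, so they are compared with Hamming distance.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

# The matched keypoint pairs are what a SLAM back end would feed into
# pose estimation and landmark triangulation.
print(f"{len(matches)} tentative correspondences between the two frames")
```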
OpenVSLAM is a newer module-structured implementation of an ORB feature-based visual graph SLAM that contains optimized implementations of feature extractors and stereo matchers. This technique offers a unique frame tracking module that enables fast and accurate localization. Designed for scalability and integration into practical platforms and not just for research, OpenVSLAM can leverage many more types of cameras than a typical visual SLAM implementation, including equirectangular and fisheye models. Additionally, this technique contains all the major features of the ORB-SLAM family — sometimes more — aside from inertial measurement unit (IMU) fusion support.
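To illustrate why equirectangular support matters, the sketch below converts a pixel in a 360-degree equirectangular panorama into a unit bearing vector on the sphere, the form in which a visual SLAM front end typically reasons about observations. The coordinate conventions here are assumptions made for illustration and are not taken from the OpenVSLAM codebase.

```python
import numpy as np

def equirect_pixel_to_bearing(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction vector.

    Columns span longitude from -pi to pi and rows span latitude from
    +pi/2 (top) to -pi/2 (bottom); the axis convention is illustrative.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)   # right
    y = np.sin(lat)                 # up
    z = np.cos(lat) * np.cos(lon)   # forward
    return np.array([x, y, z])

# The center pixel of the panorama looks straight ahead (the +z axis here).
print(equirect_pixel_to_bearing(1920, 960, 3840, 1920))   # ~[0. 0. 1.]
```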
One of the older SLAM techniques, RTABMap was first released in 2013 and covers a wide variety of sensor types, including stereo, 3D depth, and fisheye cameras, as well as odometry and 2D/3D lidar data. This, according to the paper’s authors, makes RTABMap a flexible SLAM approach distinct from other methods. RTABMap creates dense 2D and 3D representations of an environment, analogous to a pure 2D lidar SLAM, which often allows it to replace existing SLAM methods without further post-processing.
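The dense 2D output RTABMap produces is essentially an occupancy grid, the same structure a pure 2D lidar SLAM yields. The sketch below, with made-up parameters, shows the core idea: take 3D points from a depth camera or lidar expressed in the map frame, keep only those in an obstacle-height band, and mark the grid cells they fall into as occupied.

```python
import numpy as np

def points_to_occupancy_grid(points_map, resolution=0.05, size_m=20.0,
                             z_min=0.1, z_max=1.8):
    """Project 3D points (N, 3) in the map frame into a 2D occupancy grid.

    resolution: cell size in meters; size_m: grid side length in meters;
    z_min/z_max: height band treated as obstacles (illustrative values).
    """
    cells = int(size_m / resolution)
    grid = np.zeros((cells, cells), dtype=np.uint8)   # 0 = free/unknown, 1 = occupied

    # Keep only points at obstacle height (drop floor and ceiling returns).
    mask = (points_map[:, 2] > z_min) & (points_map[:, 2] < z_max)
    xy = points_map[mask, :2]

    # Convert metric coordinates (grid centered on the robot) to cell indices.
    idx = np.floor((xy + size_m / 2.0) / resolution).astype(int)
    valid = np.all((idx >= 0) & (idx < cells), axis=1)
    grid[idx[valid, 1], idx[valid, 0]] = 1
    return grid

# A single point 1 m ahead of the robot at chest height marks one cell occupied.
demo = points_to_occupancy_grid(np.array([[1.0, 0.0, 1.0]]))
print(demo.sum())   # 1
```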
A Standout Choice
The researchers tested all three methods extensively and found that each has its benefits. For indoor environments with stereo sensors, OpenVSLAM and ORB-SLAM3 with inertial fusion performed best, while with 3D depth sensors, OpenVSLAM and RTABMap performed well, even in low-feature conditions. Outdoors, OpenVSLAM and ORB-SLAM3 performed similarly well, but OpenVSLAM was ultimately more reliable in the experiments.
Overall, the researchers declared OpenVSLAM the best general-purpose technique for the broadest range of robot types, environments, and sensors, having performed well in all three studies. They noted that ORB-SLAM3 with inertial fusion is also a suitable option, while RTABMap showed lower overall performance in the tests. The testing also showed that incorporating IMU fusion whenever possible is preferable, owing to the resulting improvements in overall accuracy and robustness.
As with other advanced automation systems and technologies, plenty of options exist when it comes to developing a solution. In the case of visual SLAM, this means that robot manufacturers, systems integrators, and end users have a choice on both the hardware and the software side of things.