Abstract
We describe a deep-geometric localizer that estimates the full six degree-of-freedom (6DOF) global pose of a camera from a single image in a previously mapped environment. Our map is topo-metric: it consists of discrete topological nodes with known 6DOF poses, and each topo-node also stores a set of points whose 2D features and 3D locations are recorded during the mapping process. For the mapping phase, we use a stereo camera and a standard stereo visual SLAM pipeline. During the localization phase, we take a single camera image, localize it to a topological node using deep learning, and apply a geometric algorithm, perspective-n-point (PnP), to the matched 2D features (and their 3D positions in the topo-metric map) to determine the full 6DOF globally consistent pose of the camera. Our method decouples the mapping and localization algorithms and sensors (stereo and monocular), allowing accurate 6DOF pose estimation in a previously mapped environment using a single camera. We present results in simulated and real environments; our hybrid algorithm is particularly useful for autonomous vehicles (AVs) and shuttles that repeatedly traverse the same route.
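The geometric step of the pipeline above, solving PnP on 2D-3D correspondences once a topo-node is matched, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a simple Direct Linear Transform (DLT) variant of PnP on noiseless synthetic correspondences, whereas a real pipeline would use a robust solver (e.g. RANSAC plus iterative refinement). All variable names and the synthetic camera intrinsics are assumptions for the example.

```python
import numpy as np

def pnp_dlt(pts3d, pts2d, K):
    """Estimate (R, t) mapping map-frame 3D points into the camera frame
    from 2D-3D correspondences, via a DLT solve (needs >= 6 points)."""
    n = len(pts3d)
    # Normalize pixel coordinates with the inverse intrinsics.
    xn = (np.linalg.inv(K) @ np.column_stack([pts2d, np.ones(n)]).T).T
    Xh = np.column_stack([pts3d, np.ones(n)])  # homogeneous 3D points
    rows = []
    for X, x in zip(Xh, xn):
        u, v = x[0], x[1]
        rows.append(np.concatenate([X, np.zeros(4), -u * X]))
        rows.append(np.concatenate([np.zeros(4), X, -v * X]))
    # The null-space of the stacked system gives the projection
    # matrix P = [R|t] up to scale.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)
    # Fix the global sign so landmarks have positive depth.
    if (P[2] @ Xh[0]) < 0:
        P = -P
    # Factor out scale and project the left 3x3 block onto SO(3).
    U, S, Vt3 = np.linalg.svd(P[:, :3])
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt3)])  # proper rotation
    R = U @ D @ Vt3
    t = P[:, 3] / S.mean()
    return R, t

# Synthetic check: project landmarks with a known pose, then recover it.
rng = np.random.default_rng(0)
pts3d = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))
R_gt = np.array([[0.0, -1.0, 0.0],   # ground-truth rotation (90 deg about z)
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
t_gt = np.array([0.5, -0.2, 1.0])    # ground-truth translation
K = np.array([[700.0, 0.0, 320.0],   # assumed pinhole intrinsics
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
cam = (R_gt @ pts3d.T).T + t_gt                  # points in camera frame
pts2d = (K @ (cam / cam[:, 2:3]).T).T[:, :2]     # projected pixel features
R_est, t_est = pnp_dlt(pts3d, pts2d, K)
```

With clean correspondences the recovered `R_est` and `t_est` match the ground-truth pose; in practice, feature mismatches make an outlier-robust PnP variant essential.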