Abstract

We describe a deep-geometric localizer that estimates the full six degrees-of-freedom (6DOF) global pose of a camera from a single image in a previously mapped environment. Our map is topo-metric: it consists of discrete topological nodes whose 6DOF poses are known. Each topo-node also comprises a set of points whose 2D features and 3D locations are stored during mapping. For the mapping phase, we use a stereo camera and a standard stereo visual SLAM pipeline. During the localization phase, we take a single camera image, localize it to a topological node using deep learning, and apply a geometric algorithm (perspective-n-point (PnP)) to the matched 2D features (and their 3D positions in the topo map) to determine the full 6DOF globally consistent pose of the camera. Our method decouples the mapping and localization algorithms and sensors (stereo for mapping, mono for localization) and allows accurate 6DOF pose estimation in a previously mapped environment using a single camera. We present results in simulated and real environments. Our hybrid algorithm is particularly useful for autonomous vehicles (AVs) and shuttles that might repeatedly traverse the same route.
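
The geometric stage of the localization phase can be illustrated with a minimal sketch. It assumes that the topological stage has already selected a node, that feature matching has produced 2D-3D correspondences, and that the stored 3D points are expressed in the global map frame. The abstract does not name a specific PnP solver, so the sketch substitutes a plain Direct Linear Transform (DLT), which needs at least six non-coplanar points and no outlier rejection. The function names (`pnp_dlt`, `project`, `camera_center`) and the intrinsic matrix `K` are illustrative and not taken from the paper.

```python
import numpy as np


def project(points_3d, R, t, K):
    """Project map-frame 3D points into pixel coordinates (pinhole model)."""
    cam = np.asarray(points_3d, dtype=float) @ R.T + t  # map frame -> camera frame
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


def pnp_dlt(points_3d, pixels, K):
    """Estimate the map-to-camera rotation R and translation t with a DLT.

    This is a stand-in for a PnP solver: it assumes noise-free or
    low-noise matches and performs no outlier rejection.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    pixels = np.asarray(pixels, dtype=float)
    n = len(points_3d)
    if n < 6:
        raise ValueError("DLT needs at least 6 correspondences")

    # Remove the intrinsics so that the unknown is [R | t] up to scale.
    ones = np.ones((n, 1))
    rays = np.hstack([pixels, ones]) @ np.linalg.inv(K).T
    x = rays[:, 0] / rays[:, 2]
    y = rays[:, 1] / rays[:, 2]

    # Two linear constraints per match on the 12 entries of P = [R | t].
    Xh = np.hstack([points_3d, ones])
    zeros = np.zeros_like(Xh)
    A = np.vstack([
        np.hstack([Xh, zeros, -x[:, None] * Xh]),
        np.hstack([zeros, Xh, -y[:, None] * Xh]),
    ])

    # The null vector of A (smallest singular value) gives P up to scale and sign.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:
        P = -P  # choose the sign that yields a proper rotation

    # Project the 3x3 block onto the nearest rotation and recover the scale.
    U, S, Vt_m = np.linalg.svd(P[:, :3])
    R = U @ Vt_m
    t = P[:, 3] / S.mean()
    return R, t


def camera_center(R, t):
    """Camera position in the map frame, given the map-to-camera pose."""
    return -R.T @ t
```

Given matched map points `pts` and their pixel locations `pix`, calling `R, t = pnp_dlt(pts, pix, K)` followed by `camera_center(R, t)` yields the camera position in the map frame, and `R.T` gives its orientation. With real, noisy matches a robust PnP solver with outlier rejection would be the more realistic choice.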
