Tutorial On Visual Odometry


Tutorial on Visual Odometry
Davide Scaramuzza
Robotics and Perception Group, University of Zurich
http://rpg.ifi.uzh.ch

Outline
- Theory
- Open Source Algorithms

VO is the process of incrementally estimating the pose of the vehicle by examining the changes that motion induces on the images of its onboard cameras.
Input: image sequence (or video stream) from one or more cameras attached to a moving vehicle
Output: camera trajectory (3D structure is a plus)

VO assumes:
- Sufficient illumination in the environment
- Dominance of static scene over moving objects
- Enough texture to allow apparent motion to be extracted
- Sufficient scene overlap between consecutive frames
Is any of these scenes good for VO? Why?

- 1980: First known real-time VO implementation on a robot, by Hans Moravec's PhD thesis (NASA/JPL) for Mars rovers, using one sliding camera (sliding stereo).
- 1980 to 2000: VO research was dominated by NASA/JPL in preparation of the 2004 Mars mission (see papers from Matthies, Olson, etc. from JPL).
- 2004: VO was used on a robot on another planet: the Mars rovers Spirit and Opportunity.
- 2004: VO was revived in the academic environment by Nister's «Visual Odometry» paper. The term VO became popular.

Scaramuzza, D., Fraundorfer, F., "Visual Odometry: Part I - The First 30 Years and Fundamentals," IEEE Robotics and Automation Magazine, Volume 18, Issue 4, 2011.
Fraundorfer, F., Scaramuzza, D., "Visual Odometry: Part II - Matching, Robustness, and Applications," IEEE Robotics and Automation Magazine, Volume 19, Issue 1, 2012.

SFM vs. VSLAM vs. VO

SFM is more general than VO and tackles the problem of 3D reconstruction and 6DOF pose estimation from unordered image sets.
Example: reconstruction from 3 million images from Flickr.com, using a cluster of 250 computers and 24 hours of computation!
Paper: "Building Rome in a Day", ICCV'09

VO is a particular case of SFM. VO focuses on estimating the 3D motion of the camera sequentially (as a new frame arrives) and in real time.
Terminology: sometimes SFM is used as a synonym of VO.

- VO only aims at the local consistency of the trajectory.
- SLAM aims at the global consistency of the trajectory and of the map.
- VO can be used as a building block of SLAM.
- VO is SLAM before closing the loop!
- The choice between VO and V-SLAM depends on the tradeoff between performance, consistency, and simplicity of implementation.
- VO trades off consistency for real-time performance, without the need to keep track of all the previous history of the camera.
(Visual odometry vs. Visual SLAM; image courtesy of [Clemente, RSS'07])

1. Compute the relative motion T_k from image I_{k-1} to image I_k:
   T_k = [ R_{k,k-1}  t_{k,k-1} ; 0  1 ]
2. Concatenate them to recover the full trajectory C_0, C_1, ..., C_n:
   C_n = C_{n-1} T_n
3. An optimization over the last m poses can be done to refine the trajectory locally (Pose-Graph or Bundle Adjustment).
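As a minimal illustration of step 2 (a sketch, not the tutorial's code), the relative motions can be stored as 4x4 homogeneous matrices and chained by matrix products; the function names and the identity initial pose are assumptions:

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def concatenate_poses(relative_motions, C0=np.eye(4)):
    """C_n = C_{n-1} @ T_n, starting from an initial pose C0 (world frame by default)."""
    poses = [C0]
    for T in relative_motions:
        poses.append(poses[-1] @ T)
    return poses
```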

The front-end is responsible for:
- Feature extraction, matching, and outlier removal
- Loop closure detection
The back-end is responsible for the pose and structure optimization (e.g., iSAM, g2o, Google Ceres).

How do we estimate the relative motion T_k between image I_{k-1} and image I_k?
"An Invitation to 3D Vision", Ma, Soatto, Kosecka, Sastry, Springer, 2003

Direct Image Alignment
Minimizes the per-pixel intensity difference:
   T_{k,k-1} = arg min_T Σ_i || I_k(u'_i) - I_{k-1}(u_i) ||²_σ
- Dense: DTAM [Newcombe et al. '11], ~300,000 pixels
- Semi-Dense: LSD-SLAM [Engel et al. 2014], ~10,000 pixels
- Sparse: SVO [Forster et al. 2014], 100-200 features x 4x4 patch, ~2,000 pixels
Irani & Anandan, "All About Direct Methods," Vision Algorithms: Theory and Practice, Springer, 2000

Feature-based methods:
1. Extract & match features (+ RANSAC)
2. Minimize the reprojection error:
   T_{k,k-1} = arg min_T Σ_i || u'_i - π(p_i) ||²_Σ

Direct methods:
1. Minimize the photometric error:
   T_{k,k-1} = arg min_T Σ_i || I_k(u'_i) - I_{k-1}(u_i) ||²_σ
   where u'_i = π( T π⁻¹(u_i, d_i) ), with d_i the depth of pixel u_i

[Jin, Favaro, Soatto '03], [Silveira, Malis, Rives, TRO'08], [Newcombe et al., ICCV'11], [Engel et al., ECCV'14], [Forster et al., ICRA'14]

Feature-based methods:
- Large frame-to-frame motions
- Accuracy: efficient optimization of structure and motion (Bundle Adjustment)
- Slow due to costly feature extraction and matching
- Matching outliers (RANSAC)

Direct methods:
- All information in the image can be exploited (precision, robustness)
- Increasing camera frame-rate reduces the computational cost per frame
- Limited frame-to-frame motion
- Joint optimization of dense structure and motion too expensive
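As a minimal sketch (not from the tutorial) of what the two objectives look like in code, the following evaluates a reprojection residual and a photometric residual for a pinhole camera with intrinsics K and relative pose (R, t); all function names and the nearest-neighbour intensity lookup are simplifying assumptions:

```python
import numpy as np

def project(K, R, t, pts_3d):
    """Project 3D points (N,3), given in the previous frame, into the current frame."""
    pts_cam = pts_3d @ R.T + t            # transform into the current camera frame
    uv = pts_cam @ K.T                    # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]         # perspective division -> pixel coordinates

def reprojection_residuals(K, R, t, pts_3d, observed_uv):
    """Feature-based: difference between observed and predicted pixel locations."""
    return observed_uv - project(K, R, t, pts_3d)

def photometric_residuals(K, R, t, img_prev, img_cur, uv_prev, depths):
    """Direct: intensity difference between a pixel and its warped location."""
    # back-project pixels of the previous frame using their (estimated) depths
    ones = np.ones((uv_prev.shape[0], 1))
    rays = np.hstack([uv_prev, ones]) @ np.linalg.inv(K).T
    pts_3d = rays * depths[:, None]
    uv_cur = project(K, R, t, pts_3d)
    # nearest-neighbour lookup for brevity (real systems use bilinear interpolation)
    i_prev = img_prev[uv_prev[:, 1].astype(int), uv_prev[:, 0].astype(int)]
    i_cur = img_cur[np.clip(uv_cur[:, 1], 0, img_cur.shape[0] - 1).astype(int),
                    np.clip(uv_cur[:, 0], 0, img_cur.shape[1] - 1).astype(int)]
    return i_cur - i_prev
```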

VO computes the camera path incrementally (pose after pose):
Image sequence → Front-end: feature detection, feature matching (tracking), motion estimation (2D-2D, 3D-2D, 3D-3D) → Back-end: local optimization

Feature detection and feature matching (tracking) produce feature tracks across consecutive frames.

A corner is defined as the intersection of two or more edges:
- A corner has high localization accuracy
- Corner detectors are good for VO
- It is less distinctive than a blob
- E.g., Harris, Shi-Tomasi, SUSAN, FAST (Harris corners)

A blob is any other image pattern, which is not a corner, that significantly differs from its neighbors in intensity and texture:
- It has less localization accuracy than a corner
- Blob detectors are better for place recognition
- It is more distinctive than a corner
- E.g., MSER, LoG, DoG (SIFT), SURF, CenSurE (SIFT features)
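A small illustrative sketch (assuming OpenCV is installed; the image path is a placeholder) contrasting a corner detector (FAST) suited for VO with a blob detector (SIFT) suited for place recognition:

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder path

fast = cv2.FastFeatureDetector_create(threshold=20)
corners = fast.detect(img, None)                       # fast, well-localized corners

sift = cv2.SIFT_create(nfeatures=1000)
blobs, descriptors = sift.detectAndCompute(img, None)  # distinctive blobs + descriptors

print(len(corners), "FAST corners,", len(blobs), "SIFT blobs")
```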

Motion estimation computes the transformation T_{k,k-1} between the camera poses C_{k-1} and C_k from the matched features.

Motion estimation: 2D-2D
Motion from image feature correspondences:
- Both feature points f_{k-1} and f_k are specified in 2D
- The minimal-case solution involves 5 point correspondences
- The solution is found by minimizing the reprojection error
- Popular algorithms: 8-point and 5-point algorithms [Hartley'97, Nister'06]
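A minimal 2D-2D sketch (illustrative, not the tutorial's code) using OpenCV's 5-point algorithm with RANSAC; pts_prev and pts_cur are assumed Nx2 arrays of matched pixel coordinates and K the 3x3 intrinsic matrix:

```python
import cv2
import numpy as np

def relative_pose_2d2d(pts_prev, pts_cur, K):
    E, inliers = cv2.findEssentialMat(pts_prev, pts_cur, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Disambiguate the decompositions of E by cheirality (points in front of both cameras)
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_cur, K, mask=inliers)
    return R, t, inliers   # t is known only up to scale in the monocular case
```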

Motion estimation: 3D-2D
Motion from 3D structure and image correspondences:
- f_{k-1} is specified in 3D and f_k in 2D
- This problem is known as camera resection or PnP (Perspective-from-n-Points)
- The minimal-case solution involves 3 correspondences (+1 for disambiguating the 4 solutions)
- The solution is found by minimizing the reprojection error
- Popular algorithms: P3P [Gao'03, Kneip'11]
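A minimal 3D-2D sketch (illustrative): P3P/PnP inside a RANSAC loop via OpenCV. pts_3d is Nx3 (triangulated landmarks), pts_2d is Nx2 (their observations in the current image), K the intrinsic matrix; zero lens distortion is assumed:

```python
import cv2
import numpy as np

def relative_pose_3d2d(pts_3d, pts_2d, K):
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64), K, None,
        flags=cv2.SOLVEPNP_P3P, reprojectionError=2.0)
    R, _ = cv2.Rodrigues(rvec)      # convert axis-angle to rotation matrix
    return ok, R, tvec, inliers
```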

Motion estimation: 3D-3D
Motion from 3D-3D point correspondences (point cloud registration):
- Both f_{k-1} and f_k are specified in 3D. To do this, it is necessary to triangulate 3D points (e.g., use a stereo camera)
- The minimal-case solution involves 3 non-collinear correspondences
- The solution is found by minimizing the 3D-3D Euclidean distance
- Popular algorithms: [Arun'87] for global registration, ICP for local refinement or Bundle Adjustment (BA)
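A minimal sketch (illustrative) of Arun's SVD-based alignment of two 3D point sets, assuming pts_prev and pts_cur are Nx3 arrays of already-matched 3D points:

```python
import numpy as np

def align_3d3d(pts_prev, pts_cur):
    """Find R, t minimizing sum ||pts_cur_i - (R @ pts_prev_i + t)||^2."""
    c_prev, c_cur = pts_prev.mean(axis=0), pts_cur.mean(axis=0)
    H = (pts_prev - c_prev).T @ (pts_cur - c_cur)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                            # avoid reflections
        Vt[2, :] *= -1
        R = Vt.T @ U.T
    t = c_cur - R @ c_prev
    return R, t
```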


Keyframe 1 (initial point cloud) → Keyframe 2 → current frame → new keyframe (new triangulated points): typical visual odometry pipeline used in many algorithms [Nister'04, PTAM'07, LIBVISO'08, LSD-SLAM'14, SVO'14, ORB-SLAM'15]

When frames are taken at nearby positions compared to the scene distance, 3D points will exhibit large uncertainty.

One way to avoid this consists of skipping frames until the average uncertainty of the 3D points decreases below a certain threshold. The selected frames are called keyframes.
Rule of thumb: add a keyframe when the keyframe distance exceeds a threshold (~10-20 %) of the average scene depth.
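A minimal sketch (illustrative) of the keyframe rule of thumb above; the 10-20 % ratio threshold comes from the slide, everything else (names, 15 % default) is an assumption:

```python
import numpy as np

def need_new_keyframe(cur_position, last_kf_position, avg_scene_depth, ratio=0.15):
    """Add a keyframe when the baseline from the last keyframe is a sufficient
    fraction of the average depth of the tracked 3D points."""
    baseline = np.linalg.norm(cur_position - last_kf_position)
    return baseline / avg_scene_depth > ratio
```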

Matched points are usually contaminated by outliers. Causes of outliers are:
- image noise
- occlusions
- blur
- changes in view point and illumination
For the camera motion to be estimated accurately, outliers must be removed. This is the task of robust estimation.

Example (before vs. after removing the outliers):
- Error at the loop closure: 6.5 m
- Error in orientation: 5 deg
- Trajectory length: 400 m
Outliers can be removed using RANSAC [Fischler & Bolles, 1981].
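A generic RANSAC loop (illustrative sketch, not the tutorial's implementation); data is assumed to be a NumPy array of correspondences, fit_fn estimates a model from a minimal sample, and error_fn returns per-point residuals:

```python
import numpy as np

def ransac(data, fit_fn, error_fn, sample_size, threshold, iterations=1000, rng=None):
    rng = rng or np.random.default_rng()
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(iterations):
        sample = data[rng.choice(len(data), sample_size, replace=False)]
        model = fit_fn(sample)
        if model is None:
            continue
        inliers = error_fn(model, data) < threshold
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```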

Local optimization (back-end) refines the estimated camera poses C_0, C_1, ..., C_n.

Pose-Graph Optimization
- So far we assumed that the transformations are between consecutive frames.
- Transformations can be computed also between non-adjacent frames T_ij (e.g., when features from previous keyframes are still observed). They can be used as additional constraints to improve the camera poses by minimizing the following:
  Σ_{ij} || C_i - T_ij C_j ||²
- For efficiency, only the last m keyframes are used.
- Gauss-Newton or Levenberg-Marquardt are typically used to minimize it. For large graphs, efficient open-source tools exist: g2o, GTSAM, Google Ceres.
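A minimal sketch (illustrative) of evaluating pose-graph residuals with poses and constraints stored as 4x4 homogeneous matrices; a real system would minimize this cost with Gauss-Newton or Levenberg-Marquardt (e.g., via g2o, GTSAM, or Ceres), and all names here are assumptions:

```python
import numpy as np

def constraint_error(C_i, C_j, T_ij):
    """Residual of one relative-pose constraint: how far C_i is from T_ij @ C_j."""
    delta = np.linalg.inv(T_ij @ C_j) @ C_i        # identity if the constraint is satisfied
    rot_err = np.arccos(np.clip((np.trace(delta[:3, :3]) - 1) / 2, -1.0, 1.0))
    trans_err = np.linalg.norm(delta[:3, 3])
    return rot_err, trans_err

def total_graph_error(poses, constraints):
    """poses: dict node_id -> 4x4 pose; constraints: list of (i, j, T_ij)."""
    return sum(r**2 + t**2 for i, j, T_ij in constraints
               for r, t in [constraint_error(poses[i], poses[j], T_ij)])
```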

Bundle Adjustment (BA)
- Similar to pose-graph optimization, but it also optimizes the 3D points.
- In order not to get stuck in local minima, the initialization should be close to the minimum.
- Gauss-Newton or Levenberg-Marquardt can be used. For large graphs, efficient open-source software exists: GTSAM, g2o, Google Ceres.

BA is more precise than pose-graph optimization because it adds additional constraints (landmark constraints).
But it is more costly: O((qM + lN)^3), with M and N being the number of points and camera poses, and q and l the number of parameters for points and camera poses. Workarounds:
- A small window size limits the number of parameters for the optimization and thus makes real-time bundle adjustment possible.
- It is possible to reduce the computational complexity by optimizing only over the camera parameters and keeping the 3D landmarks fixed (motion-only BA).
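A minimal sketch (illustrative) of motion-only bundle adjustment: refine a single camera pose against fixed 3D landmarks by minimizing the reprojection error with SciPy's Levenberg-Marquardt solver. The 6-vector pose parametrization (axis-angle + translation) and all names are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(pose6, pts_3d, obs_2d, K):
    R = Rotation.from_rotvec(pose6[:3]).as_matrix()
    t = pose6[3:]
    proj = (pts_3d @ R.T + t) @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return (uv - obs_2d).ravel()           # stacked reprojection errors

def motion_only_ba(pose6_init, pts_3d, obs_2d, K):
    result = least_squares(residuals, pose6_init, args=(pts_3d, obs_2d, K), method="lm")
    return result.x                         # refined camera pose, landmarks kept fixed
```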

- Loop constraints are very valuable constraints for pose-graph optimization.
- Loop constraints can be found by evaluating the visual similarity between the current camera images and past camera images.
- Visual similarity can be computed using global image descriptors (e.g., GIST) or local image descriptors (e.g., SIFT, BRIEF, BRISK features).
- Image retrieval is the problem of finding the most similar image to a query image in a database of billions of images. This can be solved efficiently with Bag of Words [Sivic'03, Nister'06, FABMAP, Galvez-Lopez'12 (DBoW2)].
(First observation vs. second observation after a loop)
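A minimal bag-of-words similarity sketch (illustrative): images are represented as L1-normalized histograms of visual-word occurrences and compared with a simple score; practical systems (e.g., DBoW2, FABMAP) add tf-idf weighting and inverted indices. All names and the threshold are assumptions:

```python
import numpy as np

def bow_vector(descriptor_word_ids, vocabulary_size):
    hist = np.bincount(descriptor_word_ids, minlength=vocabulary_size).astype(float)
    return hist / max(hist.sum(), 1.0)

def bow_similarity(v1, v2):
    # Score in [0, 1]; 1 means identical word distributions
    return 1.0 - 0.5 * np.abs(v1 - v2).sum()

def detect_loop_candidates(query_vec, database_vecs, min_score=0.3):
    scores = np.array([bow_similarity(query_vec, v) for v in database_vecs])
    ranked = np.argsort(scores)[::-1]                 # past frames, most similar first
    return [i for i in ranked if scores[i] > min_score], scores
```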

MAVMAP: https://github.com/mavmap/mavmap
Pix4D: https://pix4d.com/

VO (i.e., no loop closing):
- Modified PTAM (feature-based, mono): http://wiki.ros.org/ethzasl_ptam
- LIBVISO2 (feature-based, mono and stereo): http://www.cvlibs.net/software/libviso
- SVO (semi-direct, mono, stereo, multi-cameras): https://github.com/uzh-rpg/rpg_svo
VIO:
- ROVIO (tightly coupled EKF): https://github.com/ethz-asl/rovio
- OKVIS (non-linear optimization): https://github.com/ethz-asl/okvis
VSLAM:
- ORB-SLAM (feature-based, mono and stereo): https://github.com/raulmur/ORB_SLAM
- LSD-SLAM (semi-dense, direct, mono): https://github.com/tum-vision/lsd_slam

VO (i.e., no loop closing):
- Modified PTAM (Weiss et al.) (feature-based, mono): http://wiki.ros.org/ethzasl_ptam
- SVO (Forster et al.) (semi-direct, mono, stereo, multi-cameras): https://github.com/uzh-rpg/rpg_svo
IMU-Vision fusion:
- Multi-Sensor Fusion Package (MSF) (Weiss et al.) - EKF, loosely coupled: http://wiki.ros.org/ethzasl_sensor_fusion
- SVO + GTSAM (Forster et al., RSS'15) (optimization-based, pre-integrated IMU): https://bitbucket.org/gtborg/gtsam - instructions here: http://arxiv.org/pdf/1512.02363

GTSAM: n node%2F299
g2o: https://openslam.org/g2o.html
Google Ceres Solver: http://ceres-solver.org/

DBoW2: https://github.com/dorian3d/DBoW2
FABMAP: http://mrg.robots.ox.ac.uk/fabmap/

These datasets include ground-truthed 6-DOF poses from Vicon and synchronized IMU and images:
- EUROC MAV Dataset (forward-facing): php?id=kmavvisualinertialdatasets
- RPG-UZH dataset (downward-facing): n.bag


SVO: Fast, Semi-Direct Visual Odometry [Forster, Pizzoli, Scaramuzza, ICRA'14]

SVO Workflow
- Direct: frame-to-frame motion estimation
- Feature-based: frame-to-keyframe pose refinement
- Mapping: probabilistic depth estimation of 3D points (edgelet and corner features)
[Forster, Pizzoli, Scaramuzza, «SVO: Semi-Direct Visual Odometry», ICRA'14]

Probabilistic Depth Estimation
- A depth filter for every feature
- Recursive Bayesian depth estimation
- Measurement model: mixture of a Gaussian and a uniform distribution over the depth d
[Forster, Pizzoli, Scaramuzza, SVO: Semi-Direct Visual Odometry, IEEE ICRA'14]
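A simplified sketch (illustrative, not SVO's actual implementation) of recursive Bayesian depth estimation with a Gaussian + uniform measurement mixture; SVO itself uses the parametric Gaussian x Beta approximation of Vogiatzis & Hernandez, whereas this version keeps a histogram posterior. Class and parameter names are assumptions:

```python
import numpy as np

class HistogramDepthFilter:
    def __init__(self, d_min, d_max, bins=200, inlier_ratio=0.7, meas_sigma=0.05):
        self.depths = np.linspace(d_min, d_max, bins)
        self.posterior = np.full(bins, 1.0 / bins)      # uniform prior over depth
        self.inlier_ratio = inlier_ratio
        self.meas_sigma = meas_sigma
        self.d_range = d_max - d_min

    def update(self, depth_measurement):
        # Likelihood: Gaussian around the measurement (inlier) + uniform (outlier)
        gauss = np.exp(-0.5 * ((self.depths - depth_measurement) / self.meas_sigma) ** 2)
        gauss /= (self.meas_sigma * np.sqrt(2 * np.pi))
        likelihood = self.inlier_ratio * gauss + (1 - self.inlier_ratio) / self.d_range
        self.posterior *= likelihood                     # Bayes rule (unnormalized)
        self.posterior /= self.posterior.sum()

    def estimate(self):
        mean = np.sum(self.depths * self.posterior)
        var = np.sum((self.depths - mean) ** 2 * self.posterior)
        return mean, var      # when the variance is small enough, insert the 3D point
```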

Processing Times of SVO
- Laptop (Intel i7, 2.8 GHz): 400 frames per second
- Embedded ARM Cortex-A9, 1.7 GHz: up to 70 frames per second
Source code:
- Open source, available at: github.com/uzh-rpg/rpg_svo
- Works with and without ROS
- Closed-source professional edition (SVO 2.0): available for companies

Accuracy and Timing (Intel i7, 2.80 GHz)

Method     | Euroc 1 RMS Error | Euroc 2 RMS Error | Timing   | CPU @ 20 fps
SVO        | 0.26 m            | 0.65 m            | 2.53 ms  | 55 %
SVO + BA   | 0.06 m            | 0.07 m            | 5.25 ms  | 72 %
ORB-SLAM   | 0.11 m            | 0.19 m            | 29.81 ms | 187 %
LSD-SLAM   | 0.13 m            | 0.43 m            | 23.23 ms | 236 %

Integration on a Quadrotor Platform

Quadrotor System
- Odroid U3 computer: quad-core ARM Cortex-A9 (used in Samsung Galaxy S4 phones), runs Linux Ubuntu and ROS
- PX4 (IMU)
- Global-shutter camera: 752x480 pixels, high dynamic range, 90 fps
- Total weight: 450 grams!

Control Structure (visualization on screen)

Indoors and outdoors experiments
- RMS error: 5 mm, height: 1.5 m, down-looking camera: https://www.youtube.com/watch?v=4X6Voft4Z_0
- Speed: 4 m/s, height: 1.5 m, down-looking camera: https://www.youtube.com/watch?v=3mNY9-DSUDk
Faessler, Fontana, Forster, Mueggler, Pizzoli, Scaramuzza, "Autonomous, Vision-based Flight and Live Dense 3D Mapping with a Quadrotor Micro Aerial Vehicle," Journal of Field Robotics, 2015.

Robustness to Dynamic Objects and Occlusions
- Depth uncertainty is crucial for safety and robustness
- Outliers are caused by wrong data association (e.g., moving objects, distortions)
- Probabilistic depth estimation models outliers
https://www.youtube.com/watch?v=LssgKdDz5z0
Faessler, Fontana, Forster, Mueggler, Pizzoli, Scaramuzza, "Autonomous, Vision-based Flight and Live Dense 3D Mapping with a Quadrotor Micro Aerial Vehicle," Journal of Field Robotics, 2015.

Robustness: Adaptiveness and Reconfigurability [ICRA'15]
Automatic recovery from aggressive flight; fully onboard, single camera, no GPS
https://www.youtube.com/watch?v=pGU1s6Y55JI
Faessler, Fontana, Forster, Scaramuzza, "Automatic Re-Initialization and Failure Recovery for Aggressive Flight with a Monocular Vision-Based Quadrotor," ICRA'15. Demonstrated at ICRA'15 and featured on BBC News.

Autonomous Flight, Minimum-Snap, Speed: 4 m/s
