Online Learning For Human Classification In 3D LiDAR


2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
September 24–28, 2017, Vancouver, BC, Canada

Online Learning for Human Classification in 3D LiDAR-based Tracking

Zhi Yan, Tom Duckett, Nicola Bellotto
{zyan, tduckett, nbellotto}@lincoln.ac.uk
Lincoln Centre for Autonomous Systems, University of Lincoln, UK
http://lcas.lincoln.ac.uk

Abstract— Human detection and tracking are essential aspects to be considered in service robotics, as the robot often shares its workspace and interacts closely with humans. This paper presents an online learning framework for human classification in 3D LiDAR scans, taking advantage of robust multi-target tracking to avoid the need for data annotation by a human expert. The system learns iteratively by retraining a classifier online with the samples collected by the robot over time. A novel aspect of our approach is that errors in training data can be corrected using the information provided by the 3D LiDAR-based tracking. In order to do this, an efficient 3D cluster detector of potential human targets has been implemented. We evaluate the framework using a new 3D LiDAR dataset of people moving in a large indoor public space, which is made available to the research community. The experiments analyse the real-time performance of the cluster detector and show that our online learned human classifier matches and in some cases outperforms its offline version.

Fig. 1. A screenshot of the 3D LiDAR-based tracking system in action with an online learned human classifier. The detected people are enclosed in green bounding boxes. The colored lines are the people trajectories generated by the tracker.

978-1-5386-2682-5/17/$31.00 ©2017 IEEE
1 https://github.com/lcas/bayestracking
2 ftware/l-cas-3d-point-cloud-people-dataset/

I. INTRODUCTION

In service robotics, detecting and tracking moving objects is key to implementing useful and safe robot behaviors. Identifying which of the detected objects are humans is particularly important for domestic and public environments. Typically the robot is required to collect environmental data of the surrounding area using its on-board sensors, analysing where humans are and where they are going. Humans should be detected and tracked accurately and as early as possible in order to have enough time to react accordingly. Unfortunately, many service robots trade accuracy and robustness of the tracking for the actual coverage area of the detection (i.e. maximum range and field of view of the sensors), which is often limited to a few meters and small angular intervals.

3D LiDAR sensors have recently been applied to many applications in robotics and autonomous vehicles, either alone [1], [2], [3], [4], [5], [6], [7] or in combination with other sensors [8], [9], including human tracking. An important specification of this type of sensor is the ability to provide long-range and wide-angle laser scans. In addition, 3D LiDARs are usually very accurate and not affected by lighting conditions. However, humans are difficult to identify in 3D LiDAR scans because there are no low-level features such as texture and colour, and because of the lack of detail when the person is far away from the robot. Detecting features in 3D scans can also be computationally very expensive, as the covered area grows with the range of the sensor, as does the number of human candidates. Moreover, previous methods mostly apply an offline learned classifier for human detection, which usually requires a large amount of manually-annotated training data. Unfortunately, labelling this data is tedious work that is prone to human error. Such an approach is also infeasible when dealing with very complex real-world scenarios and when the same system needs to be (re-)trained for different environments.

In this paper, we develop an online learning framework to classify humans from 3D LiDAR detections, taking advantage of and extending our previously developed multi-target tracking system1 [10] to work with 3D LiDAR scans. We rely on the judgement of a false-positive and a false-negative estimator, similarly to the Positive and Negative "experts" proposed in previous tracking-learning-detection techniques [11], but in this case to train online a classifier that only looks for humans among the detections (i.e. the classification performance does not influence the tracking).

The contributions of this paper are three-fold. First, we present a computationally efficient clustering algorithm for 3D LiDAR scans suitable for real-time model-free detection and tracking. Then, we propose a framework for online learning of a human classifier, which estimates the classifier's errors and updates it to continually improve its performance. Finally, we provide a large dataset2 of partially-labeled 3D LiDAR point clouds to be used by the research community for training and comparison of human classifiers. This dataset captures new research challenges for indoor service robots

including human groups, children, people with trolleys, etc.

The remainder of this paper is organized as follows. Section II gives an overview of the related literature, in particular on 3D LiDAR-based human detection and tracking. Then, we introduce our framework in Section III and the link between tracking and online learning. The tracking is presented in Section IV, including a detailed description of the 3D cluster detection. The actual online learning is explained in Section V, which clarifies the role of the P-N experts in the classification improved by tracking. Section VI presents the experimental setup and results, as well as our new 3D LiDAR dataset. Finally, conclusions and future research are discussed in Section VII.

II. RELATED WORK

Human detection and tracking have been widely studied in recent years. Many popular approaches are based on RGB-D cameras [12], [13], although these have limited range and field of view. 3D LiDARs can be an alternative, but one of the main challenges of working with these sensors is the difficulty of recognizing humans using only the relatively little information they provide. A possible approach to detect humans is by clustering point clouds in depth images or 3D laser scans. For example, Rusu [14] presented a straightforward but computationally expensive method based on Euclidean distance. Bogoslavskyi and Stachniss [15] proposed a faster approach, although the computational efficiency limits the clustering precision. In our method, instead, both runtime and precision are opportunely balanced.

A very common approach is to use an offline trained classifier for human detection. For example, Navarro-Serment et al. [1] introduced seven features for human classification and trained an SVM classifier based on these features. Kidono et al. [3] proposed two additional features considering the 3D human shape and the clothing material (i.e. using the reflected laser beam intensities), showing significant classification improvements. Li et al. [7] instead implemented a resampling algorithm in order to improve the quality of the geometric features proposed by the former authors. Spinello et al. [4] combined a top-down classifier based on volumetric features and a bottom-up detector to reduce false positives for distant persons tracked in 3D LiDAR scans. Wang and Posner [8] applied a sliding window approach to 3D point data for object detection, including humans. They divided the space enclosed by a 3D bounding box into sparse feature grids, then trained a linear SVM classifier based on six features related to the occupancy of the cells, the distribution of points within them, and the reflectance of these points. The problem with offline methods, though, is that the classifier needs to be manually retrained every time for new environments.

The above solutions rely on pre-trained classifiers to detect humans from the most recent LiDAR scan. Only a few methods have been proposed that use tracking to boost human detection. Shackleton et al. [2], for example, employed an Extended Kalman Filter to estimate the position of a target and assist human detection in the next LiDAR scan. Teichman et al. [5] presented a semi-supervised learning method for track classification. Their method requires a large set of labeled background objects (i.e. no pedestrians) to train classifiers offline, and showed good performance for track classification but not for object recognition. Our solution, instead, simultaneously learns humans and background, and iteratively corrects classification errors online.

Besides datasets collected with RGB-D cameras [12], [13], [16], there are a few 3D LiDAR datasets available to the scientific community for outdoor scenarios [4], [6], [9], [17], but none with annotated data for human tracking in large indoor environments, like the one presented here.

Some authors proposed annotation-free methods. Deuge et al. [6] introduced an unsupervised feature learning approach for outdoor object classification by projecting 3D LiDAR scans into 2D depth images. Dewan et al. [18] proposed a model-free approach for detecting and tracking dynamic objects, which relies only on motion cues. These methods, however, are either not very accurate or unsuitable for slow and static pedestrians.

It is clear that there remains a large gap between the state of the art and what would be required for an annotation-free, high-reliability human classification implementation that works with 3D LiDAR scans. Our work helps to close this gap by demonstrating that human classification performance can be improved by combining tracking and online learning with a mobile robot in highly dynamic environments.

III. GENERAL FRAMEWORK

Our learning framework is based on four main components: a 3D LiDAR point cluster detector, a multi-target tracker, a human classifier and a sample generator (see Fig. 2). At each iteration, a 3D LiDAR scan (i.e. 3D point cloud) is first segmented into clusters. The position and velocity of these clusters are estimated in real-time by a multi-target tracking system, which outputs the trajectories of all the clusters. At the same time, a classifier identifies the type of each cluster, i.e. human or non-human. At first, the classifier has to be initialised by supervised training with labeled clusters of human subjects. The initial training set can be very small though (e.g. one sample), as more samples will be incrementally added and used for retraining the classifier in future iterations.

The classifier can make two types of errors: false positives and false negatives. Based on the estimation of the error type by two independent "experts", i.e. a positive one (P-expert) and a negative one (N-expert), which cross-check the output of the classifier with that of the tracker, the sample generator produces new training data for the next iteration. In particular, the P-expert converts false negatives into positive samples, while the N-expert converts false positives into negative samples. When there are enough new samples, the classifier is re-trained. The process typically iterates until convergence (i.e. no more false positives and false negatives) or some other stopping criterion is reached.
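The sample-generation loop described above can be sketched in a few lines. This is only an illustrative reconstruction, not the paper's implementation: the `Track` container, the `min_votes` majority criterion and the `generate_samples` function are our assumptions, standing in for the richer trajectory-based checks the actual P-N experts perform.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    # Hypothetical container: per-frame classifier votes along one trajectory
    # (True = the cluster on that frame was classified as human).
    labels: list = field(default_factory=list)

def generate_samples(tracks, min_votes=3):
    """Cross-check per-frame classifier output against whole trajectories.

    If enough detections on a track were classified human, the track is
    judged human and the P-expert relabels the missed frames as positive
    samples; otherwise the N-expert relabels the spurious human
    classifications as negative samples. Returns (track, frame) indices.
    """
    positives, negatives = [], []
    for tid, t in enumerate(tracks):
        votes_human = sum(t.labels)
        if votes_human >= min_votes:
            # P-expert: false negatives on a human track become positives.
            positives += [(tid, i) for i, l in enumerate(t.labels) if not l]
        elif votes_human > 0:
            # N-expert: stray positives on a non-human track become negatives.
            negatives += [(tid, i) for i, l in enumerate(t.labels) if l]
    return positives, negatives
```

Once enough new samples have accumulated, they would be merged with the previous training set for a batch-incremental retraining step, as the framework prescribes.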

Fig. 2. Process details of the online learning framework.

Fig. 3. Examples of a human (1.68 m high) detected by a 3D LiDAR at different distances.

Our system, however, differs from the previous work [11] in three key aspects, namely the frequency of the training process, the independence of the tracker from the classifier, and the implementation of the experts. In particular, rather than instance-incremental training (i.e. frame-by-frame training), our system relies on less frequent batch-incremental training [19] (i.e. gathering samples in batches to train classifiers), collecting new data online as the robot moves in the environment. Also, while the performance of the human classifier depends on the reliability of the P-N experts and the tracker, the latter is independent from and completely unaffected by the classification performance. Finally, the implementation of our experts can deal with more than one target and can therefore generate new training samples from multiple detections, speeding up the online training process.

Note that, under particular conditions, the stability of our training process is guaranteed as per [11]. The assumption here is that the number of correct samples generated by the N-expert is greater than the number of errors of the P-expert, and conversely that the correct samples of the P-expert outnumber the errors of the N-expert. Although in the real world these assumptions are not always met, the stability of our system is helped by the fact that we operate in environments where the vast majority of moving targets are humans and occasional errors are corrected online.

IV. 3D LIDAR-BASED TRACKING

Key components of this system include the efficient 3D LiDAR point cluster detector and the robust multi-target tracker. This section provides details about both.

A. Cluster Detector

The input of this module is a 3D LiDAR scan, which is defined as a set of I points:

    P = {p_i | p_i = (x_i, y_i, z_i) ∈ R^3, i = 1, ..., I}    (1)

The first step of the cluster detection is to remove the ground plane by keeping only the points p_i with z_i > z_min, obtaining a subset P' ⊆ P. This is necessary in order to remove from object clusters points that belong to the floor, accepting the fact that small parts of the object bottom could be removed as well. Note that this simple but efficient solution works well only for relatively flat ground, which is one of the assumptions in our scenarios.

Point clusters are then extracted from the point cloud P', based on the Euclidean distance between points in 3D space. A cluster can be defined as follows:

    C_j ⊆ P', j = 1, ..., J    (2)

where J is the total number of clusters. A condition to avoid overlapping clusters is that they should not contain the same points [14], that is:

    C_j ∩ C_k = ∅, for j ≠ k, if min ‖p_j − p_k‖_2 ≥ d_th    (3)

where the points p_j, p_k ∈ P' belong to the point clusters C_j and C_k respectively, and d_th is a distance threshold.

Accurate cluster extraction based on Euclidean distance is challenging in practice. If the value of the distance threshold d_th is too small, a single object could be split into multiple clusters. If too high, multiple objects could be merged into one cluster. Moreover, in 3D LiDAR scans, the shape formed by laser beams irradiated on the human body can be very different, depending on the distance of the person from the sensor (see Fig. 3). In particular, the vertical distance between points can vary a lot due to the vertical angular resolution, which is usually limited for this type of sensor. We therefore propose an adaptive method to determine d_th according to different scan ranges, which can be formulated as:

    d_th = 2 r tan(Θ / 2)    (4)

where r is the scan range of the 3D LiDAR and Θ is the fixed vertical angular resolution. In practice, d_th is the vertical distance between two adjacent laser scans. Obviously, the farther the person from the sensor, the larger is the gap between the vertical laser beams, as depicted in Fig. 3. In the case of our sensor, for example, the vertical resolution is 2° (while horizontally the angular resolution is much smaller). This means that to cluster points at a distance of, for example, 9 m, the minimum threshold should be d_th = 0.314 m.

Clustering points in 3D space, however, can be computationally intensive. The computational load is proportional to the desired coverage area: the longer the maximum range, the higher the value of d_th, and therefore the number of point clouds that can be considered as clusters. In addition, the larger the area, the more likely it is that new clusters will indeed appear within it. To face this challenge, we propose to divide the space into nested circular regions centred at the sensor (see Fig. 4), like wave fronts propagating from a point source, where different distance thresholds are applied. In practice, we consider a set of values d_th^i at fixed intervals Δd, where d_th^{i+1} = d_th^i + Δd. For each of them, we compute the maximum cluster detection range r_i using the inverse of Equation (4), and round it down to obtain the radius R_i = ⌊r_i⌋ of the circular area. The area corresponding to d_th^i is therefore the ring with width l_i = R_i − R_{i−1}, where R_0 is just the centre of the sensor. Using Δd = 0.1 m, we obtain rings 2-3 m wide, depending on the approximation, which is a good resolution to detect potential human clusters. In the example considered above, a cluster at 9 m from the sensor would belong to the 4th ring, where a threshold d_th^4 = 0.4 m would be applied.

Fig. 4. Different values of d_th correspond to different nested regions.

Finally, a human-like volumetric model is used to filter out over- and under-segmented clusters:

    C = {C_j | 0.2 ≤ w_j ≤ 1.0, 0.2 ≤ d_j ≤ 1.0, 0.2 ≤ h_j ≤ 2.0}    (5)

where w_j, d_j and h_j represent, respectively, the width, depth and height (in meters) of the volume containing C_j.

B. Multi-target Tracker

Cluster tracking is performed using Unscented Kalman Filter (UKF) and Nearest Neighbour (NN) data association methods, which have already been proved to perform efficiently in previous systems [10], [16]. Tracking is performed in 2D, assuming people move on a plane, and without taking into account the 3D cluster size, which is left to future extensions of our work.

The estimation consists of two steps. In the first step, the following 2D constant velocity model is used to predict the target state at time t_k given the previous state at t_{k−1}:

    x_k = x_{k−1} + Δt ẋ_{k−1},    ẋ_k = ẋ_{k−1}
    y_k = y_{k−1} + Δt ẏ_{k−1},    ẏ_k = ẏ_{k−1}    (6)

where x and y are the Cartesian coordinates of the target, ẋ and ẏ the respective velocities, and Δt = t_k − t_{k−1}. In the second step, if one or more new observations are available from the cluster detector, the predicted states are updated using a 2D polar observation model:

    θ_k = tan^−1(y_k / x_k),    γ_k = √(x_k^2 + y_k^2)    (7)

where θ_k and γ_k are, respectively, the bearing and the distance of the cluster from the detector, extracted from the projection on the (x, y) plane of the cluster's centroid:

    c_j = (1 / |C_j|) Σ_{p_i ∈ C_j} p_i    (8)

For the sake of simplicity, noise terms and transformations between robot and world frames of reference are omitted in the above equations. However, it is worth noting that, from our experience, the choice of the (non-linear) polar observation model, rather than a simpler (linear) Cartesian model as in [20], is important for the good performance of long-range tracking. This applies independently of the robot sensor used, as in virtually all of them the resolution of the detection decreases with the distance of the target. In particular, the polar coordinates better represent the actual functioning of the LiDAR sensor, so its angular and range noises are more accurately modelled. This also motivates the adoption of the UKF, since it is known to perform better than the Extended Kalman Filter (EKF) in the case of non-linear models [10]. Finally, the NN data association takes care of multiple cluster observations in order to update, in parallel, multiple UKFs (i.e. one for each tracked target).

V. ONLINE LEARNING FOR HUMAN CLASSIFICATION

Online learning is performed iteratively by selecting and accumulating a pre-defined number of new cluster samples while the robot moves and/or tracks people, and re-training a classifier using old and new samples. Details of the process are presented next.

A. Human Classifier

A Support Vector Machine (SVM) [21] is used for human classification, which is known to be effective in non-linear cases and has been shown to work well experimentally in 3D LiDAR-based human detection [1], [3]. Six features with a total of 61 dimensions are extracted from the clusters for classification (Table I).
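The adaptive threshold of Equation (4) and the nested-ring scheme can be sketched as follows. This is a minimal illustration under the paper's stated parameters (Θ = 2°, Δd = 0.1 m); the function names are ours, not from the authors' code.

```python
import math

def adaptive_threshold(r, theta_deg=2.0):
    """Eq. (4): clustering threshold d_th = 2 r tan(Theta/2) at scan range r (m)."""
    return 2.0 * r * math.tan(math.radians(theta_deg) / 2.0)

def ring_radii(n_rings, delta_d=0.1, theta_deg=2.0):
    """Outer radius R_i of each nested ring: invert Eq. (4) for
    d_th^i = i * delta_d, then round down (R_i = floor(r_i))."""
    half_tan = math.tan(math.radians(theta_deg) / 2.0)
    return [math.floor(i * delta_d / (2.0 * half_tan))
            for i in range(1, n_rings + 1)]

def ring_index(r, radii):
    """1-based index of the ring containing a cluster at range r;
    the threshold applied there is index * delta_d."""
    for i, outer in enumerate(radii, start=1):
        if r <= outer:
            return i
    return len(radii)  # beyond the last ring boundary
```

With these defaults, `adaptive_threshold(9.0)` reproduces the 0.314 m value quoted in the text, the ring radii come out 2, 5, 8, 11, ... m (rings 2-3 m wide), and a cluster at 9 m falls in the 4th ring, where d_th = 0.4 m applies.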

TABLE I
FEATURES FOR HUMAN CLASSIFICATION

Feature | Description                                                                  | Dimension
f1      | Number of points included in the cluster                                     | 1
f2      | Minimum cluster distance from the sensor                                     | 1
f3      | 3D covariance matrix of the cluster                                          | 6
f4      | Normalized moment of inertia tensor                                          | 6
f5      | Slice feature for the cluster                                                | 20
f6      | Reflection intensity distribution (mean, standard dev. and normalized 1D histogram) | 27

Fig. 5. Example of human-like trajectory samples, including one (red crossed) filtered out b
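The first entries of the 61-dimensional feature vector in Table I can be sketched as follows. This is our illustrative reconstruction of f1-f3 only; f4-f6 require the inertia tensor, the slice decomposition and the beam intensities, which are omitted here, and `basic_features` is a name of our choosing.

```python
import numpy as np

def basic_features(cluster, sensor_origin=np.zeros(3)):
    """Compute f1-f3 of Table I for a cluster given as an (N, 3) array.

    f1: number of points in the cluster (1 dim)
    f2: minimum distance of the cluster from the sensor (1 dim)
    f3: the six upper-triangular entries of the 3D covariance matrix (6 dims)
    """
    f1 = cluster.shape[0]
    f2 = np.linalg.norm(cluster - sensor_origin, axis=1).min()
    cov = np.cov(cluster, rowvar=False)       # 3x3 covariance of the points
    f3 = cov[np.triu_indices(3)]              # symmetric, so 6 unique entries
    return np.concatenate(([f1, f2], f3))     # 1 + 1 + 6 = 8 dims of the 61
```

The remaining features would be appended in the same way before feeding the full 61-dimensional vector to the SVM.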

