RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory


Article
RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory

Hui Zeng 1,2,*, Bin Yang 1,2, Xiuqing Wang 3, Jiwei Liu 1,2 and Dongmei Fu 1,2

1 School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; g20178627@xs.ustb.edu.cn (B.Y.); liujiwei@ustb.edu.cn (J.L.); fdm ustb@ustb.edu.cn (D.F.)
2 Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China
3 Vocational & Technical Institute, Hebei Normal University, Shijiazhuang 050024, China; xqwang@hebtu.edu.cn
* Correspondence: hzeng@ustb.edu.cn; Tel.: +86-135-2147-5131

Received: 20 November 2018; Accepted: 24 January 2019; Published: 27 January 2019
Sensors 2019, 19, 529; doi:10.3390/s19030529; www.mdpi.com/journal/sensors

Abstract: With the development of low-cost RGB-D (Red Green Blue-Depth) sensors, RGB-D object recognition has attracted more and more researchers' attention in recent years. The deep learning technique has become popular in the field of image analysis and has achieved competitive results. To make full use of the effective identification information in the RGB and depth images, we propose a multi-modal deep neural network and a DS (Dempster Shafer) evidence theory based RGB-D object recognition method. First, the RGB and depth images are preprocessed and two convolutional neural networks are trained, respectively. Next, we perform multi-modal feature learning using the proposed quadruplet samples based objective function to fine-tune the network parameters. Then, two probability classification results are obtained using two sigmoid SVMs (Support Vector Machines) with the learned RGB and depth features. Finally, the DS evidence theory based decision fusion method is used for integrating the two classification results. Compared with other RGB-D object recognition methods, our proposed method adopts two fusion strategies: multi-modal feature learning and DS decision fusion. Both the discriminative information of each modality and the correlation information between the two modalities are exploited. Extensive experimental results have validated the effectiveness of the proposed method.

Keywords: RGB-D object recognition; deep neural network; multi-modal learning; DS evidence theory

1. Introduction

Object recognition is one of the fundamental problems in the fields of computer vision and robotics. Until now, many methods have been proposed for object recognition, but most of them are based on the RGB (Red Green Blue) image. However, the RGB image can only reflect the color, illumination, and texture information of the scene, and the depth information of the scene is lost during the optical projection process from the 3D (Three Dimensional) space to the 2D (Two Dimensional) space. Therefore, RGB image based object recognition methods are susceptible to external factors, such as illumination and a complex background, which significantly impede the usage of the RGB image based object recognition methods in practical applications [1–5].

In recent years, with the development of low-cost RGB-D (Red Green Blue-Depth) sensors, such as Microsoft Kinect and Intel RealSense, the RGB-D image has been widely used in scene analysis and understanding, video surveillance, intelligent robots, and medical diagnosis [6,7]. The RGB-D sensor can capture the color image and the depth image at the same time. The RGB image contains color and appearance information, and the depth image contains the distance information
between the RGB-D sensor and the object. Compared with the RGB image, the RGB-D image can provide additional information about the 3D geometry structure of the object, which has more effective information for object recognition. Furthermore, the depth image is robust to variations in color and illumination. It has been proven that the RGB-D image based object recognition method can achieve better performance than the RGB image based object recognition method. So, the research of the RGB-D image based multi-modal object recognition method has attracted more and more attention in the last few years [8–10].

According to the types of the features, existing RGB-D image based object recognition methods can be divided into two categories: hand-crafted feature based methods and learned feature based methods. For the first category, the hand-crafted features, such as scale-invariant feature transform (SIFT) [11], speeded up robust features (SURF) [12], and spin images [13,14], are extracted to describe the RGB and depth images, respectively, and then they are fed into classifiers, such as SVMs (Support Vector Machines), for classification. The performance of this kind of method is influenced by the selected hand-crafted features. The hand-crafted features often need to be manually tuned for different conditions, and they cannot capture all the useful discriminative information of different classes of objects. For the second category, the features are learned from the RGB and depth images, and then the classifiers are used for classification. This kind of method performs better, but it still does not make full use of the effective information contained in the RGB-D images. Most existing methods usually learn separately from the RGB and depth images, and the two kinds of features are simply combined for recognition [15,16]. So how to make full use of the relationship between the RGB features and the depth features is still a key problem to be solved in the field of RGB-D object recognition.

The DS (Dempster Shafer) evidence theory is a useful uncertain reasoning method for multi-sensor information fusion [17,18]. It can be regarded as a generalization of the Bayes theory of subjective probability, which can grasp the uncertainty of the problem and performs better than the traditional probability theory. The DS evidence theory has been successfully used in pattern recognition, expert systems, fault diagnosis, and information fusion [19,20]. In this paper, two SVM classifiers are used for the RGB modality and the depth modality, and we use the DS evidence theory to fuse the decisions of the two classifiers. Compared with the weighted fusion method, the DS evidence theory based decision fusion method considers the effects of different decisions for different classes by using the mass function, which can give more reasonable recognition results.

In this paper, we focus on a multi-modal deep neural network and DS evidence theory based RGB-D object recognition method. First, the RGB and depth images are preprocessed and three channel images of them are obtained as the inputs of each convolutional neural network (CNN). Second, the RGB CNN and the depth CNN are trained using the stochastic gradient descent (SGD) method with back-propagation. Then, the multi-modal feature learning network is trained to fine-tune the network parameters, where the objective function includes both the discriminative terms and the correlation term.
Finally, we construct two support vector machine (SVM) classifiers, one for each modality, and the DS evidence theory is used to fuse the two decision results. To summarize, the contributions of this paper include:

- The CNN based multi-modal deep neural network is built for learning RGB features and depth features. The training of the proposed multi-modal network has two stages. First, the RGB CNN and the depth CNN are trained, respectively. Then, the multi-modal feature learning network is trained to fine-tune the network parameters;
- we propose a quadruplet samples based objective function for each modality, which can learn the discriminative feature more effectively. Furthermore, we propose a comprehensive multi-modal objective function, which includes two discriminative terms and one correlation term; and
- for each modality, an effective weighted trust degree is designed according to the probability outputs of the two SVMs and the learned features. Then, the total trust degree can be computed using the Dempster rule of combination for object recognition (a small sketch of this combination step is given after this list).
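The decision fusion step above relies on the Dempster rule of combination. As a rough illustration only, and not the paper's exact weighted trust degree design (which is developed in Section 3.3), the following Python sketch combines two basic probability assignments, one per modality, defined over singleton class hypotheses plus the ignorance set Theta; the class names and mass values are hypothetical.

# Minimal sketch: Dempster's rule of combination for two per-modality
# basic probability assignments (BPAs) over singleton class hypotheses
# plus the frame "Theta". Building BPAs from the SVM probability outputs
# is an illustrative assumption, not the paper's weighted trust degree.
from itertools import product

def combine_dempster(m1, m2, theta="Theta"):
    """Combine two BPAs defined over singleton classes and the frame theta."""
    combined = {}
    conflict = 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        if a == theta:
            focal = b
        elif b == theta or a == b:
            focal = a
        else:                      # incompatible singletons -> conflicting mass
            conflict += pa * pb
            continue
        combined[focal] = combined.get(focal, 0.0) + pa * pb
    k = 1.0 - conflict             # normalization factor (assumes conflict < 1)
    return {h: v / k for h, v in combined.items()}

# Example: RGB and depth classifiers agree on the class but with different confidence.
m_rgb = {"apple": 0.6, "ball": 0.2, "Theta": 0.2}     # some mass left uncommitted
m_depth = {"apple": 0.5, "ball": 0.3, "Theta": 0.2}
print(combine_dempster(m_rgb, m_depth))               # "apple" receives the largest combined mass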

The rest of this paper is organized as follows. Section 2 provides a brief overview of the related work. Section 3 introduces the proposed RGB-D based object recognition method in detail, including RGB-D image preprocessing, the architecture and learning method of the proposed multi-modal feature learning network, and the DS evidence theory based RGB-D object recognition method. Section 4 reports the experimental results and the detailed comparative analysis. Finally, conclusions are provided in Section 5.

2. Related Work

Remarkable efforts have been made to explore RGB-D image based object recognition in recent years. Earlier works mainly focus on hand-crafted feature based methods. For example, Lai et al. used spin images for depth images and SIFT descriptors for RGB images [21]. At first, the spin images of the sampled 3D points were computed, and then the efficient match kernel (EMK) features were obtained to describe the entire shape. Then, the SIFT descriptors were extracted and their corresponding EMK features were computed. Additionally, the texton histograms were also extracted to capture texture information. Finally, three state-of-the-art classifiers, including the linear support vector machine (LinSVM), Gaussian kernel support vector machine (kSVM), and random forest (RF), were used for recognition. Bo et al. proposed five depth kernel descriptors (gradient kernel, spin kernel, size kernel, kernel principal component analysis (PCA), and local binary pattern kernel) to capture different recognition cues, including size, shape, and edges, which can significantly improve the performance of RGB-D object recognition [22]. Yu et al. proposed a kind of continuous local descriptor called the local flux feature (LFF), which can be used for both the RGB image and the depth image [23]. Then, the structure preserving projection (SPP) was used to fuse RGB information and depth information, and a novel binary local representation was obtained for RGB-D data. Logoglu et al. proposed two spatially enhanced local 3D descriptors: histograms of spatial concentric surflet-pairs (SPAIR) and colored SPAIR (CoSPAIR) [24]. The CoSPAIR descriptor contains both shape information and color information, and it performs well in RGB-D object recognition. In summary, the hand-crafted features were designed according to partial characteristics of the objects, and they cannot satisfy the needs of RGB-D object recognition on a large-scale dataset.

Compared with the hand-crafted feature based methods, the learned feature based methods have achieved better performance and have attracted more and more researchers' attention. For example, Bo et al. proposed a feature learning method for RGB-D based object recognition by applying hierarchical matching pursuit (HMP) to color and depth images [25]. HMP uses sparse coding to learn hierarchical feature representations from raw RGB-D data in an unsupervised way. Blum et al. proposed a new learned local feature descriptor for RGB-D images, called the convolutional k-means descriptor [26]. It automatically learns feature responses in the neighborhood of detected interest points and is able to combine color information and depth information into one concise representation. Asif et al. proposed a bag-of-words (BOW) based feature learning method for RGB-D object recognition [27]. The randomized clustering trees were used to learn visual vocabularies, and the standard spatial pooling strategies were used for feature representation. Huang et al.
proposed a discriminative cross-domain dictionary learning based RGB-D object recognition framework, which learns a domain adaptive dictionary pair and classifier parameters in the data representation level and classification level, respectively [28]. Li et al. proposed an effective multi-modal local receptive field extreme learning machine (MM-ELM-LRF) structure for RGB-D object recognition [29]. The extreme learning machine (ELM) was used as a supervised feature classifier for the final decision, and the proposed MM-ELM-LRF method maintains ELM's advantages of training efficiency. In general, most of the above learned feature based methods learn features from the color images and the depth images separately. Thus, the correlation information between the two modalities has not been fully exploited.

Recently, deep learning has become extremely popular and has been successfully applied in RGB-D object recognition. Socher et al. proposed a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images
[30]. The CNN layer learns low level features, and the RNN layer composes higher order features. Wang et al. proposed a general CNN based multi-modal learning method for RGB-D object recognition, which can simultaneously learn transformation matrices for the two modalities with a large margin criterion and a maximal cross-modality correlation criterion [31]. Rahman et al. proposed a three-stream multi-modal CNN based deep network architecture for RGB-D object recognition [32]. The three streams include surface normal, color jet, and RGB channels. Tang et al. proposed a canonical correlation analysis (CCA) based multi-view convolutional neural network for RGB-D object recognition, which can effectively identify the associations between different perspectives of the same shape model [33]. Zia et al. proposed a hybrid 2D/3D convolutional neural network for RGB-D object recognition, which can be initialized with a pretrained 2D CNN and can be trained over a relatively small RGB-D dataset [34]. Bai et al. proposed a subset based deep learning method for RGB-D object recognition [9]. At first, the raw dataset was divided into some subsets according to their shapes and colors. Then, two sparse auto-encoders were trained for each subset, and the recursive neural network was used to learn robust hierarchical feature representations. Finally, the learned features were sent to a softmax classifier for object recognition. Although the above methods have achieved good performance, learning effective discriminative information from RGB-D images is still a problem worthy of further research.

Furthermore, there are many well-performing methods in the RGB-D tracking literature, which are introduced to fuse color and depth channels. Song et al. proposed an RGB-D histogram of oriented gradients (HOG) feature based method for RGB-D tracking [35]. The RGB-D HOG features can describe local textures as well as 3D shapes. Furthermore, they also proposed a second tracking method [35], which is based on the 3D point cloud. They designed the point cloud feature to capture the color and shape of cells of 3D points. Both methods used sliding window detection with a linear SVM, and the 2D optical flow method and the 3D iterative closest point (ICP) method were adopted for point tracking. Meshgi et al. proposed an occlusion-aware particle filter tracker based on RGB-D images [36], which employs a probabilistic model with a latent variable representing an occlusion flag. The probabilistic framework accommodates the adaptive fusion of the features extracted from the RGB and depth images. Camplani et al. proposed a real-time RGB-D tracker with depth scaling kernelised correlation filters and occlusion handling [37]. They fused color and depth cues as the tracker's features by evaluating a variety of feature combination strategies.

3. Proposed Method

As shown in Figure 1, our proposed RGB-D object recognition method has three pipelines. The red pipeline and the green pipeline are used for training the CNNs, and the blue pipeline is used for testing. In the training stage, the RGB image and the depth image are first preprocessed to reduce noise, and they are rescaled to the normalized size. Next, we compute three channels of the depth image using the HHA encoding method [38–42], where the HHA code refers to the horizontal disparity, height above ground, and angle with gravity.
The red pipeline is used to train each CNN, respectively, and the trained network parameters of the two CNNs are used as the initial parameters of the following multi-modal feature learning. Then, we use the green pipeline to perform multi-modal feature learning using the two CNNs. Through multi-modal learning, the parameters of the two CNNs can be optimized according to both the correlation information between the two modalities and the discriminative information in each modality. In the testing stage, we use the blue pipeline for RGB-D object recognition. The optimized parameters of the two CNNs are used for computing the RGB features and the depth features. After the learned RGB and depth features of the testing sample have been computed, two SVM classifiers are used, one for each modality. Finally, the DS evidence theory is used to fuse the two recognition results. Figure 2 gives the architectures of the two CNNs (RGB CNN and depth CNN), which are the same as the AlexNet [43]. Our experimental results have shown that both the proposed multi-modal feature learning strategy and the DS fusion strategy can improve the recognition performance.

Figure 1. The flowchart of the proposed RGB-D object recognition method. (Diagram: the RGB and depth images are preprocessed, the RGB CNN and the depth CNN are trained, multi-modal feature learning produces the RGB and depth features, which are classified by SVM1 and SVM2 and fused by DS decision fusion.)

Figure 2. The architecture of the proposed multi-modal network. (Diagram: an RGB CNN branch and a depth CNN branch, each ending in a 4096-dimensional fully-connected feature that is fed to the multi-modal feature learning objective.)

3.1. RGB-D Image Preprocessing

To meet the requirements of the two CNNs, which use the basic architecture of AlexNet, the input RGB and depth images are first scaled to 227 × 227. The simplest way is to resize the images to the required image size directly. However, as shown in Figure 3d–f, the direct method may deform the object's original aspect ratio and geometric structure, which will influence the recognition performance. So, we used the scaling processing method proposed in [33]. At first, we resized the original image so that the length of its long side becomes 227 pixels. Then, we expanded the resized image along the short side to obtain a square image. The two sides of the image expansion should be equal and the resized image should be located in the middle of the expanded image. The expansion of the images is done by adding black pixels. Figure 3g–i shows the scaled images. From Figure 3, we can see that, compared with the resized images, the scaled images can effectively preserve the shape information of the objects.

For the scaled RGB image, we can obtain its R channel image, G channel image, and B channel image as the three input images of the RGB CNN. For the scaled depth image, we first fill its holes and reduce noise using median filters. Then, the HHA encoding method is used to obtain the three input images of the depth CNN. The HHA representation encodes the properties of the geocentric pose that emphasize complementary discontinuities in the image, and it has been successfully used in several RGB-D image based works [38–42].
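A minimal sketch of this preprocessing step is given below, assuming OpenCV and NumPy (the paper does not name an implementation library); the median-filter kernel size and the synthetic input arrays are illustrative assumptions.

# Sketch of the preprocessing in Section 3.1, assuming OpenCV.
# The paper states that holes are filled / noise is reduced with median
# filters and that images are scaled to 227 x 227 with black padding;
# the kernel size and the stand-in inputs below are assumptions.
import cv2
import numpy as np

TARGET = 227

def scale_keep_ratio(img):
    """Resize so the long side is 227 px, then pad the short side
    symmetrically with black pixels to obtain a 227 x 227 image."""
    h, w = img.shape[:2]
    scale = TARGET / max(h, w)
    resized = cv2.resize(img, (max(1, round(w * scale)), max(1, round(h * scale))))
    pad_h = TARGET - resized.shape[0]
    pad_w = TARGET - resized.shape[1]
    top, left = pad_h // 2, pad_w // 2
    return cv2.copyMakeBorder(resized, top, pad_h - top, left, pad_w - left,
                              cv2.BORDER_CONSTANT, value=0)

def preprocess_depth(depth_raw):
    """Fill holes / suppress noise with a median filter before HHA encoding."""
    return cv2.medianBlur(depth_raw.astype(np.float32), 5)

rgb = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)      # stand-in for a cropped RGB view
depth = (np.random.rand(480, 640) * 4000).astype(np.uint16)     # stand-in for a raw depth map
rgb_scaled = scale_keep_ratio(rgb)                               # 227 x 227 x 3 input to the RGB CNN
depth_scaled = scale_keep_ratio(preprocess_depth(depth))         # 227 x 227, ready for HHA encoding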

Figure 3. The results of image scaling. (a) The RGB and depth images from the "cereal box" class; (b) the RGB and depth images from the "flashlight" class; (c) the RGB and depth images from the "cap" class; (d) the resized images of (a); (e) the resized images of (b); (f) the resized images of (c); (g) the scaled images of (a); (h) the scaled images of (b); (i) the scaled images of (c).

3.2. Feature Learning Method of the Proposed Multi-Modal Network

3.2.1. The Architecture of the Proposed Multi-Modal Network

The proposed multi-modal network is designed to extract the features of the RGB and depth images. Figure 2 illustrates the architecture of the proposed multi-modal network, which consists of two branches. Each branch is a CNN with the same architecture as the AlexNet [43]. The inputs of the first branch are the three channels of the RGB image, and the inputs of the second branch are the HHA encoding results of the depth image. The AlexNet consists of five convolutional layers and three fully-connected layers with a final 1000-way softmax. It has about 60 million parameters and 650,000 neurons. In this paper, we only used the five convolutional layers and the first two fully-connected layers. The first convolutional layer, the second convolutional layer, and the fifth convolutional layer are followed by max-pooling layers. The activation function of all convolutional layers and fully-connected layers is the rectified linear unit (ReLU). The last fully-connected layer is deleted and the second fully-connected layer is used for feature extraction. The training of the proposed network has two stages. In the first stage, the RGB feature and the depth feature are learned separately using their corresponding CNNs. In the second stage, the multi-modal network is fine-tuned using the RGB and depth images. Both the discriminative information of each modality and the correlation information between the two modalities are considered in the optimization process.
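The following sketch shows how one such branch could be built, assuming PyTorch/torchvision (the paper does not state a framework): AlexNet is loaded with ImageNet weights and only the final 1000-way layer is dropped, so the second fully-connected layer (after ReLU) yields the 4096-dimensional feature.

# Minimal sketch of one CNN branch of Figure 2, assuming torchvision >= 0.13
# (older versions would use models.alexnet(pretrained=True) instead).
import torch
import torch.nn as nn
from torchvision import models

def build_branch():
    net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    # keep Dropout-Linear(9216->4096)-ReLU-Dropout-Linear(4096->4096)-ReLU,
    # i.e. drop only the final Linear(4096->1000) classification layer
    net.classifier = nn.Sequential(*list(net.classifier.children())[:-1])
    return net

rgb_cnn = build_branch()     # fed with the three RGB channels
depth_cnn = build_branch()   # fed with the three HHA channels of the depth image

x = torch.randn(4, 3, 227, 227)   # a batch of preprocessed images
features = rgb_cnn(x)             # shape: (4, 4096), the learned feature f(x)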

3.2.2. RGB Feature Learning and Depth Feature Learning

In this paper, we first learn the RGB features and the depth features, respectively. Here, we elaborate on the objective function for learning the RGB features, and likewise for learning the depth features. Inspired by the deep quadruplet network proposed for learning a local feature descriptor, we designed a novel quadruplet samples based objective function. Compared with the triplet objective function, the quadruplet objective function is less prone to over-fitting and has better training efficiency [44,45]. The difference between the existing deep quadruplet network and our work is that the deep quadruplet network has four branches. We have not adopted the four-branch network structure and only used the concept of the quadruplet in the objective function. The aim of our proposed quadruplet objective function is to minimize intra-class distances and maximize inter-class distances.

Assume (x_i, x_j, x_k, x_l) is a sample quadruplet, which is obtained using the sampling method proposed in Ref. [39]. Among them, the samples x_i and x_j are from the same class, and they are called a positive sample pair. The samples x_k and x_l are from different classes, and they are called a negative sample pair. The positive set \mathcal{P} contains a number of positive sample pairs, and the negative set \mathcal{N} contains a number of negative sample pairs. For the input sample x, let f_1(x) be its output of the second fully-connected layer of the RGB CNN, which is the learned RGB feature. Then, the quadruplet objective function can be defined as:

\min F_1 = \sum_{(i,j)\in\mathcal{P}} h\big(\|f_1(x_i) - f_1(x_j)\|_2^2 - T_1\big) + \lambda_1 \sum_{(k,l)\in\mathcal{N}} h\big(T_1 + \tau_1 - \|f_1(x_k) - f_1(x_l)\|_2^2\big)    (1)

where h is a hinge loss function, h(x) = \max(0, x), and \lambda_1 is the weight. From Equation (1), we can conclude that the proposed quadruplet objective function can make the distance between the positive sample pair (x_i, x_j) smaller than a given threshold T_1, and it also can make the distance between the negative sample pair (x_k, x_l) larger than a given threshold T_1 + \tau_1. In summary, our proposed objective function encourages the distances between same-class samples to be smaller, by at least the margin \tau_1, than the distances between different-class samples.

In this paper, the RGB CNN is initialized using transfer learning. The initial parameters are obtained from the AlexNet pretrained on the ImageNet large scale dataset. Then, we fine-tuned the RGB CNN, which is trained by the SGD method with back-propagation, and the derivatives of the loss function F_1 with respect to \lambda_1, f_1(x_i), f_1(x_j), f_1(x_k), and f_1(x_l) can be derived as:

\frac{\partial F_1}{\partial \lambda_1} = \sum_{(k,l)\in\mathcal{N}} h\big(T_1 + \tau_1 - \|f_1(x_k) - f_1(x_l)\|_2^2\big)    (2)

\frac{\partial F_1}{\partial f_1(x_i)} = \sum_{j} 2\big(f_1(x_i) - f_1(x_j)\big)\, h'\big(\|f_1(x_i) - f_1(x_j)\|_2^2 - T_1\big)    (3)

\frac{\partial F_1}{\partial f_1(x_j)} = -\sum_{i} 2\big(f_1(x_i) - f_1(x_j)\big)\, h'\big(\|f_1(x_i) - f_1(x_j)\|_2^2 - T_1\big)    (4)

\frac{\partial F_1}{\partial f_1(x_k)} = -\sum_{l} 2\lambda_1\big(f_1(x_k) - f_1(x_l)\big)\, h'\big(T_1 + \tau_1 - \|f_1(x_k) - f_1(x_l)\|_2^2\big)    (5)

\frac{\partial F_1}{\partial f_1(x_l)} = \sum_{k} 2\lambda_1\big(f_1(x_k) - f_1(x_l)\big)\, h'\big(T_1 + \tau_1 - \|f_1(x_k) - f_1(x_l)\|_2^2\big)    (6)

In the optimization process, the weight \lambda_1 can be updated using Equation (2), and the back-propagation is conducted using Equations (3)–(6).

The depth CNN is initialized using the same initialization method as the RGB CNN. Let f_2(x) be the output of the second fully-connected layer for the sample x, which is the learned depth feature. Similar to the definition of the objective function of the RGB CNN, the objective function of the depth CNN can be defined as:

\min F_2 = \sum_{(i,j)\in\mathcal{P}} h\big(\|f_2(x_i) - f_2(x_j)\|_2^2 - T_2\big) + \lambda_2 \sum_{(k,l)\in\mathcal{N}} h\big(T_2 + \tau_2 - \|f_2(x_k) - f_2(x_l)\|_2^2\big)    (7)

where \lambda_2 is the weight, and T_2 and \tau_2 are the given thresholds. Then, the derivatives of the objective function F_2 with respect to \lambda_2, f_2(x_i), f_2(x_j), f_2(x_k), and f_2(x_l) can be derived, and their expressions are similar to Equations (2)–(6). Finally, the depth CNN can be optimized using the SGD method with back-propagation.
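Equation (1) can be written as a differentiable loss as in the sketch below, again assuming PyTorch; the threshold T_1, margin \tau_1, weight \lambda_1, and the index pairs are placeholder values, and the quadruplet sampling itself is not shown.

# Sketch of the quadruplet objective of Equation (1), assuming PyTorch.
# T1, tau1 and lambda1 are placeholders; pos_pairs / neg_pairs hold index
# pairs produced by whatever quadruplet sampling strategy is used.
import torch

def quadruplet_loss(feats, pos_pairs, neg_pairs, T1=1.0, tau1=0.5, lambda1=1.0):
    i, j = pos_pairs[:, 0], pos_pairs[:, 1]
    k, l = neg_pairs[:, 0], neg_pairs[:, 1]
    d_pos = (feats[i] - feats[j]).pow(2).sum(dim=1)   # ||f(xi) - f(xj)||_2^2
    d_neg = (feats[k] - feats[l]).pow(2).sum(dim=1)   # ||f(xk) - f(xl)||_2^2
    # h(x) = max(0, x): pull positive pairs below T1, push negative pairs above T1 + tau1
    return (torch.clamp(d_pos - T1, min=0).sum()
            + lambda1 * torch.clamp(T1 + tau1 - d_neg, min=0).sum())

# usage on the 4096-d fully-connected features of one branch
feats = torch.randn(8, 4096, requires_grad=True)
pos = torch.tensor([[0, 1], [2, 3]])    # same-class index pairs
neg = torch.tensor([[0, 4], [2, 5]])    # different-class index pairs
loss = quadruplet_loss(feats, pos, neg)
loss.backward()   # autograd reproduces the feature gradients of Equations (3)-(6)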

3.2.3. Multi-Modal Feature Learning

As the RGB image and the depth image of the same object have some implicit relations, we exploited the correlation information of the two modalities to extract more effective features. Inspired by the processing method proposed in Ref. [31], we used the distances between different modalities to construct the correlation term of the objective function. The aim of our proposed correlation term is to maximize the inter-modality relationship of intra-class samples and minimize the inter-modality relationship of inter-class samples. That is to say, we should minimize the distances between the RGB feature and the depth feature of the same class and maximize the distances between the RGB feature and the depth feature of different classes. So, the correlation term can be defined as:

F_c = \sum_{(i,j)\in\mathcal{P}} \Big( \|f_1(x_i) - f_2(x_j)\|_2^2 + \|f_2(x_i) - f_1(x_j)\|_2^2 \Big) - \beta \sum_{(k,l)\in\mathcal{N}} \Big( \|f_1(x_k) - f_2(x_l)\|_2^2 + \|f_2(x_k) - f_1(x_l)\|_2^2 \Big)    (8)

where \beta is the weight to adjust the influences of the inter-class samples and the intra-class samples. The derivatives of the correlation term can be derived as follows:

\frac{\partial F_c}{\partial \beta} = -\sum_{(k,l)\in\mathcal{N}} \Big( \|f_1(x_k) - f_2(x_l)\|_2^2 + \|f_2(x_k) - f_1(x_l)\|_2^2 \Big)    (9)

\frac{\partial F_c}{\partial f_1(x_i)} = \sum_{j} 2\big(f_1(x_i) - f_2(x_j)\big)    (10)

\frac{\partial F_c}{\partial f_2(x_i)} = \sum_{j} 2\big(f_2(x_i) - f_1(x_j)\big)    (11)

Similar to Equations (10) and (11), the expressions of \partial F_c/\partial f_1(x_j), \partial F_c/\partial f_2(x_j), \partial F_c/\partial f_1(x_k), \partial F_c/\partial f_2(x_k), \partial F_c/\partial f_1(x_l), and \partial F_c/\partial f_2(x_l) can also be derived.

Finally, we used the discriminative terms of each modality and the correlation term between the two modalities to construct the multi-modal objective function. The objective function F_1 can be used as the discriminative term of the RGB modality, and the objective function F_2 can be used as the discriminative term of the depth modality. The multi-modal objective function can be expressed as:

\min F = \alpha_1^p F_1 + \alpha_2^p F_2 + \gamma F_c
subject to \alpha_1 + \alpha_2 = 1, \; \alpha_1 \ge 0, \; \alpha_2 \ge 0, \; \gamma \ge 0, \; p > 1    (12)

where the minimization is taken over the parameters of the two CNNs and the weights, \alpha_1 and \alpha_2 are the weights of the RGB modality and the depth modality, and \gamma is the weight between the discriminative terms and the correlation term. The parameter p is the relaxation factor, which can make the discriminative terms of both modalities effective [31].
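A sketch of the correlation term of Equation (8) and the combined objective of Equation (12) is given below, assuming PyTorch and reusing the feature tensors of the earlier sketches; \beta, \gamma, p, and the pair indices are placeholder values.

# Sketch of Equation (8) and Equation (12), assuming PyTorch.
# beta, gamma and p are placeholders; alpha1/alpha2 are obtained from
# the closed-form update of Equation (14).
import torch

def correlation_term(f_rgb, f_dep, pos_pairs, neg_pairs, beta=0.1):
    i, j = pos_pairs[:, 0], pos_pairs[:, 1]
    k, l = neg_pairs[:, 0], neg_pairs[:, 1]
    # cross-modality distances of same-class pairs (to be minimized)
    intra = ((f_rgb[i] - f_dep[j]).pow(2).sum(1)
             + (f_dep[i] - f_rgb[j]).pow(2).sum(1)).sum()
    # cross-modality distances of different-class pairs (to be maximized)
    inter = ((f_rgb[k] - f_dep[l]).pow(2).sum(1)
             + (f_dep[k] - f_rgb[l]).pow(2).sum(1)).sum()
    return intra - beta * inter

def multimodal_objective(F1, F2, Fc, alpha1, alpha2, gamma=0.01, p=2.0):
    # Equation (12): F = alpha1^p * F1 + alpha2^p * F2 + gamma * Fc
    return (alpha1 ** p) * F1 + (alpha2 ** p) * F2 + gamma * Fc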

Assume F_1 and F_2 are kept constant. If F_1 is larger than F_2, then the solutions of \alpha_1 and \alpha_2 are \alpha_1 = 0 and \alpha_2 = 1, which means that only the depth modality is effective; if F_1 is smaller than F_2, then the solutions of \alpha_1 and \alpha_2 are \alpha_1 = 1 and \alpha_2 = 0, which means that only the RGB modality is effective. For the above two conditions, only one modality is effective and the correlation information of the two modalities cannot be exploited, so the optimization may fall into a local optimum. By using the relaxation factor p, the objective function becomes nonlinear with respect to \alpha_1 and \alpha_2, and each modality will give a contribution in the optimization process. The Lagrange function can be constructed as follows:

L(\alpha, \eta) = \alpha_1^p F_1 + \alpha_2^p F_2 + \gamma F_c + \eta (\alpha_1 + \alpha_2 - 1)    (13)

By setting \partial L(\alpha, \eta)/\partial \alpha_k and \partial L(\alpha, \eta)/\partial \eta to 0, \alpha_k can be updated as:

\alpha_k = \frac{(1/F_k)^{1/(p-1)}}{(1/F_1)^{1/(p-1)} + (1/F_2)^{1/(p-1)}}, \quad k = 1, 2    (14)

In our fusion network, the discriminative terms and the correlation term of the two modalities are back-propagated to the two CNNs. Given the optimized \alpha_k, the back-propagation can be conducted using the following derivatives of F with respect to \gamma, \beta, f_1(x_i), and f_2(x_i):

\frac{\partial F}{\partial \gamma} = F_c    (15)

\frac{\partial F}{\partial \beta} = \gamma \frac{\partial F_c}{\partial \beta}    (16)

\frac{\partial F}{\partial f_1(x_i)} = \alpha_1^p \frac{\partial F_1}{\partial f_1(x_i)} + \gamma \frac{\partial F_c}{\partial f_1(x_i)}    (17)

\frac{\partial F}{\partial f_2(x_i)} = \alpha_2^p \frac{\partial F_2}{\partial f_2(x_i)} + \gamma \frac{\partial F_c}{\partial f_2(x_i)}    (18)

The learning steps of the proposed multi-modal neural network can be listed as follows (a small sketch of the fusion-training step is given after this list):

1. Initialize the RGB CNN and the depth CNN with parameters from the AlexNet, which has been pre-trained on the ImageNet large scale dataset.
2. Train the RGB CNN and the depth CNN, respectively, using the SGD method with back-propagation.
   For the RGB CNN,
   (1) Update \lambda_1 according to Equation (2).
   (2) Update the parameters in the RGB CNN according to Equations (3)–(6).
   (3) Repeat (1)–(2) until convergence or the maximum number of iterations is reached.
   Likewise, for the depth CNN, the parameter \lambda_2 and the parameters in the depth CNN are updated in turn.
3. Train the fusion network using the SGD method with back-propagation.
   (1) Update \alpha_k according to Equation (14).
   (2) Update \gamma according to Equations (8) and (15).
   (3) Update \beta according to Equations (9) and (16).
   (4) Update the parameters in the two CNNs according to Equations (17) and (18).
   Repeat (1)–(4) until convergence or the maximum number of iterations is reached.
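Putting the pieces together, one fusion-training iteration (step 3 above) could look like the sketch below, which reuses the quadruplet_loss and correlation_term functions from the earlier sketches; the optimizer, \gamma, and p are placeholders, and the closed-form \alpha update follows Equation (14).

# Sketch of one fusion-training iteration, assuming PyTorch; reuses
# quadruplet_loss() and correlation_term() defined in the earlier sketches.
import torch

def update_alpha(F1, F2, p=2.0):
    # Equation (14): closed-form modality weights given the current losses
    w1 = (1.0 / F1) ** (1.0 / (p - 1.0))
    w2 = (1.0 / F2) ** (1.0 / (p - 1.0))
    return w1 / (w1 + w2), w2 / (w1 + w2)

def fusion_step(rgb_cnn, depth_cnn, optimizer, rgb_batch, hha_batch,
                pos_pairs, neg_pairs, gamma=0.01, p=2.0):
    f_rgb, f_dep = rgb_cnn(rgb_batch), depth_cnn(hha_batch)
    F1 = quadruplet_loss(f_rgb, pos_pairs, neg_pairs)           # Equation (1)
    F2 = quadruplet_loss(f_dep, pos_pairs, neg_pairs)           # Equation (7)
    Fc = correlation_term(f_rgb, f_dep, pos_pairs, neg_pairs)   # Equation (8)
    alpha1, alpha2 = update_alpha(F1.detach(), F2.detach(), p)  # Equation (14)
    loss = (alpha1 ** p) * F1 + (alpha2 ** p) * F2 + gamma * Fc # Equation (12)
    optimizer.zero_grad()
    loss.backward()   # back-propagates the gradients of Equations (17) and (18) into both CNNs
    optimizer.step()
    return loss.item()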

3.3. RGB-D Object Recognition Based on DS Evidence Theory

In this paper

