
292 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 15, NO. 2, FEBRUARY 2018

Classification of Hyperspectral Imagery Using a New Fully Convolutional Neural Network

Jiaojiao Li, Xi Zhao, Yunsong Li, Qian Du, Fellow, IEEE, Bobo Xi, and Jing Hu

Abstract— With the success of convolutional neural networks (CNNs) in computer vision, the CNN has attracted great attention in hyperspectral classification. Many deep learning-based algorithms have focused on deep feature extraction for classification improvement. In this letter, a novel deep learning framework for hyperspectral classification based on a fully CNN is proposed. Through convolution, deconvolution, and pooling layers, the deep features of hyperspectral data are enhanced. After feature enhancement, an optimized extreme learning machine (ELM) is utilized for classification. The proposed framework outperforms the existing CNN and other traditional classification algorithms by including deconvolution layers and an optimized ELM. Experimental results demonstrate that it achieves outstanding hyperspectral classification performance.

Index Terms— Convolution, deconvolution, deep learning, extreme learning machine (ELM), feature enhancement, hyperspectral classification.

I. INTRODUCTION

HYPERSPECTRAL sensors can capture hundreds of narrow spectral channels with very high spectral resolution. Hyperspectral imaging has many important applications, such as environment monitoring, medical diagnosis, military reconnaissance, and target detection. It provides a wealth of information for distinguishing objects or physical materials [1]. In particular, it has great potential to provide a finer classification map when classes have similar spectral features. Traditional classifiers, such as k-nearest neighbors, logistic regression, and support vector machines (SVMs), have been employed in hyperspectral classification with satisfactory performance.
Sparse representation-based classification (SRC) has also been applied to hyperspectral classification, with no need for prior knowledge about the distribution of the data. To deal with the Hughes effect in hyperspectral data, feature extraction (FE) and feature selection algorithms are employed to remove redundant features from the original data. For instance, principal component analysis (PCA) [2] and independent component analysis [3] are typical spectral-based FE methods, which project the original high-dimensional data into a low-dimensional space using key features of the input data extracted according to a certain criterion.

Manuscript received August 23, 2017; revised December 1, 2017; accepted December 18, 2017. Date of publication January 8, 2018; date of current version January 23, 2018. This work was supported in part by the National Nature Science Foundation of China under Grant 61571345, Grant 91538101, Grant 61501346, and Grant 61502367, in part by the 111 Project under Grant B08038, in part by the Fundamental Research Funds for the Central Universities under Grant JB170109, in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2016JQ6023, and in part by the China Postdoctoral Science Foundation under General Financial Grant 2017M623124. (Corresponding author: Jiaojiao Li.) J. Li, X. Zhao, Y. Li, B. Xi, and J. Hu are with the State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi'an 710126, China (e-mail: jjli@xidian.edu.cn). Q. Du is with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS 39759 USA (e-mail: du@ece.msstate.edu). Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2017.2786272
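To make the PCA step concrete, the following sketch extracts the first-PC image that spectral-based FE methods such as [2] rely on (and which the proposed framework later reuses as a label image). The function name and the cube layout (height × width × bands) are our assumptions, not the authors' code.

```python
import numpy as np

def first_pc_image(cube):
    """Project a hyperspectral cube of shape (H, W, L) onto its first
    principal component and return the scores reshaped to (H, W).

    Hypothetical helper; layout and normalization are assumptions."""
    h, w, l = cube.shape
    x = cube.reshape(-1, l).astype(np.float64)
    x -= x.mean(axis=0)                       # center each band
    cov = x.T @ x / (x.shape[0] - 1)          # band covariance matrix
    _, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    return (x @ vecs[:, -1]).reshape(h, w)    # scores on the leading eigenvector
```

Because the first PC captures the direction of maximum variance, edge and texture structure shared across bands survives the projection, which is what makes it usable as a spatial label image.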
In our previous work, we utilized a minimum estimated abundance covariance-based supervised band-selection algorithm with the extreme learning machine (ELM) to further improve hyperspectral classification performance [4]. Manifold learning [5] has also gained great interest for effectively extracting intrinsic features. In recent years, spatial features have drawn a significant amount of interest in addressing the problems of different spectra within the same class and similar spectra across different classes. Many works [6]–[8] combine spectral and spatial features to improve hyperspectral classification. Kang et al. [9] utilize a guided filter to process the pixelwise classification map of each class, using the first PC or the first three PCs to capture the major spatial features. Texture features have also been combined to improve classification accuracy [10]. From these results, it can be seen that multiple features can improve hyperspectral classification.

Deep learning with deep neural networks is commonly used to learn high-level features hierarchically. Typical deep neural network architectures include stacked autoencoders (AEs), deep belief networks, stacked denoising AEs, and convolutional neural networks (CNNs). In particular, due to its local receptive fields, the CNN plays a dominant role in visual tasks; a primary use of the CNN is classification, and it can extract spatial information effectively via its local connections. The idea motivating CNNs was introduced in [11], and many variants followed in [12] and [13]. CNNs have been demonstrated to provide better performance than the SVM in different settings. For instance, the authors of [14] modified the size of the convolutional kernel to accomplish classification in a pixel-to-label manner, and [6] promoted 2-D and 3-D CNNs to acquire robust and abstract deep features.
However, without enough training samples, a traditional CNN faces an overfitting problem, and employing a CNN only as a tool to extract deep features is not enough to achieve a better hyperspectral classification map. Motivated by the procedure of superresolution, we investigate a novel fully CNN to enhance the hyperspectral features. Moreover, an optimized ELM is used to classify hyperspectral images effectively.

1545-598X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

LI et al.: CLASSIFICATION OF HYPERSPECTRAL IMAGERY USING A NEW FULLY CNN

Fig. 1. Overall procedure of the proposed hyperspectral classification framework.

Fig. 2. Overall architecture of the proposed fully CNN.

To overcome the information loss incurred during convolution, we employ deconvolution layers to enhance the hyperspectral deep features. The main contributions of this letter are summarized as follows.

1) We develop a multilayer fully CNN composed of convolution, pooling, and deconvolution layers with rectified linear units (ReLUs). To the best of our knowledge, although learning a deconvolution network for hyperspectral feature enhancement is very promising, it has not previously been employed for hyperspectral classification.

2) We utilize the PCA algorithm to extract the first PC as the training labels. The training data then consist of the hyperspectral data and copies of the first PC.

3) We propose a novel hyperspectral classification framework based on the fully convolutional neural network and an optimized ELM.

We believe that these three contributions together improve hyperspectral classification performance.

II. PROPOSED FRAMEWORK

Fig. 1 shows the overall procedure of our proposed hyperspectral classification framework. It consists of three steps: PCA decomposition, a fully CNN, and an optimized ELM. Through the PCA algorithm, the first PC is extracted with detailed spatial features, especially edge features. The first PC also has the same width and length as the input image, so, because it carries these refined spatial features, it can be utilized as the label image of the input data. Therefore, to further improve the spatial information of the hyperspectral data, the training data of the fully CNN consist of the hyperspectral data cube and copies of the first PC. The architecture of our proposed feature enhancement fully CNN (FEFCN) is shown in Fig. 2, where 48 × 48 and 24 × 24 denote the sizes of the feature maps, while 1, 64, 64, 64, 64, 64, 32, and 1 are the numbers of feature maps in the corresponding layers. The FEFCN is composed of three types of layers: convolution, pooling, and deconvolution. It is the first work reported to employ the deconvolution layer for hyperspectral feature enhancement. Each convolution and deconvolution layer is followed by an ReLU, which naturally enforces sparsity and further increases the learning speed.

The convolution layers, used as the feature extractor, transform the input hyperspectral image data into multiscale features, making the features progressively more abstract: deep convolution layers extract more abstract and complex features at higher layers, and these abstract features are invariants of the input image. However, the goal of our proposed network is to obtain reconstructed maps that have the same size as the input images. Accordingly, the deconvolution layers placed after the convolution and pooling layers generate enlarged and dense feature maps, increasing the resolution of the output map. As shown in Fig. 3, a deconvolution layer can correlate a single input feature with multiple output features. The filters used in the deconvolutional layers correspond to the bases used to reconstruct the input high-level features, and the output features from the deconvolutional layers are refined.

Fig. 3. Illustration of deconvolution.

Assume that the data matrix X = [x1, x2, ..., xL] is the input hyperspectral image of the first layer, where L is the number of spectral bands. Let w1 represent the filters of the first layer acting on the input hyperspectral data, b1 the biases of the first layer, and θ1(·) the activation function. The first convolution layer extracts the features of the hyperspectral image data as

F1(X) = θ1(u1)    (1)

where u1 = Σ_{l=1}^{L} xl * w1 + b1, xl denotes the l-th band of the hyperspectral data, and "*" represents the convolution operation. w1 corresponds to n1 filters with kernel size f1 × f1, and b1 is an n1-dimensional vector, each element of which is associated with one filter. The output comprises n1 feature maps. The remaining convolutional layers are defined analogously. Supposing that Fi(X) is the input of the (i+1)-th layer with ni feature maps, Fi+1(X) is calculated through

Fi+1(X) = θi+1(ui+1)    (2)

where

ui+1 = Σ_{l=1}^{L} Fi(xl) * wi+1 + bi+1    (3)

and θi+1(·) is the activation function of the (i+1)-th layer, while wi+1 and bi+1 denote the filters and the bias vector of the (i+1)-th convolution layer. The output of the (i+1)-th layer contains ni+1 feature maps.

Given the proposed fully CNN architecture, the network parameters {wi, bi | i = 1, 2, ..., M}, where M is the number of layers in the proposed architecture, are estimated by minimizing the loss between the reconstructed images and the label images. The loss function is the mean squared error

Loss(x, Θ) = (1/L) Σ_{l=1}^{L} ||F(xl, Θ) − xFPC||²    (4)

where xFPC denotes the first PC of the hyperspectral data. Our model then produces an image reconstruction that can be employed as the input to standard classifiers.

Caffe is a widely used deep learning framework with an expressive architecture and extensible code. The hyperspectral image cube, together with the first PC, is therefore used to train the network with the stochastic gradient descent implementation of the Caffe platform. In particular, the weight and bias matrices are updated via

Δi+1 = 0.9 · Δi − η · ∂Loss(Θ)/∂Θm^i,    Θm^{i+1} = Θm^i + Δi+1    (5)

where m ∈ {1, 2, ..., M} and i are the indices of the network layers and iterations, respectively, η denotes the learning rate, and ∂Loss(Θ)/∂Θm^i is the derivative of the loss with respect to the parameters. After hyperspectral data reconstruction by the aforementioned FEFCN, we utilize an optimized ELM to classify the hyperspectral data.

ELM is a fast and effective supervised learning algorithm for a single-hidden-layer feedforward neural network. Given training samples X1 and the desired outputs T1, the training algorithm can be summarized as follows.

Step 1: Select the input weight matrix W and the bias matrix C randomly within [−1, 1]. Compute the input of the hidden layer Hin = WX1 + C.

Step 2: Compute the output of the hidden layer Hout = g(Hin), where g(·) is a sigmoid activation function.

Step 3: Calculate B = T1 Hout†, where Hout† = Hout^T (Hout Hout^T)^{-1} denotes the pseudoinverse matrix of Hout.

For testing, the learned weights are applied directly to X2; each test pixel is assigned class label c if the c-th output neuron yields the maximum output. The optimized ELM uses an estimated number of hidden neurons obtained through an empirical linear relationship between the number of training samples and the number of hidden neurons, which obtains superior performance at a very fast training speed. The detailed information is referred to [15].
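As a rough illustration, the three ELM training steps above can be sketched in NumPy. This is a minimal sketch on a toy one-hot problem; the function names are ours, the pseudoinverse is computed with a generic least-squares routine, and the hidden-layer size is arbitrary rather than chosen by the empirical rule of the optimized ELM [15].

```python
import numpy as np

def elm_train(X1, T1, n_hidden, seed=0):
    """Steps 1-3 of ELM training.
    X1: (L, N) training samples; T1: (C, N) one-hot targets."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (n_hidden, X1.shape[0]))  # Step 1: random input weights
    C = rng.uniform(-1, 1, (n_hidden, 1))            # Step 1: random biases
    H_out = 1.0 / (1.0 + np.exp(-(W @ X1 + C)))      # Step 2: sigmoid hidden output
    B = T1 @ np.linalg.pinv(H_out)                   # Step 3: least-squares output weights
    return W, C, B

def elm_predict(X2, W, C, B):
    """Apply the learned weights; the winning output neuron gives the label."""
    H_out = 1.0 / (1.0 + np.exp(-(W @ X2 + C)))
    return np.argmax(B @ H_out, axis=0)
```

Only the output weights B are learned, which is why ELM training is so fast: it reduces to one random projection and one linear solve.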
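The momentum update of (5) reduces to a two-line rule per iteration; a minimal sketch (the function name is ours, and the defaults mirror the momentum and base learning rate reported in Section III):

```python
def sgd_momentum_step(theta, velocity, grad, lr=0.001, momentum=0.9):
    """One update following (5): the velocity keeps 0.9 of its previous
    value and subtracts the learning rate times the current gradient,
    and the parameter moves by the new velocity."""
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity
```

Under a constant gradient, the velocity accumulates toward lr·grad/(1 − momentum), which is the usual acceleration effect of momentum SGD.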
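The deconvolution of Fig. 3, in which a single input feature is correlated with multiple output features, can be sketched naively as a transposed convolution: each input value spreads a scaled copy of the kernel into an enlarged output. The stride and border conventions here are our assumptions, not the paper's exact settings.

```python
import numpy as np

def deconv2d(x, k, stride=2):
    """Naive 2-D transposed convolution ('deconvolution'): every input
    pixel adds x[i, j] * k into a window of the output, so the output
    map is enlarged and denser than the input."""
    ih, iw = x.shape
    kh, kw = k.shape
    out = np.zeros(((ih - 1) * stride + kh, (iw - 1) * stride + kw))
    for i in range(ih):
        for j in range(iw):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * k
    return out
```

With stride 2 and a 2 × 2 kernel, a 2 × 2 input becomes a 4 × 4 output, which is the resolution-increasing behavior the deconvolution layers exploit.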
III. EXPERIMENTAL RESULTS

In this section, we compare our proposed framework with other state-of-the-art classification algorithms on two real hyperspectral images. The first experimental data set, Indian Pines, consists of 145 × 145 pixels with 200 spectral bands after removing 20 water absorption bands. The second data set, the University of Pavia, has 610 × 340 pixels with 104 bands after removing 12 noisy bands. We utilized 10% of the labeled data of Indian Pines and 5% of the labeled data of Pavia University as the training samples to train the proposed network. We also utilize the nonparametric McNemar's test to evaluate the statistical significance of the accuracy differences between algorithms. The McNemar's test statistic for two algorithms can be calculated as [16]

z = (f12 − f21) / √(f12 + f21)    (6)

where f12 denotes the number of samples misclassified via solution 2 but not solution 1, and f21 the number of samples misclassified via solution 1 but not solution 2.

TABLE I. Classification accuracy (%) for the Indian Pines data set via different classification algorithms.

TABLE II. Classification accuracy (%) for the Pavia University data set via different classification algorithms.

Fig. 4. Feature maps achieved by the deconvolution layers. (a) Indian Pines. (b) University of Pavia.

Fig. 5. Classification maps achieved by different classification algorithms on Indian Pines. (a) Ground truth. (b) SRC-T. (c) ELM. (d) SVM-RBF. (e) CNN. (f) Proposed.

The test uses the absolute value of z. At the 5% level of significance, the critical z value is 1.96; if a z value is greater than this quantity, the two classification algorithms have a significant discrepancy. In our proposed framework, the hyperspectral data sets are divided into several overlapping patches of 48 × 48 pixels with a stride of 15. All these patches are the input data of the proposed FEFCN architecture. The momentum is empirically set to 0.9 to speed up the convergence of the network. The initial weights of each layer are drawn randomly from a Gaussian distribution with mean 0 and standard deviation 0.001. Additionally, standard backpropagation and stochastic gradient descent are utilized to optimize FEFCN, with a base learning rate of 0.001. The kernel sizes used in each layer are 7 × 7, 5 × 5, 2 × 2, 3 × 3, 2 × 2, 1 × 1, and 5 × 5, respectively, and the numbers of kernels are 64, 64, 64, 64, 64, 32, and 1, respectively. The 1 × 1 kernels allow our deep model to learn more representative spatial features with fewer parameters. The network was implemented on the Caffe platform with an NVIDIA Tesla K80 GPU. Fig. 4 depicts the feature maps achieved via the deconvolution layers. The spatial features are salient and apparent; in particular, the edge features are distinct.
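The patch extraction described above (overlapping 48 × 48 patches with a stride of 15) can be sketched as follows for a single band; skipping patches that would run past the image border is our assumption, since the paper does not detail its border handling.

```python
import numpy as np

def extract_patches(img, patch=48, stride=15):
    """Cut a 2-D band into overlapping patch x patch tiles with the
    given stride; tiles that would cross the border are skipped."""
    h, w = img.shape
    tiles = [img[i:i + patch, j:j + patch]
             for i in range(0, h - patch + 1, stride)
             for j in range(0, w - patch + 1, stride)]
    return np.stack(tiles)
```

On the 145 × 145 Indian Pines scene this yields 7 patch positions per axis, i.e., 49 overlapping patches per band.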
Tables I and II show the overall accuracies and kappa values achieved by different classification algorithms. The proposed framework achieves superior classification accuracy. Figs. 5 and 6 show the classification maps achieved via different classic classification algorithms: SRC with a diagonal weight matrix T (SRC-T), the ELM, the SVM with a radial basis function kernel (SVM-RBF), the CNN proposed in [14], and our proposed framework. The classification maps obtained by FEFCN are clearly much smoother than the others. Therefore, the hyperspectral data enhanced by our proposed fully convolutional network yield deeper and more precise spectral and spatial features. Table III tabulates the average z values when the proposed framework is compared against the other classification algorithms.
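The z values reported in Table III follow (6); a minimal helper, with the 1.96 significance threshold left to the caller:

```python
from math import sqrt

def mcnemar_z(f12, f21):
    """|z| of (6): the McNemar statistic on the two discordant counts.
    Values above 1.96 indicate a significant difference at the 5% level."""
    return abs(f12 - f21) / sqrt(f12 + f21)
```

For example, 30 versus 10 discordant samples gives |z| ≈ 3.16, well above the 1.96 cutoff, so the two classifiers would be judged significantly different.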

Fig. 6. Classification maps achieved by different classification algorithms on the University of Pavia. (a) Ground truth. (b) SRC-T. (c) ELM. (d) SVM-RBF. (e) CNN. (f) Proposed.

TABLE III. z values in the McNemar's test. The 5% level of significance is selected. (a) Indian Pines. (b) Pavia University.

Table III(a) and (b) depict the z values achieved on Indian Pines and Pavia University, respectively. A "yes" indicates that the two methods compared in McNemar's test have a significant performance discrepancy. Clearly, the proposed framework is statistically different from its counterparts at the 5% significance level.

IV. CONCLUSION

In this letter, a novel hyperspectral classification framework is proposed, which employs convolution–deconvolution layers and an optimized ELM. The deconvolutional layers generate enlarged and dense maps, which extract refined high-level features. Features close to the target classes are amplified, while noisy or background features are suppressed effectively. Therefore, the reconstructed hyperspectral data consist of enhanced, useful hyperspectral spatial features. Experimental results demonstrate that our proposed framework outperforms other traditional classifiers and deep learning-based algorithms.

REFERENCES

[1] J. Li, Q. Du, and Y. Li, "An efficient radial basis function neural network for hyperspectral remote sensing image classification," Soft Comput., vol. 20, no. 12, pp. 4753–4759, Dec. 2016.
[2] G. Licciardi, P. R. Marpu, J. Chanussot, and J. A. Benediktsson, "Linear versus nonlinear PCA for the classification of hyperspectral data based on the extended morphological profiles," IEEE Geosci. Remote Sens. Lett., vol. 9, no. 3, pp. 447–451, May 2011.
[3] A. Villa, J. A. Benediktsson, J. Chanussot, and C. Jutten, "Hyperspectral image classification with independent component discriminant analysis," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4865–4876, Dec. 2011.
[4] J. Li, B. Kingsdorf, and Q. Du, "Band selection for hyperspectral image classification using extreme learning machine," Proc. SPIE, vol. 10198, p. 101980R, May 2017.
[5] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, Dec. 2000.
[6] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, "Deep feature extraction and classification of hyperspectral images based on convolutional neural networks," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 6232–6251, Oct. 2016.
[7] X. Ma, H. Wang, and J. Geng, "Spectral–spatial classification of hyperspectral image based on deep auto-encoder," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 9, pp. 4073–4085, Sep. 2016.
[8] Y. Chen, X. Zhao, and X. Jia, "Spectral–spatial classification of hyperspectral data based on deep belief network," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2381–2392, Jun. 2015.
[9] X. Kang, S. Li, and J. A. Benediktsson, "Spectral–spatial hyperspectral image classification with edge-preserving filtering," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 2666–2677, May 2014.
[10] Y. Qian, M. Ye, and J. Zhou, "Hyperspectral image classification based on structured sparse logistic regression and three-dimensional wavelet texture features," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 4, pp. 2276–2291, Apr. 2013.
[11] K. Fukushima, "Neocognitron: A hierarchical neural network capable of visual pattern recognition," Neural Netw., vol. 1, no. 2, pp. 119–130, 1988.
[12] D. C. Cireşan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, "Flexible, high performance convolutional neural networks for image classification," in Proc. 22nd Int. Joint Conf. Artif. Intell. (IJCAI), Jul. 2011, pp. 1237–1242.
[13] H. Lee and H. Kwon, "Going deeper with contextual CNN for hyperspectral image classification," IEEE Trans. Image Process., vol. 26, no. 10, pp. 4843–4855, Oct. 2017.
[14] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, "Deep convolutional neural networks for hyperspectral image classification," J. Sensors, vol. 2015, Jul. 2015, Art. no. 258619.
[15] J. Li, Q. Du, W. Li, and Y. Li, "Optimizing extreme learning machine for hyperspectral image classification," J. Appl. Remote Sens., vol. 9, no. 1, p. 097296, Mar. 2015.
[16] G. M. Foody, "Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy," Photogramm. Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, May 2004.

