Combining Generative And Discriminative Representation


This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TMI.2016.2526687

Combining Generative and Discriminative Representation Learning for Lung CT Analysis with Convolutional Restricted Boltzmann Machines

Gijs van Tulder and Marleen de Bruijne

Abstract—The choice of features greatly influences the performance of a tissue classification system. Despite this, many systems are built with standard, predefined filter banks that are not optimized for that particular application. Representation learning methods such as restricted Boltzmann machines may outperform these standard filter banks because they learn a feature description directly from the training data. Like many other representation learning methods, restricted Boltzmann machines are unsupervised and are trained with a generative learning objective; this allows them to learn representations from unlabeled data, but does not necessarily produce features that are optimal for classification. In this paper we propose the convolutional classification restricted Boltzmann machine, which combines a generative and a discriminative learning objective. This allows it to learn filters that are good both for describing the training data and for classification. We present experiments with feature learning for lung texture classification and airway detection in CT images. In both applications, a combination of learning objectives outperformed purely discriminative or generative learning, increasing, for instance, the lung tissue classification accuracy by 1 to 8 percentage points.
This shows that discriminative learning can help an otherwise unsupervised feature learner to learn filters that are optimized for classification.

Index Terms—Representation learning, Restricted Boltzmann machine, Deep learning, Machine learning, Segmentation, Pattern recognition and classification, Neural network, Lung, X-ray imaging and computed tomography.

I. INTRODUCTION

Most methods for automated image classification do not work directly with image data, but first extract a higher-level description of useful features from the image. The choice of features determines a large part of the classification performance. Which features work well depends on the nature of the classification problem: for example, some problems require features that preserve and extract scale differences, whereas other problems require features that are invariant to those differences. Often, feature representations are based on standard filter banks of common feature descriptors, such as Gaussian derivatives that detect edges in the image. These predefined filter banks are not specifically optimized for a particular problem or dataset.

© 2016 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
This research is financed by the Netherlands Organization for Scientific Research (NWO).
G. van Tulder and M. de Bruijne are with the Biomedical Imaging Group, Erasmus MC, Rotterdam, The Netherlands. M. de Bruijne is also with the Department of Computer Science, University of Copenhagen, Denmark.
Code used for the experiments is available as supplementary material and at …

As an alternative to such predefined feature sets, representation learning or feature learning methods [1] learn a high-level representation directly from the training data. Because this representation is learned from the training data, it can be optimized to give a better description of the data.
Using this learned representation as the input for a classification system might give a better classification performance than using a generic set of features.

Most feature learning methods use unsupervised models that are trained with unlabeled data. While this can be an advantage because it makes it easier to create a large training set, it can also lead to suboptimal results for classification, because the features that these methods learn are not necessarily useful to discriminate between classes. Unsupervised feature learning tends to learn features that model the strongest variations in the data, while classifiers need features that discriminate between classes. If the variation between samples from the same class is much stronger than the variation between classes, feature learning probably produces features that capture primarily within-class variation. If those features do not represent enough between-class variation, they might give a lower classification performance.

This issue of within-class variation is relevant for many applications, including medical image analysis. For example, in disease classification, the differences between patients are often greater than the subtle differences between disease patterns. As a result, representation learners might learn features that model these between-patient differences, rather than those that improve classification.

In this paper we study the restricted Boltzmann machine (RBM), a popular representation learning model, as a way to learn features that are optimized for classification. The standard RBM does not include labels and is trained with an unsupervised, generative learning objective. The classification RBM [2], an extension of the standard RBM, does include label information and can also be trained with a discriminative learning objective.
This discriminative learning objective optimizes the classification performance of the classification RBM. The generative and discriminative objectives can be combined to learn discriminative features that represent the data and are useful for classification.

We propose the convolutional classification RBM, which combines the classification RBM with the convolutional RBM, another extension of the standard RBM. The convolutional RBM [3]–[6] uses the convolutional weight-sharing pattern from convolutional networks to learn small filters that are applied to every position in a larger image. This weight sharing makes learning more efficient and allows the RBM to model small features that occur in multiple areas of an image, which is useful for describing textures.

The ability to use both generative and discriminative learning objectives distinguishes the classification RBM from many other representation learning methods. Unsupervised models such as the standard RBM are usually trained with only a generative learning objective. Supervised representation learning methods, such as convolutional neural networks [7], are usually trained with only a discriminative learning objective. The classification RBM can be trained with a generative objective, a discriminative objective, or a combination of both.

We present experiments on lung tissue classification and airway detection. For the lung tissue classification experiments we used a dataset on interstitial lung diseases (ILD) [8] with CT images of 73 patients. Previously published tissue classification experiments on this dataset used wavelets [9]–[12], local binary patterns [13], [14], bag-of-visual-words [15], [16], filter banks derived from the discrete Fourier transform [17], RBMs [18], [19] and convolutional networks [20].

We used RBMs to learn features for lung tissue classification. From the images, we first extracted 2D patches that we used to train RBMs with different mixtures of discriminative and generative learning. Using the RBM-learned representations, we trained and evaluated classifiers that classify each patch into one of the five tissue classes.
We compared those results with those of two standard filter banks.

We expected the effect of discriminative learning to become less important for larger representations (more hidden nodes in the RBM), because larger representations are more likely to contain sufficient discriminative features even without explicit discriminative learning. To study this effect, we performed airway detection experiments on lung CT images from the Danish Lung Cancer Screening Trial (DLCST) [21]. We used non-convolutional classification RBMs with different mixtures of discriminative and generative learning to learn features for this dataset. The non-convolutional RBMs allowed us to experiment with larger numbers of hidden nodes.

This paper extends our earlier workshop paper [22], in which we introduced the convolutional classification RBM and found that using a mixture of generative and discriminative learning objectives can produce features that improve classification results. In this paper, we present the results of more extensive experiments that confirm these preliminary conclusions.

The rest of this paper is organized as follows. Section II gives a brief overview of other relevant representation learning approaches. Section III describes the RBM and its learning algorithm. Section IV introduces the datasets and the experiments. Section V describes the results. We end with a discussion and conclusion.

II. RELATED WORK

Representation learning methods have been used for tissue classification in lung CT before. In experiments similar to those presented in this paper and using the same ILD dataset, Li et al. [18] used RBMs to extract features. Whereas we use classification RBMs with convolution to learn small filters, Li et al. trained standard (non-convolutional) RBMs on small subpatches extracted from the patch that is to be classified. In later work [19] on the same dataset, Li et al. reported that convolutional neural networks gave a slightly better performance than standard RBMs. Gao et al. [20] used convolutional neural networks to classify full slices from the ILD dataset, without requiring manually annotated ROIs. Schlegl et al. [23] also used convolutional neural networks to classify lung tissue in a different lung CT dataset.

Convolutional neural networks have also been used in other applications of lung CT, such as the detection of lung nodules and lymph nodes. In an early application of convolutional neural networks, Lo et al. [24], [25] trained a network to reject or confirm potential lung nodules selected in a preprocessing step. More recently, Shen et al. [26] used multi-scale convolutional networks to compute features for lung nodule classification. Kumar et al. [27] used multi-layer autoencoders to extract features for the classification of lung nodules. Roth et al. [28] proposed a so-called 2.5D convolutional neural network that samples multiple 2D orthogonal views to detect lymph nodes in lung CT images.

To our knowledge, classification RBMs have not been applied to lung CT images before, and there are only a few applications in other types of medical image analysis. Shin et al. [29] used classification RBMs to detect microcalcifications in digitized mammograms. Berry and Fasel [30] used translational deep Boltzmann machines, which are related to classification RBMs, to analyze ultrasound images of the tongue. Schmah et al. [31] analyzed fMRI data with RBMs with generative and discriminative learning.

III. RESTRICTED BOLTZMANN MACHINES

A. Standard RBM

The restricted Boltzmann machine is a probabilistic neural network that learns the probability distribution of its inputs $v$ and a hidden representation $h$. The visible nodes $v$ represent the voxels of an input patch. To model the patches from our lung CT images, we use Gaussian visible nodes $v$ and binary hidden nodes $h$ (see [32] for a description of these node types). Each visible node $v_i$ has an undirected connection with weight $W_{ij} \in \mathbb{R}$ to each hidden node $h_j$.
The model also includes a bias $b_i \in \mathbb{R}$ for each visible node $v_i$ and a bias $c_j \in \mathbb{R}$ for each hidden node $h_j$. Together, the weights and biases define the energy function of the RBM:

$$E(v, h) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{i,j} \frac{v_i}{\sigma_i} W_{ij} h_j - \sum_j c_j h_j, \tag{1}$$

where $\sigma_i$ is the standard deviation of the Gaussian noise of visible node $i$. We normalize the training patches such that $\sigma_i = 1$. The joint distribution of the input $v$ and hidden representation $h$ is defined as

$$P(v, h) = \frac{\exp(-E(v, h))}{Z}. \tag{2}$$
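As a minimal sketch (not the authors' code), the energy in Equation (1) can be evaluated directly for a single flattened patch. The helper names `gb_rbm_energy` and `sigm` and the toy sizes are hypothetical. Because the energy is linear in each binary $h_j$, the energy difference between $h_j = 0$ and $h_j = 1$ gives the logistic hidden activation, and the partition function $Z$ cancels out of that ratio; the snippet checks this numerically with $\sigma_i = 1$.

```python
import numpy as np

def gb_rbm_energy(v, h, W, b, c, sigma=1.0):
    """Hypothetical helper: Gaussian-Bernoulli RBM energy as in Equation (1).

    v: visible values, shape (n_visible,)
    h: binary hidden states, shape (n_hidden,)
    W: weights, shape (n_visible, n_hidden); b, c: visible and hidden biases
    sigma: per-visible standard deviation (1 after patch normalization)
    """
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), v.shape)
    quadratic = np.sum((v - b) ** 2 / (2.0 * sigma ** 2))
    interaction = (v / sigma) @ W @ h
    hidden_bias = c @ h
    return quadratic - interaction - hidden_bias

def sigm(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4  # toy sizes, e.g. a flattened 2x3 patch
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b = rng.normal(size=n_visible)
c = rng.normal(size=n_hidden)
v = rng.normal(size=n_visible)
h = rng.integers(0, 2, size=n_hidden).astype(float)

# P(h_j = 1 | v, other h) = sigm(E(h_j = 0) - E(h_j = 1)); Z cancels in this ratio.
# With sigma_i = 1 this should equal sigm(sum_i W_ij v_i + c_j).
j = 2
h0, h1 = h.copy(), h.copy()
h0[j], h1[j] = 0.0, 1.0
p_from_energy = sigm(gb_rbm_energy(v, h0, W, b, c) - gb_rbm_energy(v, h1, W, b, c))
p_closed_form = sigm(v @ W[:, j] + c[j])
print(np.isclose(p_from_energy, p_closed_form))  # True
```

The check illustrates why Gibbs sampling is practical even though $Z$ is intractable: every single-node conditional depends only on an energy difference.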

Here, $Z$ is a normalization constant. The conditional probabilities for the hidden nodes given the visible nodes and vice versa are

$$P(h_j = 1 \mid v) = \operatorname{sigm}\Big(\sum_i W_{ij} v_i + c_j\Big) \tag{3}$$

and

$$P(v_i \mid h) = \mathcal{N}\Big(v_i \;\Big|\; \sum_j W_{ij} h_j + b_i,\; \sigma_i^2\Big), \tag{4}$$

where $\operatorname{sigm}(x) = \frac{1}{1 + \exp(-x)}$ is the logistic sigmoid function and $\mathcal{N}(x \mid \mu, \sigma^2)$ is a Gaussian probability density function with mean $\mu$ and variance $\sigma^2$, evaluated at $x$.

B. Classification RBM

The standard RBM is an unsupervised model. The classification RBM [2] extends the standard RBM by adding a set of label nodes to the visible layer (Figure 1). This allows the RBM to learn the joint probability of the input, the hidden representation, and the label. The label nodes use a one-hot coding, with one node $y_k$ per class such that $y_k = 1$ if the sample belongs to class $k$ and $y_k = 0$ otherwise. The label nodes have a bias $d_k \in \mathbb{R}$ and are connected to the hidden nodes, with a connection with weight $U_{kj} \in \mathbb{R}$ between label node $y_k$ and hidden node $h_j$.

Fig. 1. Schematic view of the classification RBM, which adds a set of label nodes to the visible layer of the standard RBM. The label nodes are connected to the input nodes through the hidden layer.

The energy function of a classification RBM with Gaussian visible nodes is

$$E(v, h, y) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{i,j} \frac{v_i}{\sigma_i} W_{ij} h_j - \sum_j c_j h_j - \sum_{k,j} y_k U_{kj} h_j - \sum_k d_k y_k. \tag{5}$$

The energy function defines the distribution

$$P(v, h, y) = \frac{\exp(-E(v, h, y))}{Z} \tag{6}$$

and the conditional probabilities

$$P(h_j = 1 \mid v, y) = \operatorname{sigm}\Big(\sum_i W_{ij} v_i + \sum_k U_{kj} y_k + c_j\Big) \tag{7}$$

and, because exactly one label node is active under the one-hot coding,

$$P(y_k = 1 \mid h) = \frac{\exp\big(\sum_j U_{kj} h_j + d_k\big)}{\sum_{k'} \exp\big(\sum_j U_{k'j} h_j + d_{k'}\big)}. \tag{8}$$

The visible nodes and the label nodes are not connected, so the expression for $P(v_i \mid h)$ is unchanged from the standard RBM. The posterior probability for classification is

$$P(y \mid v) = \frac{\exp\Big(d_y + \sum_j \operatorname{softplus}\big(c_j + U_{yj} + \sum_i W_{ij} v_i\big)\Big)}{\sum_{y^*} \exp\Big(d_{y^*} + \sum_j \operatorname{softplus}\big(c_j + U_{y^*j} + \sum_i W_{ij} v_i\big)\Big)}, \tag{9}$$

where $\operatorname{softplus}(x) = \log(1 + \exp(x))$. This definition only works for RBMs with binary hidden nodes: it implicitly sums over all possible states of the hidden layer, which can be done efficiently if each hidden node can take one of only two values [2].

C. Generating samples and classifying with RBMs

RBMs are probabilistic models that define the activation probability for each node given all other nodes. In practice, computing the probability of a particular state $(v, h)$ is impossible, because the normalization constant or partition function $Z$ in Equations (2) and (6) is infeasible to compute for any but the smallest models. However, since the conditional probabilities can be computed, we can still use Gibbs sampling to sample from the model. Gibbs sampling alternately samples from the hidden and visible layers. Given the visible and label nodes, the new state of the hidden nodes can be sampled using the distribution $P(h_t \mid v_t, y_t)$. Then, keeping the hidden nodes fixed, the new activation of the visible and label nodes can be sampled from $P(v_t, y_t \mid h_t)$. This can be repeated for several iterations, until the samples approximately follow the model distribution. For simplicity, we used a fixed number of iterations in our experiments.

Classifying a patch using the classification RBM is more straightforward. We input the patch values in the visible layer $v$ and use Equation (9) to compute the posterior probability $P(y \mid v)$ for each class. We assign the label of the class with the highest posterior probability.

D. Learning objectives

At training time, the weights and biases of the standard RBM are chosen to optimize the generative learning objective $\log P(v_t)$, the log-likelihood of each training image $v_t$. The classification RBM can be trained with the generative learning objective $\log P(v_t, y_t)$, which optimizes the joint probability distribution of the input image and the label. A classification RBM can also be trained with the discriminative objective $\log P(y_t \mid v_t)$, which only optimizes the classification and does not try to optimize the likelihood of the input image. Larochelle et al. [2] suggest a hybrid objective

$$\beta \log P(v_t, y_t) + (1 - \beta) \log P(y_t \mid v_t), \tag{10}$$

where $\beta \in [0, 1]$ is the proportion of generative learning. We will use this objective with different values for $\beta$ in our feature learning experiments.

The normalization constant or partition function $Z$ makes it infeasible to compute the exact gradient of the generative learning objective. Instead, we use Gibbs sampling and contrastive divergence [32] to estimate the stochastic gradient descent updates for our RBMs. Contrastive divergence provides an efficient approximation for the gradient-based updates to the weights and biases.

Classification RBMs are slightly more computationally expensive than unsupervised RBMs, because they use an additional discriminative learning objective and include extra weights to connect the label nodes. In practice, however, we find that the classification RBMs are not much slower than the unsupervised RBMs, because the additional complexity from the discriminative components is small compared with the other parts of the RBM. The number of labels and the number of associated weights is usually much smaller than the number of connections between the visible and hidden layers, and the discriminative learning objective can be computed much faster than the generative objective, which requires contrastive divergence and Gibbs sampling.

Convolutional RBMs can produce unwanted border effects when reconstructing the visible layer, because the visible nodes near the borders are only connected to a few hidden nodes. We pad our patches with voxels from neighboring patches, and keep the padding voxels fixed during the iterations of Gibbs sampling.

Fig. 2. Schematic view of the convolutional RBM, which uses a convolutional weight-sharing arrangement to reduce the number of connection weights.

Fig. 3. Schematic view of the convolutional classification RBM. The connection weights U are shared between all nodes in a feature map.

E. Convolutional RBM

Designed to model complete images instead of small patches, convolutional RBMs [3]–[6] use the weight-sharing approach from convolutional neural networks.
Unlike convolutional neural networks, convolutional RBMs are generative models and can be trained in the same way as standard RBMs.

In a convolutional RBM, the connections share weights in a pattern that resembles convolution, with $M$ convolutional filters $W^m$ that connect hidden nodes arranged in $M$ feature maps $h^m$ (Figure 2). The connections between the visible nodes and the hidden nodes in map $m$ use the weights from convolution filter $W^m$, such that each hidden node is connected to the visible nodes in its receptive field. The visible nodes share one bias $b$; all hidden nodes in map $m$ share the bias $c_m$. With the convolution operator $*$ we define the probabilities

$$P(h^m_{ij} = 1 \mid v) = \operatorname{sigm}\big((\tilde{W}^m * v)_{ij} + c_m\big) \tag{11}$$
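The hidden-layer probabilities of Equation (11) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes, as is common in the convolutional RBM literature, that $\tilde{W}^m$ denotes $W^m$ flipped horizontally and vertically, so that $\tilde{W}^m * v$ equals a "valid" cross-correlation of the image with $W^m$. The function name `conv_hidden_probs`, the square $K \times K$ filters, and the toy sizes are hypothetical. It relies on `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigm(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def conv_hidden_probs(v, filters, c):
    """Hypothetical helper: P(h^m_ij = 1 | v) for all maps m (sketch of Eq. 11).

    v: 2D patch, shape (H, W)
    filters: M square filters, shape (M, K, K)
    c: one shared bias per feature map, shape (M,)
    Returns shape (M, H - K + 1, W - K + 1) ("valid" positions only).
    """
    K = filters.shape[-1]
    # windows[i, j] is the K x K receptive field whose top-left corner is (i, j)
    windows = sliding_window_view(v, (K, K))  # (H-K+1, W-K+1, K, K)
    # Convolving with the flipped filter equals cross-correlating with W^m itself
    responses = np.einsum("ijkl,mkl->mij", windows, filters)
    return sigm(responses + c[:, None, None])

rng = np.random.default_rng(1)
v = rng.normal(size=(8, 8))                       # toy 8x8 patch
filters = rng.normal(scale=0.1, size=(3, 5, 5))   # M = 3 filters of 5x5
c = np.zeros(3)                                   # shared bias per map

p = conv_hidden_probs(v, filters, c)
print(p.shape)  # (3, 4, 4)

# Cross-check one hidden node against an explicit receptive-field sum.
m, i, j = 1, 2, 3
manual = sigm(np.sum(v[i:i + 5, j:j + 5] * filters[m]) + c[m])
print(np.isclose(p[m, i, j], manual))  # True
```

Computing all feature maps with a single `einsum` keeps the weight sharing explicit: the same $K \times K$ filter multiplies every receptive field. For larger images, a library convolution routine would likely be faster; this sketch favors readability.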
