Eigenboosting: Combining Discriminative and Generative Information


Eigenboosting: Combining Discriminative and Generative Information

Helmut Grabner, Peter M. Roth, Horst Bischof
Graz University of Technology, Institute for Computer Graphics and Vision
{hgrabner, pmroth, bischof}@icg.tugraz.at

This work has been supported by the Austrian Joint Research Project Cognitive Vision under projects S9103-N04 and S9104-N04 and the FFG project AUTOVISTA 813395 under the FIT-IT programme. Part of this work has also been carried out as part of the K-plus Competence center ADVANCED COMPUTER VISION funded under the K-plus program. In addition, this work has been sponsored by the EU FP6-507752 NoE MUSCLE IST project.

Abstract

A major shortcoming of discriminative recognition and detection methods is their noise sensitivity, both during training and recognition. This may lead to very sensitive and brittle recognition systems focusing on irrelevant information. This paper proposes a method that selects generative and discriminative features. In particular, we boost classical Haar-like features and use the same features to approximate a generative model (i.e., eigenimages). A modified error function for boosting ensures that only features are selected that show both good discrimination and good reconstruction. This allows a robust feature selection using boosting. Thus, we can handle problems where discriminant classifiers fail while still retaining the discriminative power. Our experiments show that we can significantly improve the recognition performance when learning from noisy data. Moreover, the feature type used allows efficient recognition and reconstruction.

1. Introduction

When computing a classifier for object recognition one faces two main philosophies: generative and discriminative models. Formally, the two categories can be described as follows: given an input x and a label y, a generative classifier learns a model of the joint probability p(x, y) and classifies using p(y|x), which is obtained by applying Bayes' rule. In contrast, a discriminative classifier models the posterior p(y|x) directly from the data or learns a map from input to labels: y = f(x).

Figure 1. Combining discriminative and generative information by using a shared feature pool. In addition to discriminative classification, the features can be used to reconstruct a previously learned object (face), while the reconstruction for other objects (e.g., cars) fails completely.

Generative models such as principal component analysis (PCA) [11], independent component analysis (ICA) [10], or non-negative matrix factorization (NMF) [13] try to find a suitable representation of the original data (by approximating the original data while keeping as much information as possible). As they provide sufficient reconstructive power, these methods are capable of dealing with missing or occluded pixels. Thus, several methods were proposed that allow robust recognition (e.g., [4, 14]) as well as robust learning (e.g., [5, 23]).

In contrast, discriminative classifiers such as linear discriminant analysis (LDA) [3], support vector machines (SVM) [27], or boosting [7] were designed for classification tasks. Given the training data and the corresponding labels, the goal is to find optimal decision boundaries. Compared to generative methods this allows training more specific classifiers with a higher recognition rate. In fact, several studies (e.g., [3, 17]) have shown that discriminative classifiers outperform generative models (if enough training data is available). Thus, many applications use discriminative classifiers instead of generative ones.
Compared to generative models, discriminative models have two main drawbacks: (a) discriminant models are not robust, neither in the training nor in the recognition stage, and (b) a huge amount of labeled training data is necessary.

To overcome these drawbacks several approaches (e.g., [6, 9, 15, 16, 19, 20]) have been proposed that combine discriminative and generative models to get the best of both worlds: the discriminative power and the robustness. In addition, [12] provides a theoretical discussion on this topic. Most of these approaches are based on two stages (e.g., [9, 15, 16, 19]): in the first stage a generative model is estimated, and in the second stage a discriminant classifier is built from the generative model. Holub et al. [9] proposed to use a probabilistic constellation model as generative model and an SVM on Fisher scores to estimate the discriminative classifier. Lin et al. [16] applied a probabilistic PCA to compute a distribution for the positive samples in the first stage; in the second stage they estimate a new distribution for the negative samples by learning a linear projection. Other authors proposed to use a clustering algorithm in the generative stage and a neural network classifier (e.g., a multi-layer perceptron [15], a pairwise neural network [19]) in the discriminative stage. In contrast, the problem of robust discriminative classification was addressed by Fidler et al. [6]. They proposed a robust LDA approach that constructs a basis that contains all discriminative information but in addition also contains sufficient reconstructive information to enable robust reconstructions. Boosting Haar-like features allows training efficient classifiers. Thus, Roth et al. [20] proposed the conservative learning framework, where they additionally estimate a PCA model which serves as supervisor for the boosted discriminative model and thus ensures robustness.

All methods described above combine discriminative and generative information on the image level only. Moreover, most of them are multi-stage methods where the (final) discriminative classifier is trained on a (pre-processed) generative model. The main contribution of this paper is to propose a method that combines a discriminative model and a generative model on the feature level. In order to do this we need features that are discriminative and provide reconstructive capabilities at the same time. Haar-like features fulfill both criteria: it is well known [28] that this feature type can be used to train powerful discriminative classifiers, and Tao et al. [25] showed that binary bases (in fact, Haar-like features can be considered as binary basis functions) can be used to reconstruct gray-value images. The discriminative and generative power of Haar-like features is shown in Figure 1 by two specific examples: given a face model learned from Haar-like features, a face can be described and reconstructed by the same features, which is not the case for a non-face image (e.g., a car).

Having features that cover both discriminative and generative information, we can combine the two on the feature level. In particular, we apply boosting for feature selection on Haar-like features and define a new error function that additionally includes the generative information of a feature, which allows a robust feature selection using boosting. Thus, we can learn a discriminative classifier even from degraded input images.

The remainder of the paper is organized as follows: Section 2 reviews some theoretic issues that are relevant later on.
Section 3 introduces the new Eigenboosting method and defines the modified discriminative/generative boosting error function. In Section 4 the proposed method is evaluated on two well-known databases. Finally, conclusions are drawn in Section 5.

2. Preliminaries

Before we introduce the Eigenboosting method we need to discuss the two main components of the system and to introduce the notation.

2.1. Boosting for Feature Selection

In general, boosting is a widely used technique in machine learning for improving the accuracy of any given learning algorithm. In fact, boosting converts a weak learning algorithm into a strong one. In the following we focus on the AdaBoost algorithm introduced by Freund and Schapire [7].

Given a training set X = {(x_1, y_1), ..., (x_L, y_L)}, where x_i ∈ ℝ^P is a sample and y_i ∈ {−1, +1} is the corresponding label, the weight distribution is initialized uniformly with p(x_i) = 1/L. In each boosting iteration n a new weak classifier¹ h^weak_n(x): x → [−1, 1] and the corresponding weight α_n are calculated according to the training error over all samples X and p(x). In addition, the probability p(x) is updated such that it is increased for samples that were misclassified and decreased for samples that were classified correctly. Thus, the algorithm focuses on the difficult examples.

The process is repeated and at each boosting iteration a new weak classifier is added until a certain stopping condition is met (e.g., a given number of weak classifiers). Finally, a strong classifier h^strong(x) is computed as a linear combination of the set of N weak classifiers h^weak_n(x):

    h^strong(x) = sign( Σ_{n=1}^{N} α_n h^weak_n(x) ).    (1)

¹ A weak classifier is a classifier that has to perform only slightly better than random guessing (i.e., for a binary decision task, the error rate must be less than 50%). It is obtained by applying a learning algorithm (e.g., statistical learning for a decision stump).
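For readers who prefer code, the loop below is a minimal sketch of AdaBoost with decision-stump weak learners. It is our own illustration, not code from the paper: the helper train_stump, the array names, and the clipping of the edge r are assumptions; the weights follow the Schapire-Singer formula given in (2) below.

```python
import numpy as np

def train_stump(X, y, p):
    """Hypothetical weak learner: a decision stump over the single best input
    dimension, returning a predict function with outputs in {-1, +1}."""
    best = None
    for d in range(X.shape[1]):
        thr = np.average(X[:, d], weights=p)        # crude threshold guess
        for parity in (+1.0, -1.0):
            h = parity * np.sign(X[:, d] - thr)
            h[h == 0] = 1
            err = np.sum(p * (h != y))              # weighted 0/1 error
            if best is None or err < best[0]:
                best = (err, d, thr, parity)
    _, d, thr, parity = best
    return lambda Z: np.where(parity * (Z[:, d] - thr) >= 0, 1.0, -1.0)

def adaboost(X, y, N=20):
    """Strong classifier h_strong(x) = sign(sum_n alpha_n h_n(x)), Eq. (1)."""
    L = len(y)
    p = np.full(L, 1.0 / L)                                 # uniform initial weights
    weaks, alphas = [], []
    for _ in range(N):
        h = train_stump(X, y, p)
        r = np.clip(np.sum(p * h(X) * y), -0.999, 0.999)    # edge of the weak hypothesis
        alpha = 0.5 * np.log((1 + r) / (1 - r))             # weight as in Eq. (2)
        p *= np.exp(-alpha * h(X) * y)                      # re-weight the samples, cf. Eq. (5)
        p /= p.sum()                                        # keep p a distribution
        weaks.append(h)
        alphas.append(alpha)
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, weaks)))  # Eq. (1)
```

In the boosting-for-feature-selection variant used later in this section, train_stump would instead pick the best Haar-like feature from the shared pool F.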

The weights α_n can be obtained by

    α_n = (1/2) ln((1 + r_n) / (1 − r_n)),    r_n = Σ_{l=1}^{L} p(x_l) h^weak_n(x_l) y_l,    (2)

which was shown by Schapire and Singer in [22], where they proved strong bounds for the training and generalization error of AdaBoost. Hence, boosting minimizes the error on the training set,

    (1/L) Σ_{l=1}^{L} [h^strong(x_l) ≠ y_l] ≤ (1/L) Σ_{l=1}^{L} exp(−h^strong(x_l) y_l).    (3)

This error is bounded by

    (1/L) Σ_{l=1}^{L} exp(−Σ_{n=1}^{N} α_n h^weak_n(x_l) y_l) = Π_{n=1}^{N} Z_n,    (4)

where

    Z_n = Σ_{l=1}^{L} p(x_l) exp(−α_n h^weak_n(x_l) y_l).    (5)

By minimizing Z_n in each round, boosting greedily minimizes the training error, which finally yields the weights defined in (2). Note that the optimization problem and therefore the weights α_n are related to the error of the weak hypotheses,

    e_n(x_l) = −h^weak_n(x_l) y_l ∈ [−1, 1].    (6)

Boosting for feature selection was first introduced by Tieu and Viola [26] and has been widely used for different applications (e.g., face detection [28]). The main idea is that each feature corresponds to a single weak classifier and boosting selects an informative subset from these features. Given a set of possible features F = {f_1, ..., f_M} and a weak learning algorithm ℒ, in each iteration step n all features f_j are evaluated on all positive and negative training samples and a hypothesis is generated by applying the learning algorithm ℒ. Finally, the best hypothesis is selected, which forms the weak classifier h^weak_n. Thus, the selection of a feature depends on the error e_n that was defined in (6) and (2), respectively.

2.2. Principal Component Analysis

Given a set of input images X = [x_1, ..., x_L] ∈ ℝ^(P×L), where x_i = [x_{1i}, ..., x_{Pi}]^T ∈ ℝ^P is an individual image represented as a vector, the PCA projection U = [u_1, ..., u_L] ∈ ℝ^(P×L) is obtained by solving the eigenproblem for the covariance matrix C ∈ ℝ^(P×P) of X or, more efficiently, by a Singular Value Decomposition (SVD) of X, assuming that X is mean normalized.

The columns u_i = [u_{1i}, ..., u_{Pi}]^T ∈ ℝ^P of U, i.e., the eigenvectors, are arranged in decreasing order with respect to the corresponding eigenvalues. Usually, only K, K ≪ L, eigenvectors (those with the largest eigenvalues) are needed to represent a given image x to a sufficient degree of accuracy as a linear combination of eigenvectors u_i:

    x̃ = Σ_{i=1}^{K} a_i u_i.    (7)

The coefficients a_i can be calculated by the standard projection

    a_i = u_i^T x = Σ_{j=1}^{P} u_{ji} x_j,    i = 1, ..., K,    (8)

or, as a robust procedure [14], by solving a system of linear equations

    x_{r_i} = Σ_{j=1}^{K} a_j u_{r_i,j},    i = 1, ..., Q,    (9)

evaluated at Q ≥ K points r = (r_1, ..., r_Q).

3. Eigenboosting

3.1. Image Representation

The goal of this paper is to train a classifier that takes into account discriminative as well as generative information. Therefore, we need a common low-level representation of the data that covers both aspects. In particular, we decided to use Haar-like features for this purpose and defined a shared feature pool F = {f_1, ..., f_M} containing Haar-like features. It is well known that boosting for feature selection on Haar-like features allows training powerful discriminative classifiers (e.g., [28]); thus, the features f_j are used to build weak hypotheses h^weak_j(x) following the theory of boosting for feature selection described above. But these features can also be considered as binary basis functions that describe a visual dictionary: Tao et al. [25] showed that binary bases can be used to reconstruct gray-value images, which can be done efficiently by using integral images [28]. Examples of such basis functions obtained from Haar-like features are shown in Figure 2.
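As a small illustration of this shared representation (our sketch, not code from the paper; the two-rectangle layout and the function names are assumptions), a Haar-like feature can be evaluated in constant time from an integral image, and the very same ±1 rectangle pattern, written out as a dense image, is the binary basis function used for reconstruction:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of the pixel values inside the rectangle [y, y+h) x [x, x+w)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_response(ii, y, x, h, w):
    """Hypothetical two-rectangle Haar-like feature: left half weighted +1,
    right half weighted -1; four integral-image lookups per rectangle."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, w - half)

def basis_image(shape, y, x, h, w):
    """The same feature written as a dense +/-1 basis function over the patch,
    i.e. one element f_i of the visual dictionary used in Section 3."""
    f = np.zeros(shape)
    half = w // 2
    f[y:y + h, x:x + half] = 1.0
    f[y:y + h, x + half:x + w] = -1.0
    return f
```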
Figure 2. Haar-like binary basis functions: for illustration the original values were normalized; a black pixel represents −1, a white pixel +1, and a gray pixel 0.

An extension of this approach, called binary PCA (B-PCA), was proposed by Tang and Tao [24]. The main idea
is first to approximate eigenimages from binary basis functions and then to use these eigenimages to reconstruct the original image. This theory also holds for basis functions defined by Haar-like features, which allows us to approximate an eigenimage u by

    ũ = Σ_{i=1}^{M} b_i f_i,    (10)

where f_i is a single feature from the feature pool F. The coefficients b_i can be estimated by

    b_i = F† u_i,    (11)

where F† denotes the pseudoinverse of F. In fact, the obtained bases are not orthogonal, but for practical purposes the accuracy is sufficient to reconstruct images by PCA projections; we can approximate the eigenbasis to a desired accuracy from the over-complete visual dictionary (increasing the number of features increases the accuracy of the reconstructions).

Having this eigenbasis we can reconstruct a given input image x by substituting (10) into (7), which yields

    x̃ = Σ_{k=1}^{K} a_k Σ_{i=1}^{M} b_{k,i} f_i.    (12)

The coefficients a_k can be estimated by the standard projection (8) or robustly by (9). Hence, we can easily introduce robustness into our approach, which, in fact, is not possible for a standard discriminative approach.
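A compact sketch of this approximation and reconstruction step (our illustration; the matrix shapes and names are assumptions, not the authors' code): stack the Haar-like basis functions as columns of a matrix, fit the coefficients b with the pseudoinverse as in (11), and reconstruct an input image through the approximated eigenbasis as in (12).

```python
import numpy as np

# Assumed shapes:
#   F_mat : (P, M) columns are the vectorized Haar-like basis functions f_i
#   U     : (P, K) columns are the first K eigenimages u_k
#   x     : (P,)   a mean-normalized input image as a vector

def approximate_eigenbasis(F_mat, U):
    """Coefficients b such that U is approximately F_mat @ b, i.e. every eigenimage
    becomes a linear combination of Haar-like basis functions, Eqs. (10)-(11)."""
    return np.linalg.pinv(F_mat) @ U           # (M, K)

def reconstruct(x, F_mat, b, U):
    """Project x onto the eigenbasis and rebuild it from the dictionary, Eq. (12)."""
    a = U.T @ x                                # standard projection, Eq. (8)
    return F_mat @ (b @ a)                     # x_tilde = sum_k a_k sum_i b_{k,i} f_i
```

The robust alternative of Eq. (9) would replace the projection U.T @ x by a least-squares fit of a on a subset of Q reliable pixels.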

Figure 3. Eigenboosting framework for robust feature selection.

3.2. Discriminative/Generative Model

Given a training set X of in total L samples, with L_pos positive and L_neg negative samples, and the previously defined feature pool F, we finally train a discriminant classifier by combining a discriminative model and a generative model within a boosting framework. The proposed discriminative/generative learning framework is depicted in Figure 3.

Discriminative classifiers h^d_j are trained using features from F on the positive and negative training samples. In parallel and independently, an eigenbasis is estimated using PCA (from the positive samples only). The obtained eigenvectors are approximated by (10) using features from the shared feature pool F. Finally, a discriminative classifier is trained by boosting for feature selection.

To combine the discriminative and the generative information we define a new error function for boosting for feature selection. Therefore, for each feature f_j we first have to estimate the discriminative error e^d_j and the generative error e^g_j. Adapting the error function for boosting was previously addressed by other authors, e.g., by Hertz et al. [8] to learn kernel functions (KernelBoost and DistBoost) or by Avidan [2] to include spatial information.

Discriminative error

The discriminative classifier h^d_j(x) is defined by

    h^d_j(x) = p_j · sign(f_j(x) − θ_j),    (13)

where the threshold θ_j and the parity p_j are defined by

    θ_j = (µ+ + µ−)/2,    p_j = sign(µ+ − µ−).    (14)

The mean values µ+ and µ− are estimated by computing the response of the feature f_j for all positive and all negative training images x_l, respectively. Based on the decision of the weak hypothesis h^d_j(x_l), the discriminative error e^d_j for the feature f_j on all training examples can directly be estimated by

    e^d_j = −(1/L) Σ_{l=1}^{L} h^d_j(x_l) y_l,    (15)

which is related to the error derived in (3).

Generative error

To estimate the generative error (from the positive samples only) we consider the error that would be obtained without the feature f_j. Let

    x̃_j = Σ_{k=1}^{K} a_k Σ_{m≠j} b_{k,m} f_m    (16)

be the reconstruction of the original image x using F \ {f_j}, and let x̃ be the reconstruction obtained by (12) (i.e., by using the full basis F). As x̃ is the optimal reconstruction that can be obtained from the basis F (using a pre-specified number of eigenimages K), we can consider ‖x̃ − x̃_j‖ as an error measure for a single feature f_j. From (12) and (16) we get

    ‖ Σ_{k=1}^{K} a_k Σ_{m=1}^{M} b_{k,m} f_m − Σ_{k=1}^{K} a_k Σ_{m≠j} b_{k,m} f_m ‖ = ‖ Σ_{k=1}^{K} a_k b_{k,j} f_j ‖ = | Σ_{k=1}^{K} a_k b_{k,j} | · ‖f_j‖.    (17)

Thus, the training error for a feature f_j is related to

    | Σ_{k=1}^{K} a_k b_{k,j} |.    (18)

Hence, we can estimate the training error for a feature f_j over all samples by

    ε_j = (1/L_pos) Σ_{l=1}^{L_pos} | Σ_{k=1}^{K} a_{l,k} b_{k,j} |.    (19)

Finally, the generative error ε_j is mapped into the interval [0, 1]. In particular, the normalized generative error e^g_j can be obtained by

    e^g_j = g(ε_j) = 1 if ε_j ≤ θ, and 0 otherwise,    (20)

where θ is estimated from the expected reconstruction error over all training samples.

Modified boosting error function

The overall error e_j is estimated as a weighted sum of the discriminative error e^d_j and the generative error e^g_j,

    e_j = β e^d_j + (1 − β) e^g_j,    (21)

where β ∈ [0, 1]. The main idea now is to use boosting not on the standard error function (6) but on this new combined error (21), which incorporates both the generative and the discriminative information. Since we finally train a discriminative classifier we can define the error function for a sample (x, y) by e = −h(x)y. Thus, for the feature f_j we get the weak classifier

    h^weak_j(x) = β h^d_j(x) − (1 − β) e^g_j y.    (22)

When substituting the error defined in (21) (or the definition of the weak classifier (22)) into (5) we finally get that the combined approach minimizes

    Σ_{x∈X} p(x) exp(−α (β h^d_j(x) − (1 − β) e^g_j y) y)
      = Σ_{x∈X} p(x) exp(−α β h^d_j(x) y) · exp(α (1 − β) e^g_j)
      = exp(α (1 − β) e^g_j) · Σ_{x∈X} p(x) exp(−α β h^d_j(x) y),    (23)

where the first factor corresponds to the generative prior and the remaining sum to the discriminative information. We can interpret this error function as follows: a generative prior (calculated with respect to the weighted positive samples) influences the discriminative error in a multiplicative way, where the parameter β controls the influence. For β = 1 the generative prior vanishes and we obtain the original boosting error function; for β = 0 only the generative prior is considered and no discriminative information is included. In fact, by using the prior we introduce robustness into discriminant learning and enable robust feature selection using boosting.

Please note that PCA is required only during training. Once training is finished, the boosted classifier can be used on its own (e.g., for object detection [28]) and is therefore as efficient as any other boosting approach.
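The following sketch (our own illustration with assumed array shapes, not the authors' code) scores every feature in the pool with the combined error of (21), using the stump construction (13)-(14), the discriminative error (15) weighted here by the current boosting weights, and the thresholded generative term (19)-(20), and returns the feature a boosting round would select.

```python
import numpy as np

def select_feature(resp, y, p, a, b, theta, beta=0.5):
    """Pick the feature with the smallest combined error e_j, Eq. (21).

    resp  : (L, M)    responses f_j(x_l) of all M pool features on all L samples
    y     : (L,)      labels in {-1, +1}
    p     : (L,)      current boosting weights (summing to 1)
    a     : (Lpos, K) PCA coefficients of the positive samples, Eq. (8)
    b     : (K, M)    coefficients approximating the K eigenimages, Eq. (11)
    theta : threshold on the expected reconstruction contribution, Eq. (20)
    beta  : trade-off between discriminative and generative information
    """
    pos, neg = (y == 1), (y == -1)

    # Decision stump per feature from the class means, Eqs. (13)-(14)
    mu_pos, mu_neg = resp[pos].mean(axis=0), resp[neg].mean(axis=0)
    thr = 0.5 * (mu_pos + mu_neg)
    parity = np.sign(mu_pos - mu_neg)
    h = parity * np.sign(resp - thr)               # (L, M) stump decisions
    h[h == 0] = 1

    # Discriminative error, Eq. (15), here weighted by p instead of 1/L
    e_d = -np.sum(p[:, None] * h * y[:, None], axis=0)

    # Generative contribution of every feature, Eq. (19), thresholded as in Eq. (20)
    eps = np.mean(np.abs(a @ b), axis=0)           # (M,)
    e_g = (eps <= theta).astype(float)

    e = beta * e_d + (1.0 - beta) * e_g            # combined error, Eq. (21)
    j = int(np.argmin(e))
    return j, thr[j], parity[j]
```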
4. Experiments

For our experiments we mainly used the ATT database (former ORL database) [21] and the UIUC Image Database for Car Detection [1]. The databases were split into a training and a test set; in particular, we used 60% of the images for training and 40% for testing. The negative samples were generated randomly from images that do not contain the objects of interest. For training we built a discriminative classifier containing 20 features; for the generative model five eigenimages were used.

The main contribution of this paper is to introduce robustness into boosting-based learning. Thus, we would like to emphasize that the goal of the experiments is to show the benefits of the presented approach (i.e., robust discriminative learning and reconstruction from discriminative features).

4.1. Reconstructive power

First, we show the reconstructive power and the robustness of the Eigenboosting method. We trained a face model from the ATT database by standard boosting and by the new Eigenboosting method and evaluated the obtained models on test images (Figure 4(a)). The first two images are faces from the ATT database to which we added occlusion noise; the others are non-face images from the COIL database [18]. From Figure 4(b) it can be seen that the Haar-like basis functions obtained by standard boosting cannot be used to

Figure 5. (a), (b): ATT database.
