Hybrid Generative-Discriminative Visual Categorization.

2y ago

28 Views

2 Downloads

2.95 MB

39 Pages

Last View : 10d ago

Last Download : 3m ago

Upload by : Kaleb Stephen

Report this link

Download PDF

Transcription

Hybrid Generative-Discriminative VisualCategorization.Alex D. Holub1 , Max Welling2 , Pietro Perona112Computation and Neural SystemsDepartment of Computer ScienceCalifornia Institute of Technology, MC 136-93University of California IrvinePasadena, CA 91125Irvine, CA eduAbstractLearning models for detecting and classifying object categories is a challenging problem in machine vision. While discriminative approaches to learning andclassification have, in principle, superior performance, generative approachesprovide many useful features, one of which is the ability to naturally establishexplicit correspondence between model components and scene features – this, inturn, allows for the handling of missing data and unsupervised learning in clutter. We explore a hybrid generative/discriminative approach, using ‘Fisher Kernels’ [12], which retains most of the desirable properties of generative methods,while increasing the classification performance through a discriminative setting.Our experiments, conducted on a number of popular benchmarks, show strongperformance improvements over the corresponding generative approach. In addition, we demonstrate how this hybrid learning paradigm can be extended toaddress several outstanding challenges within computer vision including how tocombine multiple object models and learning with unlabelled data.1

1IntroductionDetecting and classifying objects and object categories in images is currently oneof the most interesting, useful, and difficult challenges for machine vision. Muchprogress has been made during the past decade in formulating models that capturethe visual and geometrical statistics of natural objects, in designing algorithmsthat can quickly match these models to images, and in developing learning techniques that can estimate these models from training images with limited supervision [1, 26, 30, 8, 16, 5, 24, 15, 11]. However, our best algorithms are not close tomatching human abilities. Machine vision systems are at least two orders of magnitude worse than humans in several aspects including the number of categoriesthat can be learned and recognized, the classification error rates, the classificationspeed, and the ease and flexibility with which new categories can be learned.This work is motivated by the challenge of learning to recognize categoriesthat look similar to one another. A number of methods have shown good performance on dissimilar categories (for example airplanes, automobiles, spotted cats,faces and motorcycles as in [8, 5]). None of these methods has been shown to perform well on visual categories which look similar to one another such as bicyclesand motorcycles or male and female faces. For example, while the ‘constellationmodel’ [8] has error rates of a few percent on dissimilar categories such as facesvs. airplanes and cars vs. cats, it has error rates around 30% if it is asked to recognize faces of different people (see the x axis of the plots in Fig. 1). Why doesthis discrepancy exist? As we shall see, one potential confound is the underlyinggenerative learning algorithm.Learning and classification methods fall into two broad categories (see Figure 2). Let y be the label of the class and x the measured data associated with thatclass. A generative approach will estimate the joint probability density functionp(x, y) (or, equivalently, p(x y) and p(y)) and will classify using p(y x) whichis obtained using Bayes’ rule. Conversely, discriminative approaches will estimate p(y x) (or, alternatively, a classification function y f (x)) directly fromthe data. It has been argued that the discriminative approach results in superiorperformance, i.e. why bother learning the details for models of different classesif we can directly learn a criteria for discriminating between the classes [27]? Indeed, it was shown that the asymptotic (in the number of training examples) errorof discriminative methods is lower than for generative ones when using simplelearning models [17].Yet, among machine vision researchers generative models remain popular [26,30, 8, 5, 15, 20]. There are at least five good reasons why generative approaches2

Hybrid Model (Combining Models)Performance mum Likelihood PerformancePerformance ComparisonHybrid Model (Prior Knowledge) 5 Training Examples1009590858075706565707580859095100Maximum Likelihood PerformanceFigure 1: A pure generative Maximum Likelihood (ML) approach will not work wellwhen categories are similar in appearance (right column images of faces, each row showsa different person), especially when few training examples are available (scatterplots onthe left, x axis). We apply discriminative techniques from Jaakkola et al. [12] to transformgenerative approaches for visual recognition into discriminative classifiers which retainsome of the desirable properties of generative models and yield better performance. (Top)ML in comparison to using combinations of hybrid models. See Section 6 below fordetails. Category labels for these faces, from top to bottom: P1, P2, and P3. These newface categories will be posted on the web. (Bottom) ML in comparison to hybrid modelsin a semi-supervised learning paradigm in which few examples (in this case three trainingexamples) are present. See Section 5 below for details.3

xtestGenerativeTrainp(x y1).Class. Np(y1 xtest).maxytestp(yN xtest)p(x yN)xtest TestTrainDiscriminativeTestClass 1Class 1. NClassSVMClassifierytestxtest TestTrainHybridClass 1p(x y1). NClass.FisherScoresSVMClassifierytestp(x yN)Figure 2: Schematic comparison of the generative (top), discriminative (middle), andhybrid (bottom) approaches to learning discussed in this paper. While generative modelsare a natural choice for visual recognition, discriminative models have been shown togive better performance in different domains. The hybrid model captures many desirableproperties of both.4

are an attractive choice for visual recognition. First, generative models naturallyincorporate information about occlusion and missing features. This is becausegenerative methods allow one to establish explicit ‘correspondence’ between partsof the model and features in the image. For every such mapping, the parts in themodel corresponding to the missing features can simply be marginalized out of theprobabilistic model, leaving us with a lower dimensional model over the observedparts [30]. Second, collecting training examples is expensive in vision and training sets come at a premium. Ng and Jordan [17] demonstrated both analyticallyand experimentally that in a 2-class setting the generative approach often has better performance for small numbers of training examples, despite the asymptoticperformance being worse. Third, it has been shown that prior knowledge can beuseful when few training examples are available, and that prior information maybe easily incorporated in a generative model [6]. Fourth, we ultimately envisionsystems which can learn thousands of categories; in this regime it is unlikely thatwe will be able to learn discriminative classifiers by considering simultaneouslyall the training data. It is therefore highly desirable to design classifiers that canlearn one category at a time: this is easy in the generative setting and difficult inthe discriminative setting where training data for all categories must be available atonce for a decision boundary to be calculated. Fifth, it is unclear, in general, whatfeatures to use when training a discriminative classifier on object categories. Consider that many popular algorithms for object recognition rely on feature detectorsto find ‘interesting’ regions within an image. Each image thus is represented as anunordered set of feature detections of variable length. How can these unorderedlists be used by a discriminative classifier?Is it possible to get the best of both worlds and develop approaches with theflexibility of generative learning and the performance of discriminative methods?Jaakkola and Haussler have shown that a generative model can be used in a discriminative context by extracting Fisher Scores from the generative model andconverting them into a ‘Fisher Kernel’ [12] (see Figure 2). A kernel representsthe data as a matrix of pairwise similarities which may be used for classification by a kernel method, such as the support vector machine (SVM). The field ofkernel methods is well developed [27, 23, 21] and represents the state-of-the-artin discriminative learning. Here, we explore how to apply these ideas to visualrecognition.We calculate Fisher Kernels that are applicable to visual recognition of objectcategories and explore experimentally the properties of such ‘hybrid models’ ona number of popular and challenging data-sets. Other kernel-based approacheshave been suggested for object recognition, including Vasconcelos et al. [28] who5

exploit a similar paradigm, using a Kullback-Leibler based kernel and test onthe COIL data-set. Wallraven et al. [29] utilize a clever kernel which implicitlycompares detected features in different images, but apply their method to differentsets of images than those used in this paper.In Section 2 we briefly review one class of generative models, commonlycalled the ‘Constellation Model’, which will be used in the rest of the paper.In Section 3 we show how to transform a generative Constellation Model intoa discriminative setting by utilizing the idea of Fisher Kernels. In Section 4 wecompare the performance of hybrid and generative constellation models. In Section 5 we explore how these hybrid models can be extended and effectively usedin circumstances where we have a mixture of labelled and unlabelled data, i.e.‘semi-supervised’ learning. Finally, in Section 6 and 7 we show how the hybridframework can be used to optimally combine several generative models (for example generative models based on different feature detectors and different numbersof parts) into a single classifier. Section 8 discusses the main results and observations of this work.2Generative ModelsIn this section we briefly review a class of generative models which will be usedin conjunction with the discriminative methods described in the next section. Inprinciple any generative model that is differentiable with respect to its parameterscan be used. We chose to experiment with the ‘Constellation Model’ which wasfirst proposed by Burl et al. [1]. Weber et al. [30] showed that this model maybe learned from cluttered images in a weakly supervised setting in which only aclass label is associated with each image using maximum likelihood. Fergus etal. [8] extended the model by making it scale-invariant and incorporating generalpurpose feature detectors. We use a simplified version of Fergus’ constellationmodel in which we do not explicitly model occlusion or relative scale.2.1The Constellation ModelThe constellation model is a generative framework which constructs probabilistic models of object classes by representing the appearance and relative positionof several object parts [1, 30, 8]. Given a suitable training set composed of images containing examples of objects belonging to a given category, such modelsare trained by finding a set of model parameters θMLE which maximizes the log6

likelihood of the model [30, 8]. Both appearance and shape are modelled as jointlyGaussian and θ {θa , θs } represent the mean and diagonal variance parametersof the shape (θs ) and appearance models (θa ). To remove dependence on location, the x-y coordinates of the parts are measured relative to a reference part,e.g. the left-most part. In our implementation, as suggested by Fergus et al. [8],appearance is represented by the first 20 PCA components of normalized 11 11pixel patches which were cut out around feature detections in training images atthe scale indicated by the detectors (see next subsection). The number of interestpoint detections considered in an image is a design parameter.For each training image Ii we obtain a set of F interest points and their appearance descriptors. We would like to establish correspondence, i.e. assign aunique interest point to every model part, or component, Mj . Burl et al [1] showedthat since we do not a priori know which interest point belongs to which modelcomponent, we need to introduce a ‘hidden’ hypothesis variable h which mapsinterest points to model parts. We order the interest points in ascending order ofx-position. Note that although we model only the diagonal components of theGaussian, the model parts are not independent as we enforce that each part ismapped to¡ a unique feature, implicitly introducing dependencies. The result isFa total of Mhypotheses, where each h assigns a unique interest point to eachmodel part. We marginalize over the hypothesis variable to obtain the followingexpression for the log likelihood for a particular class:!ÃXXXp(Ai , h θa )p(Xi , h θs )(1)log (p(Ii )) logiihwhere {Xi } are the relative coordinates of the object and represent the shape information while {Ai } are the PCA components described above and represent theappearance information. We assume that the shape and appearance models are independent of one another given a hypothesis h and that the images are I.I.D. Thisstep is key to maximum-likelihood model learning, and to classification, once amodel is available (see details in [1, 30, 8]). We note that exploring all possiblehypotheses carries a combinatorial computational cost which severely limits thenumber of parts and interest points which can be used in the model.For clarity we consider the makeup of a typical set of parameters θ. Considerone part of a 3-part model. A single part consists of parameters specifying itsshape and appearance, θs and θa respectively. The shape of the part is specifiedby a two dimensional mean and two dimensional variance (we consider diagonalcovariance matrices in our model) indicating the mean and variance of the position7

Figure 3: Examples of scaled features found by the KB (left) and multi-scale DoG (right)detectors on images from the ’persons’ data-set located at http://www.emt.tugraz.at/opelt/. Approximately the top 50 most salient detections are shown for both.of the part. Each part thus has a four dimensional parameter array specifying itslocation. Now consider the appearance parameters of a part. The appearanceof a part is specified by the mean and variance of the PCA components for thatparticular part. A 20 dimensional PCA representation thus consists of a total of40 parameters, 20 for the mean and 20 for the variance of the part.2.2Interest-Point DetectionThe constellation model requires the detection of interest points within an image.Numerous algorithms exist for extracting and representing these interest points.We considered several popular interest point detectors: the entropy based Kadirand Brady (KB) [14] detector, the multi-scale Difference of Gaussian (DoG) detector [3], the multi-scale hessian detector (mHes), and the multi-scale Harris detector(mHar). Figure 3 shows typical interest points found within images. Alldetectors indicate the saliency of interest points, and only the most salient interestpoints are used. The KB interest point detector was used in all experiments belowunless otherwise specifically noted.2.3Generative Model LearningWe train our generative constellation models using the EM algorithm [4] as computed explicitly for the constellation model by Weber et al. [30]. The algorithminvolves iteratively calculating the expected values of the unobserved variables of8

the model and then maximizing the parameters. The algorithm was terminatedafter 50 iterations or after the log likelihood stopped increasing. A typical 3-partmodel optimized on 100 images with 25 detections in each image took on the order of 20 minutes to optimize using a combination of C (mex) and Matlab codeon a 2Ghz machine.3Fisher Scores and Fisher KernelsFor supervised learning, such as regression and classification, kernel methods areoften the method of choice. As argued in the introduction, our interest is in combining generative models with a discriminative step for the purpose of visual object recognition. We chose support vector machines (SVM) [27] as our kernelmachine. The SVM (like all kernel methods) process the data in the form of akernel matrix (or Gram matrix), a symmetric and positive definite n n matrix ofsimilarities between all samples. A simple way to construct a valid kernel matrixis by defining a set of features, φ(xi ), and to define the kernel matrix as,Ki,j K(xi , xj ) φT (xi )φ(xj )(2)The kernel represents the similarities between samples: relatively large kernel entries correspond to two samples which are similar while small (possibly negative)entries correspond to dissimilar samples. Kernels defined by inner products suchas the one in Equation 2 produce positive-definite kernel matrices [27].The generative model will have its impact on the classifier through the definition of these features. We will follow [12] in using ‘Fisher Scores’ as our features.Given a generative probabilistic model the ‘Fisher Scores’ φ(xi ) are defined asφ(xi ) log p(xi θ MLE ) θ(3)where θ MLE is the maximum likelihood estimate of the parameters θ. By definition, θ MLE is obtained by maximizing the likelihood. A necessary condition is thatthe gradient of such likelihood (or log-likelihood) is zero, which is equivalent to‘balancing’ the Fisher Scores,XX log p(xi θ MLE ) φ(xi ) 0(4) θiiHence, samples “pull” on the parameter values through their Fisher Scores whichcan be interpreted as “forces”. At the MLE all forces balance. Two data-items that9

exert similar ‘forces’ on all parameters have their feature vectors aligned resultingin a larger positive entry in the kernel matrix.Since it is not a priori evident that the data can be separated using a hyperplanein this feature space, it can be beneficial to increase the flexibility of the separatingsurface (making sure that the problem is properly regularized) as shown in [27].This is achieved by applying non-linear kernels such as the RBF kernel or thepolynomial kernel in this new feature space, i.e. K(φ(xi ), φ(xj )) with,µ¶12KRBF (xi , xj ) exp 2 φ(xi ) φ(xj ) (5)2σKPOLp (xi , xj ) (R φ(xi )T φ(xj ))p(6)Where σ represents the variance of the RBF kernel and p represents the degree ofthe polynomial kernel being used.To remove scale differences between the features we normalized the featuresbefore we computed their inner product,φa (xi ) q1Nφa (xi )PN 2j 1 φa (xj )(7)Why do we bother going through a two-stage process where we first traingenerative models for each object category and then train another classifier basedon a kernel derived from those models, where we could also classify using loglikelihood ratios? The intuitive answer to this question is that a classifier is trainedto find an optimal decision boundary, i.e. it focusses its attention to what is relevant to the task. Here, the samples which are close to the decision boundarycarry much more information than the ones away from the boundary. In contrast,classifying according to likelihood ratios simply derives the decision boundary asa by-product from fitting models for every category. The objective of this fittingprocedure is to maximize the probability of all samples for every category andnot deriving a good decision boundary for the classification task at hand. Thisintuition has been made more precise in numerous papers. Most relevant to FisherKernels is the theorem in [12] stating that asymptotically (in the large data-limit)a classifier based on the Fisher Kernel can be shown to be at least as good (andtypically better) as the corresponding naive Bayesian procedure (i.e. likelihoodratios or maximizing p(y x)). Similar results have been obtained in e.g. [17] and[25]. It should be mentioned that for small numbers of sampl

ple generative models based on different feature detectors and different numbers of parts) into a single classiﬁer. Section 8 discusses the main results and observa-tions of this work. 2 Generative Models In this section we brieﬂy review a class of generative models which will be used in conjunction with

Related Documents:

Generative Models: Gaussian Discriminative Analysis and …

1 Generative vs Discriminative Generally, there are two wide classes of Machine Learning models: Generative Models and Discriminative Models. Discriminative models aim to come up with a \good separator". Generative Models aim to estimate densities to the training data. Generative Models ass

39 Views

2y ago

Eigenboosting: Combining Discriminative and Generative ...

Combining discriminative and generative information by using a shared feature pool. In addition to discriminative classify- . to generative models discriminative models have two main drawbacks: (a) discriminant models are not robust, whether. in

32 Views

2y ago

Structured Discriminative Models for Speech Recognition

Structured Discriminative Models for Speech Recognition Combining Discriminative and Generative Models Test Data ϕ( , )O λ λ Compensation Adaptation/ Generative Discriminative HMM Canonical O λ Hypotheses λ Hypotheses Score Space Recognition O Hypotheses Final O Classifier Use generative

26 Views

2y ago

Hybrid Discriminative-Generative Approach with Gaussian ...

2 Discriminative Models 2.1 Overview From a probabilistic perspective, a discriminative model (or regression model ) represents a conditional . Generative models (or joint models ) consist of mod- . to the shared challeng

32 Views

2y ago

A Hybrid Discriminative/Generative Approach for Modeling ...

Feature Selection and Discriminative Activity Models Earlier work has shown that discriminative methods often outperform generative models in classification tasks [Ng and Jordan, 2002]. Additionally, techniques such as bagging and boosting that combine a set of weak classifiers

24 Views

2y ago

A Generative-Discriminative Hybrid Method for Multi-View ...

It is not difﬁcult to imagine that combining the genera-tive and discriminative approaches could complement two methods. Recently, there have been several attempts to combine the generative and discriminative approaches. For instance, Holub and Perona [7] has developed Fisher ke

22 Views

2y ago

Combining information theoretic kernels with generative ...

Combining information theoretic kernels with generative embeddings . images, sequences) use generative models in a standard Bayesian framework. To exploit the state-of-the-art performance of discriminative learning, while also taking advantage of generative models of the data, generative

33 Views

2y ago

Jeff Sutherland Gabrielle Benefield

Agile Software Development with Scrum Jeff Sutherland Gabrielle Benefield. Agenda Introduction Overview of Methodologies Exercise; empirical learning Agile Manifesto Agile Values History of Scrum Exercise: The offsite customer Scrum 101 Scrum Overview Roles and responsibilities Scrum team Product Owner ScrumMaster. Agenda Scrum In-depth The Sprint Sprint Planning Exercise: Sprint Planning .

117 Views

3y ago

Recent Views

MANAGERIAL FINANCE - GBV

of Managerial Finance page 2 Introduction to Managerial Finance 1 Starbucks—A Taste for Growth page 3 1.1 Finance and Business What Is Finance? 4 Major Areas and Opportunities in Finance 4 Legal Forms of Business Organization 5 Why Study Managerial Finance? Review Questions 9 1.2 The Managerial Finance Function 9 Organization of the Finance

3y ago

6.8K Views

Chapter 1 The roles of finance function in organisations

The roles of the finance function in organisations 4. The role of ethics in the role of the finance function Ethics is the system of moral principles that examines the concept of right and wrong. Ethics underpins an organisation’s sustained value creation. The roles that the finance function performs should be carried out in an .File Size: 888KBPage Count: 10Explore furtherRole of the Finance Function in the Financial Management .www.managementstudyguide.c Roles and Responsibilities of a Finance Department in a .www.pharmapproach.comRoles and Responsibilities of a Finance Department .www.smythecpa.comTop 10 – Functions of Business Finance in an om23 Functions and Duties of Accounting and Finance nded to you b

2y ago

335 Views

2017-2018 GRANDE ÉCOLE MSc in MANAGEMENT

Descriptif des cours Course Outlines 10 Catalogue des cours/ Course Catalog 2017-2018 FIN: Finance/Finance A : Actuariat/Actuarial, Insurance E : Finance d’entreprise/Corporate Finance The course liste tables and the course outlines G : Finance générale/General Finance M : Finance de marché/Market Finance S : Synthèse/Synthesis IDS: Systèmes d’Information, Sciences de la Décision et .

3y ago

312 Views

Behavioral Finance and Wealth L Management

Introduction to Behavioral Finance CHAPTER1 What Is Behavioral Finance? Behavioral Finance: The Big Picture Standard Finance versus Behavioral Finance The Role of Behavioral Finance with Private Clients How Practical Application of Behavioral Finance Can Create a Successful Advisory Rel

2y ago

377 Views

Catalogue des Cours Course Catalog - ESSEC Business School

10 Catalogue des cours/Course Catalog 2021-2022 FIN: Finance/Finance E : Finance d'entreprise/Corporate Finance G : Finance générale/General Finance M : Finance de marché/Market Finance S : Synthèse/Synthesis IDS: Systèmes d'Information, Sciences de la Décision et Statistiques/ Information Systems, Decision Sciences and Statistics

1y ago

222 Views

SINGAPORE - Kelly Services

FINANCE Chief Financial Officer Degree/Master 15 20,000 25,000 Finance Assistant Diploma 1-3 2,800 3,400 Finance Controller Degree 10-15 10,000 18,000 Finance Director Degree 15 15,000 20,000 Finance Executive/ Senior Finance Executive Degree 2-5 3,000 6,000 Finance Manager/ Assistan

2y ago

527 Views

Ministries of Finance and Nationally Determined Contributions

Rodrigo Rojo, IDB Sr. Consultant and advisor to Ministry of Finance of Chile. Colombia German Romero Otalora and Laura Marcela Ruiz Daza — Office of the Vice-Minister — Ministry of Finance. Ireland Paul Ryan — International Finance Division — Ministry of Finance Sean Judge — Department of Finance — Ministry of Finance

1y ago

232 Views

Trade Finance & Supply Chain Finance Awards 2022

In February 2022, Global Finance will publish its annual selections for the World's Best Trade Finance and Supply Chain Finance Providers. Global Finance will name the best trade finance providers in more than 100 countries and territories, eight global regions and

1y ago

215 Views

McKinsey on Finance

finance and strategy 23 How M&A practitioners enable their success Perspectives on Corporate Finance and Strategy Number 56, Autumn 2015 Finance McKinsey on. McKinsey on Finance. is a quarterly publication written by corporate-finance experts and practitioners at McKinsey & Company. This publication offers readers insights into value-creating .

3y ago

272 Views

SAP Simple Finance - tutorialspoint

SAP Simple Finance is only known as S/4 HANA Finance and this will be the only name of other releases of SAP Simple Finance. During the installation of SAP S/4 HANA Finance, various front-end and back-end components get installed. 2. SAP Simple Finance Introduction

3y ago

252 Views

pwc Finance Function Transformation

PwC’s finance effectiveness framework looks at 3 core areas within finance, to frame a programme of work that makes the finance function more effective, and to increase its interaction with the business: Finance efficiency Risk, Compliance and Control Finance Insights (the key lever in

2y ago

285 Views

Sustainable Finance: A Primer and Recent Developments

Social (impact) finance RBC Wealth Management Green finance Resonance Fund Impact finance Bridges Fund Management Socially responsible finance Nutmeg . Source: Author's own research. Despite this variety of definitions, some consistency of terminology has coalesced around the construct of "sustainable finance" in terms of a range of

1y ago

151 Views

The International Finance Corporation's Blended Finance Operations

The International Finance Corporation's Blended Finance Operations . 1. Context. Blended finance is a risk mitigation tool applied to investments for which it is difficult to attract commercial funding. Blended finance refers to the combination of concessional and commercial funding in private sector-led projects. Its rationale is

1y ago

187 Views

Agile Finance Reimagined Reimagining Finance for the New Normal

6 Agile Finance Reimagined: Reimagining finance for the new normal While the global impact of COVID-19 is still evolving, this much is clear: finance functions have been forced to deliver more value to the business, beyond simply driving down costs. "We are seeing that shift from finance being focused on efficiency to effectiveness," said the

1y ago

130 Views

Oracle Banking Supply Chain Finance User Guide

Oracle Banking Supply Chain Finance User Guide 7 2. Supply Chain Finance - An Overview 2.1 Supply Chain Finance Supply Chain Finance commonly known as (SCF) is a type of supplier finance which enables the supplier to cash his receivables early than the actual payment date, thereby freeing up its working capital.

1y ago

132 Views

Hybrid Generative-Discriminative Visual Categorization.

It looks like you're using an ad-blocker