Face Recognition By Independent Component Analysis


IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 6, NOVEMBER 2002

Marian Stewart Bartlett, Member, IEEE, Javier R. Movellan, Member, IEEE, and Terrence J. Sejnowski, Fellow, IEEE

Abstract—A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics. Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance.

Index Terms—Eigenfaces, face recognition, independent component analysis (ICA), principal component analysis (PCA), unsupervised learning.

Manuscript received May 21, 2001; revised May 8, 2002. This work was supported by University of California Digital Media Innovation Program D00-10084, the National Science Foundation under Grants 0086107 and IIT-0223052, the National Research Service Award MH-12417-02, the Lawrence Livermore National Laboratories ISCR agreement B291528, and the Howard Hughes Medical Institute. An abbreviated version of this paper appears in Proceedings of the SPIE Symposium on Electronic Imaging: Science and Technology; Human Vision and Electronic Imaging III, Vol. 3299, B. Rogowitz and T. Pappas, Eds., 1998. Portions of this paper use the FERET database of facial images, collected under the FERET program of the Army Research Laboratory. The authors are with the University of California-San Diego, La Jolla, CA 92093-0523 USA (e-mail: marni@salk.edu; javier@inc.ucsd.edu; terry@salk.edu). T. J. Sejnowski is also with the Howard Hughes Medical Institute at the Salk Institute, La Jolla, CA 92037 USA. Digital Object Identifier 10.1109/TNN.2002.804287

I. INTRODUCTION

Redundancy in the sensory input contains structural information about the environment. Barlow has argued that such redundancy provides knowledge [5] and that the role of the sensory system is to develop factorial representations in which these dependencies are separated into independent components (ICs). Barlow also argued that such representations are advantageous for encoding complex objects that are characterized by high-order dependencies.
Atick and Redlich have also argued for such representations as a general coding strategy for the visual system [3].

Principal component analysis (PCA) is a popular unsupervised statistical method to find useful image representations. Consider a set of basis images, each of which has N pixels. A standard basis set consists of a single active pixel with intensity 1, where each basis image has a different active pixel. Any given image with N pixels can be decomposed as a linear combination of the standard basis images. In fact, the pixel values of an image can then be seen as the coordinates of that image with respect to the standard basis. The goal in PCA is to find a "better" set of basis images so that in this new basis, the image coordinates (the PCA coefficients) are uncorrelated, i.e., they cannot be linearly predicted from each other. PCA can, thus, be seen as partially implementing Barlow's ideas: dependencies that show up in the joint distribution of pixels are separated out into the marginal distributions of the PCA coefficients. However, PCA can only separate pairwise linear dependencies between pixels. High-order dependencies will still show in the joint distribution of the PCA coefficients and, thus, will not be properly separated.

Some of the most successful representations for face recognition, such as eigenfaces [57], holons [15], and local feature analysis [50], are based on PCA. In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels, and thus, it is important to investigate whether generalizations of PCA which are sensitive to high-order relationships, not just second-order relationships, are advantageous. Independent component analysis (ICA) [14] is one such generalization. A number of algorithms for performing ICA have been proposed. See [20] and [29] for reviews. Here, we employ an algorithm developed by Bell and Sejnowski [11], [12] from the point of view of optimal information transfer in neural networks with sigmoidal transfer functions. This algorithm has proven successful for separating randomly mixed auditory signals (the cocktail party problem), and for separating electroencephalogram (EEG) signals [37] and functional magnetic resonance imaging (fMRI) signals [39].

We performed ICA on the image set under two architectures. Architecture I treated the images as random variables and the pixels as outcomes, whereas Architecture II treated the pixels as random variables and the images as outcomes. (Preliminary versions of this work appear in [7] and [9]; a longer discussion of unsupervised learning for face recognition appears in [6].)
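Before turning to ICA, the PCA representation described above can be made concrete with a short sketch: it computes a set of PCA basis images from a matrix of vectorized face images and checks that the resulting coefficients are uncorrelated. This is a minimal NumPy illustration on synthetic stand-in data; the array names, image size, and use of an SVD are assumptions for the example, not the authors' code.

# Illustrative PCA on a matrix of vectorized face images (synthetic stand-in data).
import numpy as np

rng = np.random.default_rng(0)
n_images, n_pixels = 100, 60 * 50               # assumed image count and image size
X = rng.standard_normal((n_images, n_pixels))   # placeholder for real face vectors

# Center each pixel (column) so PCA operates on the covariance structure.
X_centered = X - X.mean(axis=0)

# The rows of Vt are the eigenvectors of the pixel covariance matrix,
# i.e., the PCA basis images ("eigenfaces").
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 20                                          # number of basis images retained
basis_images = Vt[:k]
coefficients = X_centered @ basis_images.T      # PCA coordinates of each face

# The PCA coefficients are pairwise uncorrelated: their covariance is diagonal.
cov = np.cov(coefficients, rowvar=False)
assert np.allclose(cov, np.diag(np.diag(cov)), atol=1e-8)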

Matlab code for the ICA representations is available at http://inc.ucsd.edu/~marni.

Face recognition performance was tested using the FERET database [52]. Face recognition performances using the ICA representations were benchmarked by comparing them to performances using PCA, which is equivalent to the "eigenfaces" representation [51], [57]. The two ICA representations were then combined in a single classifier.

II. ICA

There are a number of algorithms for performing ICA [11], [13], [14], [25]. We chose the infomax algorithm proposed by Bell and Sejnowski [11], which was derived from the principle of optimal information transfer in neurons with sigmoidal transfer functions [27]. The algorithm is motivated as follows: Let X be an n-dimensional (n-D) random vector representing a distribution of inputs in the environment. (Here, boldface capitals denote random variables, whereas plain text capitals denote matrices.) Let W be an n x n invertible matrix, U = WX, and Y = f(U) an n-D random variable representing the outputs of n neurons. Each component of f = (f_1, ..., f_n) is an invertible squashing function, mapping real numbers into the interval [0, 1]. Typically, the logistic function is used:

    f_i(u) = \frac{1}{1 + e^{-u}}    (1)

The U_i variables are linear combinations of inputs and can be interpreted as presynaptic activations of n neurons. The Y_i variables can be interpreted as postsynaptic activation rates and are bounded by the interval [0, 1]. The goal in Bell and Sejnowski's algorithm is to maximize the mutual information between the environment X and the output of the neural network Y. This is achieved by performing gradient ascent on the entropy of the output with respect to the weight matrix W. The gradient update rule for the weight matrix W is as follows:

    \Delta W \propto \nabla_W H(Y) = (W^T)^{-1} + E(Y' X^T)    (2)

where Y'_i = \frac{\partial^2 y_i / \partial u_i^2}{\partial y_i / \partial u_i}, the ratio between the second and first partial derivatives of the activation function, ^T stands for transpose, E for expected value, H(Y) is the entropy of the random vector Y, and \nabla_W H(Y) is the gradient of the entropy in matrix form, i.e., the cell in row i, column j of this matrix is the derivative of H(Y) with respect to w_{ij}. Computation of the matrix inverse can be avoided by employing the natural gradient [1], which amounts to multiplying the absolute gradient by W^T W, resulting in the following learning rule [12]:

    \Delta W \propto (I + E(Y' U^T)) W    (3)

where I is the identity matrix. The logistic transfer function (1) gives Y' = (1 - 2Y).

When there are multiple inputs and outputs, maximizing the joint entropy of the output Y encourages the individual outputs to move toward statistical independence. When the form of the nonlinear transfer function is the same as the cumulative density functions of the underlying ICs (up to scaling and translation), it can be shown that maximizing the joint entropy of the outputs in Y also minimizes the mutual information between the individual outputs in U [12], [42]. In practice, the logistic transfer function has been found sufficient to separate mixtures of natural signals with sparse distributions including sound sources [11].

The algorithm is speeded up by including a "sphering" step prior to learning [12]. The row means of X are subtracted, and then X is passed through the whitening matrix W_Z, which is twice the inverse square root of the covariance matrix (we use the principal square root, which is the unique square root for which every eigenvalue has nonnegative real part):

    W_Z = 2 \, \mathrm{Cov}(X)^{-1/2}    (4)

This removes the first- and second-order statistics of the data; both the mean and covariances are set to zero and the variances are equalized. When the inputs to ICA are the "sphered" data, the full transform matrix W_I is the product of the sphering matrix and the matrix learned by ICA:

    W_I = W W_Z    (5)

MacKay [36] and Pearlmutter [48] showed that the ICA algorithm converges to the maximum likelihood estimate of W_I^{-1} for the following generative model of the data:

    X = W_I^{-1} S    (6)

where S = (S_1, ..., S_n) is a vector of independent random variables, called the sources, with cumulative distributions equal to f_i. In other words, using logistic activation functions corresponds to assuming logistic random sources, and using the standard cumulative Gaussian distribution as activation functions corresponds to assuming Gaussian random sources. Thus W_I^{-1}, the inverse of the weight matrix in Bell and Sejnowski's algorithm, can be interpreted as the source mixing matrix, and the U = W_I X variables can be interpreted as the maximum-likelihood (ML) estimates of the sources that generated the data.

A. ICA and Other Statistical Techniques

ICA and PCA: PCA can be derived as a special case of ICA which uses Gaussian source models. In such case the mixing matrix W is unidentifiable in the sense that there is an infinite number of equally good ML solutions. Among all possible ML solutions, PCA chooses an orthogonal matrix which is optimal in the following sense: 1) Regardless of the distribution of X, U_1 is the linear combination of input that allows optimal linear reconstruction of the input in the mean square sense; and 2) for U_1, ..., U_{k-1} fixed, U_k allows optimal linear reconstruction among the class of linear combinations of X which are uncorrelated with U_1, ..., U_{k-1}. If the sources are Gaussian, the likelihood of the data depends only on first- and second-order statistics (the covariance matrix). In PCA, the rows of W are, in fact, the eigenvectors of the covariance matrix of the data.

Second-order statistics capture the amplitude spectrum of images but not their phase spectrum. The high-order statistics capture the phase spectrum [12], [19].
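The sphering step (4) and the natural-gradient infomax rule (3) described above can be written compactly in code. The following is a minimal sketch on a toy two-source mixture; the learning rate, iteration count, mixing matrix, and data are arbitrary assumptions, and this is not the authors' Matlab implementation.

# Minimal infomax ICA sketch: sphering followed by the natural-gradient rule (3).
# Toy two-source mixture with assumed parameters; not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5000
S = rng.laplace(size=(2, n_samples))            # two sparse (super-Gaussian) sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                      # assumed mixing matrix
X = A @ S                                       # observed mixtures

# Sphering (4): subtract row means, then whiten with twice the inverse
# principal square root of the covariance matrix.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Wz = 2.0 * E @ np.diag(d ** -0.5) @ E.T
Xw = Wz @ X

# Natural-gradient infomax updates: dW is proportional to (I + E[(1 - 2Y) U^T]) W,
# using the logistic nonlinearity (1), for which Y' = 1 - 2Y.
W = np.eye(2)
learning_rate = 0.01
for _ in range(200):
    U = W @ Xw
    Y = 1.0 / (1.0 + np.exp(-U))
    dW = (np.eye(2) + (1.0 - 2.0 * Y) @ U.T / n_samples) @ W
    W = W + learning_rate * dW

W_full = W @ Wz                                 # full transform, as in (5)
recovered = W_full @ (A @ S)                    # source estimates, up to scale and order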

For a given sample of natural images, we can scramble their phase spectrum while maintaining their power spectrum. This will dramatically alter the appearance of the images but will not change their second-order statistics. The phase spectrum, not the power spectrum, contains the structural information in images that drives human perception. For example, as illustrated in Fig. 1, a face image synthesized from the amplitude spectrum of face A and the phase spectrum of face B will be perceived as an image of face B [45], [53]. The fact that PCA is only sensitive to the power spectrum of images suggests that it might not be particularly well suited for representing natural images.

Fig. 1. (left) Two face images. (center) The two faces with scrambled phase. (right) Reconstructions with the amplitude of the original face and the phase of the other face. Face images are from the FERET face database, reprinted with permission from J. Phillips.

The assumption of Gaussian sources implicit in PCA makes it inadequate when the true sources are non-Gaussian. In particular, it has been empirically observed that many natural signals, including speech, natural images, and EEG, are better described as linear combinations of sources with long-tailed distributions [11], [19]. These sources are called "high-kurtosis," "sparse," or "super-Gaussian" sources. Logistic random variables are a special case of sparse source models. When sparse source models are appropriate, ICA has the following potential advantages over PCA: 1) It provides a better probabilistic model of the data, which better identifies where the data concentrate in n-dimensional space. 2) It uniquely identifies the mixing matrix W. 3) It finds a not-necessarily orthogonal basis which may reconstruct the data better than PCA in the presence of noise. 4) It is sensitive to high-order statistics in the data, not just the covariance matrix.

Fig. 2 illustrates these points with an example. The figure shows samples from a three-dimensional (3-D) distribution constructed by linearly mixing two high-kurtosis sources. The figure shows the basis vectors found by PCA and by ICA on this problem. Since the three ICA basis vectors are nonorthogonal, they change the relative distance between data points. This change in metric may be potentially useful for classification algorithms, like nearest neighbor, that make decisions based on relative distances between points. The ICA basis also alters the angles between data points, which affects similarity measures such as cosines. Moreover, if an undercomplete basis set is chosen, PCA and ICA may span different subspaces. For example, in Fig. 2, when only two dimensions are selected, PCA and ICA choose different subspaces.

Fig. 2. (top) Example 3-D data distribution and corresponding PC and IC axes. Each axis is a column of the mixing matrix W found by PCA or ICA. Note the PC axes are orthogonal while the IC axes are not. If only two components are allowed, ICA chooses a different subspace than PCA. (bottom left) Distribution of the first PCA coordinates of the data. (bottom right) Distribution of the first ICA coordinates of the data. Note that since the ICA axes are nonorthogonal, relative distances between points are different in PCA than in ICA, as are the angles between points.

The metric induced by ICA is superior to PCA in the sense that it may provide a representation more robust to the effect of noise [42]. It is, therefore, possible for ICA to be better than PCA for reconstruction in noisy or limited-precision environments. For example, in the problem presented in Fig. 2, we found that if only 12 bits are allowed to represent the PCA and ICA coefficients, linear reconstructions based on ICA are 3 dB better than reconstructions based on PCA (the noise power is reduced by more than half). A similar result was obtained for PCA and ICA subspaces. If only four bits are allowed to represent the first two PCA and ICA coefficients, ICA reconstructions are 3 dB better than PCA reconstructions. In some problems, one can think of the actual inputs as noisy versions of some canonical inputs. For example, variations in lighting and expressions can be seen as noisy versions of the canonical image of a person. Having input representations which are robust to noise may potentially give us representations that better reflect the data.

When the source models are sparse, ICA is closely related to the so-called nonorthogonal "rotation" methods in PCA and factor analysis. The goal of these rotation methods is to find directions with high concentrations of data, something very similar to what ICA does when the sources are sparse. In such cases, ICA can be seen as a theoretically sound probabilistic method to find interesting nonorthogonal "rotations."

ICA and Cluster Analysis: Cluster analysis is a technique for finding regions in n-dimensional space with large concentrations of data. These regions are called "clusters." Typically, the main statistic of interest in cluster analysis is the center of those clusters. When the source models are sparse, ICA finds directions along which significant concentrations of data points are observed. Thus, when using sparse sources, ICA can be seen as a form of cluster analysis. However, the emphasis in ICA is on finding optimal directions, rather than specific locations of high data density. Fig. 2 illustrates this point. Note how the data concentrate along the ICA solutions, not the PCA solutions. Note also that in this case, all the clusters have equal mean, and thus are better characterized by their orientation rather than their position in space.

It should be noted that ICA is a very general technique. When super-Gaussian sources are used, ICA can be seen as doing something akin to nonorthogonal PCA and to cluster analysis; however, when the source models are sub-Gaussian, the relationship between these techniques is less clear. See [30] for a discussion of ICA in the context of sub-Gaussian sources.
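The role of the phase spectrum illustrated in Fig. 1 is easy to reproduce numerically. The sketch below combines the amplitude spectrum of one image with the phase spectrum of another, and separately scrambles the phase of an image while preserving its amplitude spectrum. The random arrays stand in for face images and are assumptions of the example; the FERET images themselves are not reproduced here.

# Sketch of the amplitude/phase manipulations behind Fig. 1 (placeholder arrays,
# not FERET images).
import numpy as np

rng = np.random.default_rng(0)
face_a = rng.random((64, 64))                   # stand-ins for two face images
face_b = rng.random((64, 64))

Fa = np.fft.fft2(face_a)
Fb = np.fft.fft2(face_b)

# Amplitude of face A combined with the phase of face B: the result is
# perceived as face B, showing that phase carries the structural information.
hybrid = np.real(np.fft.ifft2(np.abs(Fa) * np.exp(1j * np.angle(Fb))))

# Scrambling the phase of face A while keeping its amplitude spectrum leaves
# the second-order statistics unchanged but destroys the face's appearance.
random_phase = np.angle(np.fft.fft2(rng.random((64, 64))))
scrambled = np.real(np.fft.ifft2(np.abs(Fa) * np.exp(1j * random_phase)))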

B. Two Architectures for Performing ICA on Images

Let X be a data matrix with n_r rows and n_c columns. We can think of each column of X as the outcomes (independent trials) of a random experiment. We think of the ith row of X as the specific value taken by a random variable X_i across n_c independent trials. This defines an empirical probability distribution in which each column of X is given probability mass 1/n_c. Independence is then defined with respect to such a distribution. For example, we say that rows i and j of X are independent if it is not possible to predict the values taken by X_i across columns from the corresponding values taken by X_j, i.e.,

    P(X_i = u, X_j = v) = P(X_i = u) \, P(X_j = v) \quad \text{for all } u, v    (7)

where P is the empirical distribution defined above.

Our goal in this paper is to find a good set of basis images to represent a database of faces. We organize each image in the database as a long vector with as many dimensions as the number of pixels in the image. There are at least two ways in which ICA can be applied to this problem.

1) We can organize our database into a matrix X where each row vector is a different image. This approach is illustrated in Fig. 3 (left). In this approach, images are random variables and pixels are trials. In this approach, it makes sense to talk about independence of images or functions of images. Two images i and j are independent if, when moving across pixels, it is not possible to predict the value taken by the pixel on image i based on the value taken by the same pixel on image j. A similar approach was used by Bell and Sejnowski for sound source separation [11], for EEG analysis [37], and for fMRI [39].

2) We can transpose X and organize our data so that images are in the columns of X. This approach is illustrated in Fig. 3 (right). In this approach, pixels are random variables and images are trials. Here, it makes sense to talk about independence of pixels or functions of pixels. For example, pixels i and j would be independent if, when moving across the entire set of images, it is not possible to predict the value taken by pixel i based on the corresponding value taken by pixel j on the same image. This approach was inspired by Bell and Sejnowski's work on the ICs of natural images [12].

Fig. 3. Two architectures for performing ICA on images. (a) Architecture I for finding statistically independent basis images. Performing source separation on the face images produced IC images in the rows of U. (b) The gray values at pixel location i are plotted for each face image. ICA in Architecture I finds weight vectors in the directions of statistical dependencies among the pixel locations. (c) Architecture II for finding a factorial code. Performing source separation on the pixels produced a factorial code in the columns of the output matrix U.
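As a rough sketch of how the two architectures organize the same face database, the fragment below builds the two data matrices from a set of vectorized images; the shapes and names are placeholders, and either matrix would then be handed to an ICA routine such as the infomax sketch given earlier.

# Arranging a face database for the two ICA architectures (placeholder data).
import numpy as np

rng = np.random.default_rng(0)
n_images, n_pixels = 100, 60 * 50
faces = rng.random((n_images, n_pixels))        # each row is one vectorized face image

# Architecture I: images are random variables, pixels are trials.
# Rows of the data matrix are images; source separation on this matrix
# yields statistically independent basis images in the rows of U = W X.
X_arch1 = faces                                 # shape (n_images, n_pixels)

# Architecture II: pixels are random variables, images are trials.
# The data matrix is transposed so rows are pixels; source separation on
# this matrix yields a factorial code, with one face's coefficients per
# column of the output matrix U.
X_arch2 = faces.T                               # shape (n_pixels, n_images)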

