Variational Bayesian Sparse Additive Matrix Factorization


Machine Learning, vol. 92, no. 2–3, pp. 319–347, 2013.

Variational Bayesian Sparse Additive Matrix Factorization

Shinichi Nakajima
Nikon Corporation, Japan
nakajima.s@nikon.co.jp

Masashi Sugiyama
Tokyo Institute of Technology, Japan
sugi@cs.titech.ac.jp

S. Derin Babacan
University of Illinois at Urbana-Champaign, USA
dbabacan@illinois.edu

Abstract

Principal component analysis (PCA) approximates a data matrix with a low-rank one by imposing sparsity on its singular values. Its robust variant can cope with spiky noise by introducing an element-wise sparse term. In this paper, we extend such sparse matrix learning methods, and propose a novel framework called sparse additive matrix factorization (SAMF). SAMF systematically induces various types of sparsity by a Bayesian regularization effect called model-induced regularization. Although group LASSO also allows us to design arbitrary types of sparsity on a matrix, SAMF, which is based on the Bayesian framework, provides inference without any requirement for manual parameter tuning. We propose an efficient iterative algorithm called the mean update (MU) for the variational Bayesian approximation to SAMF, which gives the global optimal solution for a large subset of parameters in each step. We demonstrate the usefulness of our method on benchmark datasets and a foreground/background video separation problem.

Keywords: variational Bayes, robust PCA, matrix factorization, sparsity, model-induced regularization.

1 Introduction

Principal component analysis (PCA) (Hotelling, 1933) is a classical method for obtaining a low-dimensional expression of data. PCA can be regarded as approximating a data matrix with a low-rank one by imposing sparsity on its singular values. A robust variant of PCA further copes with sparse spiky noise included in observations (Candès et al., 2011; Ding et al., 2011; Babacan et al., 2012).

In this paper, we extend the idea of robust PCA and propose a more general framework called sparse additive matrix factorization (SAMF). The proposed SAMF can handle various types of sparse noise such as row-wise and column-wise sparsity, in addition to element-wise sparsity (spiky noise) and low-rank sparsity (low-dimensional expression); furthermore, their arbitrary additive combination is also allowed. In the context of robust PCA, row-wise and column-wise sparsity can capture noise observed when some sensors are broken and their outputs are always unreliable, or when some accident disturbs all sensor outputs at a time.

The flexibility of SAMF in sparsity design allows us to incorporate side information more efficiently. We show such an example in foreground/background video separation, where sparsity is induced based on image segmentation. Although group LASSO (Yuan and Lin, 2006; Raman et al., 2009) also allows arbitrary sparsity design on matrix entries, SAMF, which is based on the Bayesian framework, enables us to estimate all unknowns from observations and thus to perform inference without manual parameter tuning.

Technically, our approach induces sparsity by so-called model-induced regularization (MIR) (Nakajima and Sugiyama, 2011). MIR is an implicit regularization property of the Bayesian approach, which stems from the one-to-many (i.e., redundant) mapping between parameters and outcomes (Watanabe, 2009). In the case of matrix factorization, an observed matrix is decomposed into two redundant matrices, which was shown to induce sparsity in the singular values under the variational Bayesian approximation (Nakajima and Sugiyama, 2011).

We show that MIR in SAMF can be interpreted as automatic relevance determination (ARD) (Neal, 1996), which is a popular Bayesian approach to inducing sparsity. Nevertheless, we argue that the MIR formulation is preferable since it allows us to derive a practically useful algorithm called the mean update (MU) from a recent theoretical result (Nakajima et al., 2013): the MU algorithm is based on the variational Bayesian approximation, and gives the global optimal solution for a large subset of parameters in each step. Through experiments, we show that the MU algorithm compares favorably with a standard iterative algorithm for variational Bayesian inference.

2 Formulation

In this section, we formulate the sparse additive matrix factorization (SAMF) model.

2.1 Examples of Factorization

In standard MF, an observed matrix V ∈ R^{L×M} is modeled by a low-rank target matrix U ∈ R^{L×M} contaminated with a random noise matrix E ∈ R^{L×M}:

    V = U + E.

Then the target matrix U is decomposed into the product of two matrices A ∈ R^{M×H} and B ∈ R^{L×H}:

    U^{low-rank} = BA⊤ = Σ_{h=1}^{H} b_h a_h⊤,    (1)

where ⊤ denotes the transpose of a matrix or vector. Throughout the paper, we denote a column vector of a matrix by a bold small letter, and a row vector by a bold small letter with a tilde:

    A = (a_1, ..., a_H) = (ã_1, ..., ã_M)⊤,
    B = (b_1, ..., b_H) = (b̃_1, ..., b̃_L)⊤.

The last equation in Eq.(1) implies that the plain matrix product (i.e., BA⊤) is the sum of rank-1 components. It was elucidated that this product induces an implicit regularization effect called model-induced regularization (MIR), and that a low-rank (singular-component-wise sparse) solution is produced under the variational Bayesian approximation (Nakajima and Sugiyama, 2011).

Let us consider other types of factorization:

    U^{row} = Γ_E D = (γ_1^e d̃_1, ..., γ_L^e d̃_L)⊤,    (2)
    U^{column} = E Γ_D = (γ_1^d e_1, ..., γ_M^d e_M),    (3)

where Γ_D = diag(γ_1^d, ..., γ_M^d) ∈ R^{M×M} and Γ_E = diag(γ_1^e, ..., γ_L^e) ∈ R^{L×L} are diagonal matrices, and D, E ∈ R^{L×M}. These examples are also matrix products, but one of the factors is restricted to be diagonal. Because of this diagonal constraint, the l-th diagonal entry γ_l^e in Γ_E is shared by all the entries in the l-th row of U^{row} as a common factor. Similarly, the m-th diagonal entry γ_m^d in Γ_D is shared by all the entries in the m-th column of U^{column}.

Another example is the Hadamard (or element-wise) product:

    U^{element} = E ∗ D,  where (E ∗ D)_{l,m} = E_{l,m} D_{l,m}.    (4)

In this factorization form, no entry in E and D is shared by more than one entry in U^{element}.

In fact, the factorization forms (2)–(4) induce different types of sparsity through the MIR mechanism. In Section 2.2, they will be derived as row-wise, column-wise, and element-wise sparsity-inducing terms, respectively, within a unified framework.
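The four factorization forms (1)–(4) are easy to reproduce numerically. The following minimal NumPy sketch (ours, not from the paper; all variable names and sizes are illustrative) builds U^{low-rank}, U^{row}, U^{column}, and U^{element} from random factors, making explicit which entries share a common factor in each case.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, H = 5, 8, 2

# Low-rank term (Eq. 1): U = B A^T, a sum of H rank-1 components.
A = rng.standard_normal((M, H))
B = rng.standard_normal((L, H))
U_lowrank = B @ A.T                      # rank <= H

# Row-wise term (Eq. 2): the l-th row of D is scaled by gamma_e[l].
D = rng.standard_normal((L, M))
gamma_e = rng.standard_normal(L)
U_row = np.diag(gamma_e) @ D             # entries of a row share one factor

# Column-wise term (Eq. 3): the m-th column of E is scaled by gamma_d[m].
E = rng.standard_normal((L, M))
gamma_d = rng.standard_normal(M)
U_column = E @ np.diag(gamma_d)          # entries of a column share one factor

# Element-wise term (Eq. 4): Hadamard product, no sharing between entries.
U_element = E * D

print(np.linalg.matrix_rank(U_lowrank))  # -> H (almost surely)
```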

Figure 1: An example of SMF-term construction. G(·; X) with X: (k, l′, m′) ↦ (l, m) maps the set {U′^(k)}_{k=1}^{K} of the PR matrices to the target matrix U, so that U′^(k)_{l′,m′} = U_{X(k,l′,m′)} = U_{l,m}.

Figure 2: SMF-term construction for the row-wise (top), the column-wise (middle), and the element-wise (bottom) sparse terms.

2.2 A General Expression of Factorization

Our general expression consists of partitioning, rearrangement, and factorization. The following is the form of a sparse matrix factorization (SMF) term:

    U = G({U′^(k)}_{k=1}^{K}; X),  where  U′^(k) = B^(k) A^(k)⊤.    (5)

Here, {A^(k), B^(k)}_{k=1}^{K} are parameters to be estimated, and G(·; X): Π_{k=1}^{K} R^{L′^(k)×M′^(k)} → R^{L×M} is a designed function associated with an index mapping parameter X, which will be explained shortly.

Figure 1 shows how to construct an SMF term. First, we partition the entries of U into K parts. Then, by rearranging the entries in each part, we form partitioned-and-rearranged (PR) matrices U′^(k) ∈ R^{L′^(k)×M′^(k)} for k = 1, ..., K. Finally, each U′^(k) is decomposed into the product of A^(k) ∈ R^{M′^(k)×H′^(k)} and B^(k) ∈ R^{L′^(k)×H′^(k)}, where H′^(k) ≤ min(L′^(k), M′^(k)).

In Eq.(5), the function G(·; X) is responsible for partitioning and rearrangement: it maps the set {U′^(k)}_{k=1}^{K} of the PR matrices to the target matrix U ∈ R^{L×M}, based on the one-to-one map X: (k, l′, m′) ↦ (l, m) from the indices of the entries in {U′^(k)}_{k=1}^{K} to the indices of the entries in U, such that

    ( G({U′^(k)}_{k=1}^{K}; X) )_{X(k,l′,m′)} = U′^(k)_{l′,m′}.    (6)

Table 1: Examples of SMF terms. See the main text for details.

    Factorization    Induced sparsity    K     (L′^(k), M′^(k))    X: (k, l′, m′) ↦ (l, m)
    U = BA⊤          low-rank            1     (L, M)              X(1, l′, m′) = (l′, m′)
    U = Γ_E D        row-wise            L     (1, M)              X(k, 1, m′) = (k, m′)
    U = E Γ_D        column-wise         M     (L, 1)              X(k, l′, 1) = (l′, k)
    U = E ∗ D        element-wise        LM    (1, 1)              X(k, 1, 1) = vec-order(k)

As will be discussed in Section 4.1, the SMF-term expression (5) under the variational Bayesian approximation induces low-rank sparsity in each partition. This means that partition-wise sparsity is also induced. Accordingly, partitioning, rearrangement, and factorization should be designed in the following manner. Suppose that we are given a required sparsity structure on a matrix (examples of possible side information that suggests particular sparsity structures are given in Section 2.3). We first partition the matrix according to the required sparsity. Some partitions can be submatrices. We rearrange each of the submatrices on which we do not want to impose low-rank sparsity into a long vector (U′^(3) in the example in Figure 1). We leave the other submatrices which we want to be low-rank (U′^(2)), the vectors (U′^(1) and U′^(4)), and the scalars (U′^(5)) as they are. Finally, we factorize each of the PR matrices to induce sparsity through the MIR mechanism.

Let us, for example, assume that row-wise sparsity is required. We first make the row-wise partition, i.e., separate U ∈ R^{L×M} into L pieces of M-dimensional row vectors U′^(l) = ũ_l⊤ ∈ R^{1×M}. Then, we factorize each partition as U′^(l) = B^(l) A^(l)⊤ (see the top illustration in Figure 2). Thus, we obtain the row-wise sparse term (2). Here, X(k, 1, m′) = (k, m′) makes the following connection between Eqs.(2) and (5): γ_l^e = B^(k) ∈ R and d̃_l = A^(k) ∈ R^{M×1} for k = l. Similarly, requiring column-wise and element-wise sparsity leads to Eqs.(3) and (4), respectively (see the bottom two illustrations in Figure 2). Table 1 summarizes how to design these SMF terms, where vec-order(k) = (1 + ((k−1) mod L), ⌈k/L⌉) goes along the columns one after another in the same way as the vec operator forms a vector by stacking the columns of a matrix (in other words, (U′^(1), ..., U′^(K)) = vec(U)).
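To make the index map X of Eq.(6) and Table 1 concrete, the following sketch (our own illustration; the helper code is hypothetical and not part of the paper) partitions a small matrix U into PR matrices for the row-wise and element-wise cases and reassembles it with G, checking that the round trip reproduces U.

```python
import numpy as np

L, M = 2, 3
U = np.arange(L * M, dtype=float).reshape(L, M)

# Row-wise case (Table 1): K = L parts, each PR matrix is a 1 x M row,
# and X(k, 1, m') = (k, m').
pr_row = [U[k:k + 1, :] for k in range(L)]
G_row = np.vstack(pr_row)                      # reassemble the target matrix
assert np.array_equal(G_row, U)

# Element-wise case (Table 1): K = L*M scalar parts,
# X(k, 1, 1) = vec-order(k) = (1 + (k-1) mod L, ceil(k/L)) in 1-based indexing,
# i.e. the parts run down the columns as in the vec operator.
pr_elem = [U[(k - 1) % L, (k - 1) // L] for k in range(1, L * M + 1)]
G_elem = np.array(pr_elem).reshape(M, L).T     # undo the column stacking (vec)
assert np.array_equal(G_elem, U)
print(pr_elem)   # entries of U listed in vec (column-major) order
```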

2.3 Sparse Additive Matrix Factorization

We define a sparse additive matrix factorization (SAMF) model as the sum of SMF terms (5):

    V = Σ_{s=1}^{S} U^(s) + E,    (7)

where

    U^(s) = G({B^(k,s) A^(k,s)⊤}_{k=1}^{K^(s)}; X^(s)).    (8)

In practice, SMF terms should be designed based on side information. Suppose that V ∈ R^{L×M} consists of M samples of L-dimensional sensor outputs. In robust PCA (Candès et al., 2011; Ding et al., 2011; Babacan et al., 2012), an element-wise sparse term is added to the low-rank term, which is expected to be the clean signal, when sensor outputs are expected to contain spiky noise:

    V = U^{low-rank} + U^{element} + E.    (9)

Here, it can be said that the "expectation of spiky noise" is used as side information. Similarly, if we suspect that some sensors are broken and their outputs are unreliable over all M samples, we should prepare the row-wise sparse term to capture the expected row-wise noise, and try to keep the estimated clean signal U^{low-rank} uncontaminated by the row-wise noise:

    V = U^{low-rank} + U^{row} + E.

If we know that some accidental disturbances occurred during the observation, but do not know their exact locations (i.e., which samples are affected), the column-wise sparse term can effectively capture these disturbances.

The SMF expression (5) enables us to use side information in a more flexible way. In Section 5.4, we show that our method can be applied to a foreground/background video separation problem, where moving objects (such as a person in Figure 3) are considered to belong to the foreground. Previous approaches (Candès et al., 2011; Ding et al., 2011; Babacan et al., 2012) constructed the observation matrix V by stacking all pixels in each frame into each column (Figure 4), and fitted it by the model (9). Here, the low-rank term and the element-wise sparse term are expected to capture the static background and the moving foreground, respectively. However, we can also rely on the natural assumption that a pixel segment having similar intensity values in an image tends to belong to the same object. Based on this side information, we adopt a segment-wise sparse term, where the PR matrix is constructed using a precomputed over-segmented image (Figure 5). We will show in Section 5.4 that the segment-wise sparse term captures the foreground more accurately than the element-wise sparse term.

Let us summarize the parameters of the SAMF model (7) as follows:

    Θ = {Θ_A^(s), Θ_B^(s)}_{s=1}^{S},  where  Θ_A^(s) = {A^(k,s)}_{k=1}^{K^(s)},  Θ_B^(s) = {B^(k,s)}_{k=1}^{K^(s)}.
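As a concrete instance of the additive model (7), the sketch below (ours; the sizes, rank, and outlier fraction are arbitrary choices) generates an observation of the robust PCA form (9): a low-rank term plus an element-wise sparse term plus dense Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, H, sigma = 40, 60, 3, 0.1

# Low-rank "clean signal" term U^{low-rank} = B A^T.
A = rng.standard_normal((M, H))
B = rng.standard_normal((L, H))
U_lowrank = B @ A.T

# Element-wise sparse "spiky noise" term: 5% of the entries are large outliers.
mask = rng.random((L, M)) < 0.05
U_element = mask * rng.normal(0.0, 10.0, size=(L, M))

# Observation (Eq. 9): V = U^{low-rank} + U^{element} + E.
V = U_lowrank + U_element + sigma * rng.standard_normal((L, M))
print(V.shape, mask.sum(), "outlier entries")
```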

Figure 3: Foreground/background video separation task.

Figure 4: The observation matrix V is constructed by stacking all pixels in each frame into each column.

Figure 5: Construction of a segment-wise sparse term. The original frame is pre-segmented, and the sparseness is induced in a segment-wise manner. Details are described in Section 5.4.

As in probabilistic MF (Salakhutdinov and Mnih, 2008), we assume independent Gaussian noise and priors. Thus, the likelihood and the priors are written as

    p(V | Θ) ∝ exp( −(1/(2σ²)) ‖V − Σ_{s=1}^{S} U^(s)‖²_Fro ),    (10)

    p({Θ_A^(s)}_{s=1}^{S}) ∝ exp( −(1/2) Σ_{s=1}^{S} Σ_{k=1}^{K^(s)} tr( A^(k,s) C_A^(k,s)⁻¹ A^(k,s)⊤ ) ),    (11)

    p({Θ_B^(s)}_{s=1}^{S}) ∝ exp( −(1/2) Σ_{s=1}^{S} Σ_{k=1}^{K^(s)} tr( B^(k,s) C_B^(k,s)⁻¹ B^(k,s)⊤ ) ),    (12)

where ‖·‖_Fro and tr(·) denote the Frobenius norm and the trace of a matrix, respectively. We assume that the prior covariances of A^(k,s) and B^(k,s) are diagonal and positive-definite:

    C_A^(k,s) = diag(c_{a_1}^(k,s)², ..., c_{a_H}^(k,s)²),
    C_B^(k,s) = diag(c_{b_1}^(k,s)², ..., c_{b_H}^(k,s)²).

Without loss of generality, we assume that the diagonal entries of C_A^(k,s) C_B^(k,s) are arranged in non-increasing order, i.e., c_{a_h}^(k,s) c_{b_h}^(k,s) ≥ c_{a_{h′}}^(k,s) c_{b_{h′}}^(k,s) for any pair h < h′.

2.4 Variational Bayesian Approximation

The Bayes posterior is written as

    p(Θ | V) = p(V | Θ) p(Θ) / p(V),    (13)

where p(V) = ⟨p(V | Θ)⟩_{p(Θ)} is the marginal likelihood. Here, ⟨·⟩_p denotes the expectation over the distribution p. Since the Bayes posterior (13) for matrix factorization is computationally intractable, the variational Bayesian (VB) approximation was proposed (Bishop, 1999; Lim and Teh, 2007; Ilin and Raiko, 2010; Babacan et al., 2012).

Let r(Θ), or r for short, be a trial distribution. The following functional with respect to r is called the free energy:

    F(r | V) = ⟨ log( r(Θ) / (p(V | Θ) p(Θ)) ) ⟩_{r(Θ)} = ⟨ log( r(Θ) / p(Θ | V) ) ⟩_{r(Θ)} − log p(V).    (14)

The first term is the Kullback–Leibler (KL) distance from the trial distribution to the Bayes posterior, and the second term is a constant. Therefore, minimizing the free energy (14) amounts to finding a distribution closest to the Bayes posterior in the sense of the KL distance. In the VB approximation, the free energy (14) is minimized over some restricted function space.
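For reference, here is a small sketch (ours) of the unnormalized log-likelihood (10) and log-priors (11)–(12) for a single SMF term with K^(s) = 1; the function names and the toy sizes are our own choices.

```python
import numpy as np

def log_likelihood(V, U_sum, sigma2):
    # Eq. (10), up to an additive constant: -||V - sum_s U^(s)||_Fro^2 / (2 sigma^2)
    return -np.sum((V - U_sum) ** 2) / (2.0 * sigma2)

def log_prior(F, c2):
    # Eqs. (11)/(12) for one factor matrix F (columns f_1, ..., f_H), up to a constant:
    # -tr(F C^{-1} F^T) / 2 with C = diag(c2)
    return -0.5 * np.sum(F ** 2 / c2[None, :])

rng = np.random.default_rng(2)
L, M, H = 10, 12, 3
A, B = rng.standard_normal((M, H)), rng.standard_normal((L, H))
V = B @ A.T + 0.1 * rng.standard_normal((L, M))
c2_a, c2_b = np.ones(H), np.ones(H)

lp = (log_likelihood(V, B @ A.T, sigma2=0.01)
      + log_prior(A, c2_a) + log_prior(B, c2_b))
print(lp)
```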

Following the standard VB procedure (Bishop, 1999; Lim and Teh, 2007; Babacan et al., 2012), we impose the following decomposability constraint on the posterior:

    r(Θ) = Π_{s=1}^{S} r_A^(s)(Θ_A^(s)) r_B^(s)(Θ_B^(s)).    (15)

Under this constraint, it is easy to show that the VB posterior minimizing the free energy (14) is written as

    r(Θ) = Π_{s=1}^{S} Π_{k=1}^{K^(s)} ( Π_{m′=1}^{M′^(k,s)} N_{H′^(k,s)}( ã_{m′}^(k,s); ã̂_{m′}^(k,s), Σ_A^(k,s) ) · Π_{l′=1}^{L′^(k,s)} N_{H′^(k,s)}( b̃_{l′}^(k,s); b̃̂_{l′}^(k,s), Σ_B^(k,s) ) ),    (16)

where N_d(·; μ, Σ) denotes the d-dimensional Gaussian distribution with mean μ and covariance Σ.

3 Algorithm for SAMF

In this section, we first present a theorem that reduces a partial SAMF problem to the standard MF problem, which can be solved analytically. Then we derive an algorithm for the entire SAMF problem.

3.1 Key Theorem

Let us denote the mean of U^(s), defined in Eq.(8), over the VB posterior by

    Û^(s) = ⟨U^(s)⟩_{r_A^(s)(Θ_A^(s)) r_B^(s)(Θ_B^(s))} = G({B̂^(k,s) Â^(k,s)⊤}_{k=1}^{K^(s)}; X^(s)).    (17)

Then we obtain the following theorem (its proof is given in Appendix A):

Theorem 1 Given {Û^(s′)}_{s′≠s} and the noise variance σ², the VB posterior of (Θ_A^(s), Θ_B^(s)) = {A^(k,s), B^(k,s)}_{k=1}^{K^(s)} coincides with the VB posterior of the following MF model:

    p(Z′^(k,s) | A^(k,s), B^(k,s)) ∝ exp( −(1/(2σ²)) ‖Z′^(k,s) − B^(k,s) A^(k,s)⊤‖²_Fro ),    (18)

    p(A^(k,s)) ∝ exp( −(1/2) tr( A^(k,s) C_A^(k,s)⁻¹ A^(k,s)⊤ ) ),    (19)

    p(B^(k,s)) ∝ exp( −(1/2) tr( B^(k,s) C_B^(k,s)⁻¹ B^(k,s)⊤ ) ),    (20)

for each k = 1, ..., K^(s). Here, Z′^(k,s) ∈ R^{L′^(k,s)×M′^(k,s)} is defined by

    Z′^(k,s)_{l′,m′} = Z^(s)_{X^(s)(k,l′,m′)},  where  Z^(s) = V − Σ_{s′≠s} Û^(s′).    (21)

The left formula in Eq.(21) relates the entries of Z^(s) ∈ R^{L×M} to the entries of {Z′^(k,s) ∈ R^{L′^(k,s)×M′^(k,s)}}_{k=1}^{K^(s)} by using the map X^(s): (k, l′, m′) ↦ (l, m) (see Eq.(6) and Figure 1).

Theorem 1 states that a partial problem of SAMF—finding the posterior of (A^(k,s), B^(k,s)) for each k = 1, ..., K^(s), given {Û^(s′)}_{s′≠s} and σ²—can be solved in the same way as standard VBMF, for which the global analytic solution is available (Nakajima et al., 2013). Based on this theorem, we will propose a useful algorithm in the following subsections.

The noise variance σ² is also unknown in many applications. To estimate σ², we can use the following lemma (its proof is also included in Appendix A):

Lemma 1 Given the VB posterior for {Θ_A^(s), Θ_B^(s)}_{s=1}^{S}, the noise variance σ² minimizing the free energy (14) is given by

    σ² = (1/(LM)) { ‖V‖²_Fro − 2 Σ_{s=1}^{S} tr( Û^(s)⊤ ( V − Σ_{s′=s+1}^{S} Û^(s′) ) )
         + Σ_{s=1}^{S} Σ_{k=1}^{K^(s)} tr( ( Â^(k,s)⊤ Â^(k,s) + M′^(k,s) Σ_A^(k,s) ) ( B̂^(k,s)⊤ B̂^(k,s) + L′^(k,s) Σ_B^(k,s) ) ) }.    (22)
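The following is a minimal sketch (ours) of the noise-variance update of Lemma 1 for the simplest case of a single low-rank term (S = 1, K^(1) = 1), where Eq.(22) reduces to the familiar VB matrix-factorization expression; A_hat, B_hat, Sigma_A, and Sigma_B stand for the current posterior means and covariances.

```python
import numpy as np

def update_sigma2(V, A_hat, B_hat, Sigma_A, Sigma_B):
    """Eq. (22) specialized to a single low-rank term (S = K = 1).

    A_hat: (M, H) posterior mean of A,  Sigma_A: (H, H) posterior covariance
    B_hat: (L, H) posterior mean of B,  Sigma_B: (H, H) posterior covariance
    """
    L, M = V.shape
    U_hat = B_hat @ A_hat.T
    # Posterior second moment <||U||^2> of the term.
    second_moment = np.trace(
        (A_hat.T @ A_hat + M * Sigma_A) @ (B_hat.T @ B_hat + L * Sigma_B)
    )
    return (np.sum(V ** 2) - 2.0 * np.trace(U_hat.T @ V) + second_moment) / (L * M)
```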

3.2 Partial Analytic Solution

Theorem 1 allows us to use the results given in Nakajima et al. (2013), which provide the global analytic solution for VBMF. Although the free energy of VBMF is also non-convex, Nakajima et al. (2013) showed that the minimizers can be written as a reweighted singular value decomposition. This allows one to solve the minimization problem separately for each singular component, which facilitated the analysis. By finding all stationary points and calculating the free energy on them, they successfully obtained an analytic form of the global VBMF solution.

Combining Theorem 1 above and Theorems 3–5 in Nakajima et al. (2013), we obtain the following corollaries:

Corollary 1 Assume that L′^(k,s) ≤ M′^(k,s) for all (k, s), and that {Û^(s′)}_{s′≠s} and the noise variance σ² are given. Let γ_h^(k,s) (≥ 0) be the h-th largest singular value of Z′^(k,s), and let ω_{a_h}^(k,s) and ω_{b_h}^(k,s) be the associated right and left singular vectors:

    Z′^(k,s) = Σ_{h=1}^{L′^(k,s)} γ_h^(k,s) ω_{b_h}^(k,s) ω_{a_h}^(k,s)⊤.    (23)

Let γ̂_h^(k,s)second be the second largest real solution of the following quartic equation with respect to t:

    f_h^(k,s)(t) := t⁴ + ξ₃^(k,s) t³ + ξ₂^(k,s) t² + ξ₁^(k,s) t + ξ₀^(k,s) = 0,    (24)

where the coefficients are defined by

    ξ₃^(k,s) = (L′^(k,s) − M′^(k,s))² γ_h^(k,s) / (L′^(k,s) M′^(k,s)),
    ξ₂^(k,s) = −( ξ₃^(k,s) γ_h^(k,s) + (L′^(k,s)² + M′^(k,s)²) η_h^(k,s)² / (L′^(k,s) M′^(k,s)) + 2σ⁴ / (c_{a_h}^(k,s)² c_{b_h}^(k,s)²) ),
    ξ₁^(k,s) = ξ₃^(k,s) √(ξ₀^(k,s)),
    ξ₀^(k,s) = ( η_h^(k,s)² − σ⁴ / (c_{a_h}^(k,s)² c_{b_h}^(k,s)²) )²,

with

    η_h^(k,s)² = ( 1 − σ² M′^(k,s) / γ_h^(k,s)² )( 1 − σ² L′^(k,s) / γ_h^(k,s)² ) γ_h^(k,s)².

Let

    γ̃_h^(k,s) = √( τ + √( τ² − L′^(k,s) M′^(k,s) σ⁴ ) ),  where  τ = (L′^(k,s) + M′^(k,s)) σ² / 2 + σ⁴ / ( 2 c_{a_h}^(k,s)² c_{b_h}^(k,s)² ).    (25)

Then, the global VB solution can be expressed as

    Û′^(k,s)VB = ( B̂^(k,s) Â^(k,s)⊤ )^VB = Σ_{h=1}^{H′^(k,s)} γ̂_h^(k,s)VB ω_{b_h}^(k,s) ω_{a_h}^(k,s)⊤,
    where  γ̂_h^(k,s)VB = γ̂_h^(k,s)second  if  γ_h^(k,s) > γ̃_h^(k,s),  and  γ̂_h^(k,s)VB = 0  otherwise.    (26)
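The sketch below (ours) mirrors Eqs.(24)–(26) as reconstructed above for a single singular component: it computes the truncation threshold (25) and, when the observed singular value exceeds it, takes the second largest real root of the quartic (24) as the shrunken amplitude. The coefficient expressions follow the reconstruction above, so this should be treated as an illustration rather than a reference implementation.

```python
import numpy as np

def vb_amplitude(gamma, Lp, Mp, sigma2, ca2cb2):
    """Global VB solution for one singular component (Eqs. 24-26, as written above).

    gamma  : observed singular value of Z'
    Lp, Mp : dimensions of the PR matrix (Lp <= Mp assumed)
    sigma2 : noise variance
    ca2cb2 : product of prior variances c_a^2 * c_b^2 for this component
    """
    # Truncation threshold (Eq. 25).
    tau = (Lp + Mp) * sigma2 / 2.0 + sigma2 ** 2 / (2.0 * ca2cb2)
    gamma_tilde = np.sqrt(tau + np.sqrt(tau ** 2 - Lp * Mp * sigma2 ** 2))
    if gamma <= gamma_tilde:
        return 0.0
    # Quartic coefficients (Eq. 24) and its second largest real root.
    eta2 = (1.0 - sigma2 * Mp / gamma ** 2) * (1.0 - sigma2 * Lp / gamma ** 2) * gamma ** 2
    xi3 = (Lp - Mp) ** 2 * gamma / (Lp * Mp)
    xi2 = -(xi3 * gamma + (Lp ** 2 + Mp ** 2) * eta2 / (Lp * Mp) + 2.0 * sigma2 ** 2 / ca2cb2)
    xi0 = (eta2 - sigma2 ** 2 / ca2cb2) ** 2
    xi1 = xi3 * np.sqrt(xi0)
    roots = np.roots([1.0, xi3, xi2, xi1, xi0])
    real_roots = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
    return real_roots[-2]   # second largest real solution
```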

Corollary 2 Assume that L′^(k,s) ≤ M′^(k,s) for all (k, s). Given {Û^(s′)}_{s′≠s} and the noise variance σ², the global empirical VB solution (where the hyperparameters {C_A^(k,s), C_B^(k,s)} are also estimated from observation) is given by

    Û′^(k,s)EVB = Σ_{h=1}^{H′^(k,s)} γ̂_h^(k,s)EVB ω_{b_h}^(k,s) ω_{a_h}^(k,s)⊤,
    where  γ̂_h^(k,s)EVB = γ̆_h^(k,s)VB  if  γ_h^(k,s) > γ̲_h^(k,s)  and  Δ_h^(k,s) ≤ 0,  and  γ̂_h^(k,s)EVB = 0  otherwise.    (27)

Here,

    γ̲_h^(k,s) = ( √(L′^(k,s)) + √(M′^(k,s)) ) σ,    (28)

    c̆_h^(k,s)² = ( 1 / (2 L′^(k,s) M′^(k,s)) ) { γ_h^(k,s)² − (L′^(k,s) + M′^(k,s)) σ²
                 + √( ( γ_h^(k,s)² − (L′^(k,s) + M′^(k,s)) σ² )² − 4 L′^(k,s) M′^(k,s) σ⁴ ) },    (29)

    Δ_h^(k,s) = M′^(k,s) log( γ_h^(k,s) γ̆_h^(k,s)VB / (M′^(k,s) σ²) + 1 )
               + L′^(k,s) log( γ_h^(k,s) γ̆_h^(k,s)VB / (L′^(k,s) σ²) + 1 )
               + ( 1/σ² )( −2 γ_h^(k,s) γ̆_h^(k,s)VB + L′^(k,s) M′^(k,s) c̆_h^(k,s)² ),    (30)

and γ̆_h^(k,s)VB is the VB solution given by Corollary 1 for c_{a_h}^(k,s) c_{b_h}^(k,s) = c̆_h^(k,s).

Corollary 3 Assume that L′^(k,s) ≤ M′^(k,s) for all (k, s). Given {Û^(s′)}_{s′≠s} and the noise variance σ², the VB posteriors are given by

    r_A^(k,s)VB(A^(k,s)) = Π_{h=1}^{H′^(k,s)} N_{M′^(k,s)}( a_h^(k,s); â_h^(k,s), σ_{a_h}^(k,s)² I_{M′^(k,s)} ),
    r_B^(k,s)VB(B^(k,s)) = Π_{h=1}^{H′^(k,s)} N_{L′^(k,s)}( b_h^(k,s); b̂_h^(k,s), σ_{b_h}^(k,s)² I_{L′^(k,s)} ),

where, for γ̂_h^(k,s)VB being the solution given by Corollary 1, the posterior means are

    â_h^(k,s) = ± √( γ̂_h^(k,s)VB δ̂_h^(k,s) ) ω_{a_h}^(k,s),
    b̂_h^(k,s) = ± √( γ̂_h^(k,s)VB / δ̂_h^(k,s) ) ω_{b_h}^(k,s),

and the posterior variances σ_{a_h}^(k,s)² and σ_{b_h}^(k,s)², together with the auxiliary quantities δ̂_h^(k,s) and η̂_h^(k,s)², are obtained in closed form from the results of Nakajima et al. (2013) in terms of γ_h^(k,s), γ̂_h^(k,s)VB, L′^(k,s), M′^(k,s), σ², c_{a_h}^(k,s), and c_{b_h}^(k,s), with the two cases distinguished by whether γ_h^(k,s) exceeds the threshold γ̃_h^(k,s) of Eq.(25).
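A companion sketch (ours) of the empirical VB decision of Corollary 2 for a single component, reusing vb_amplitude from the previous sketch for γ̆^VB; again the formulas follow Eqs.(28)–(30) as written above, and the function is meant only as an illustration.

```python
import numpy as np
# Assumes vb_amplitude() from the previous sketch is in scope.

def evb_amplitude(gamma, Lp, Mp, sigma2):
    """Global empirical VB solution for one singular component (Eqs. 27-30)."""
    # Eq. (28): below this bound the component is always discarded.
    gamma_lower = (np.sqrt(Lp) + np.sqrt(Mp)) * np.sqrt(sigma2)
    if gamma <= gamma_lower:
        return 0.0
    # Eq. (29): estimated product of prior variances c_a^2 c_b^2.
    c2 = (gamma ** 2 - (Lp + Mp) * sigma2
          + np.sqrt((gamma ** 2 - (Lp + Mp) * sigma2) ** 2 - 4.0 * Lp * Mp * sigma2 ** 2)
          ) / (2.0 * Lp * Mp)
    # VB amplitude with c_a^2 c_b^2 = c2 (Corollary 1).
    gamma_breve = vb_amplitude(gamma, Lp, Mp, sigma2, c2)
    # Eq. (30): free-energy difference; keep the component only if Delta <= 0.
    delta = (Mp * np.log(gamma * gamma_breve / (Mp * sigma2) + 1.0)
             + Lp * np.log(gamma * gamma_breve / (Lp * sigma2) + 1.0)
             + (-2.0 * gamma * gamma_breve + Lp * Mp * c2) / sigma2)
    return gamma_breve if delta <= 0.0 else 0.0
```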

Algorithm 1 Mean update (MU) algorithm for (empirical) VB SAMF.
1: Initialization: Û^(s) = 0_{(L,M)} for s = 1, ..., S; σ² = ‖V‖²_Fro / (LM).
2: for s = 1 to S do
3:   The (empirical) VB solution of U′^(k,s) = B^(k,s) A^(k,s)⊤ for each k = 1, ..., K^(s), given {Û^(s′)}_{s′≠s}, is computed by Corollary 1 (Corollary 2).
4:   Û^(s) = G({B̂^(k,s) Â^(k,s)⊤}_{k=1}^{K^(s)}; X^(s)).
5: end for
6: σ² is estimated by Lemma 1, given the VB posterior on {Θ_A^(s), Θ_B^(s)}_{s=1}^{S} (computed by Corollary 3).
7: Repeat Steps 2 to 6 until convergence.

Note that the corollaries above assume that L′^(k,s) ≤ M′^(k,s) for all (k, s). However, we can easily obtain the result for the case when L′^(k,s) > M′^(k,s) by considering the transpose Û′^(k,s)⊤ of the solution. Also, we can always take the mapping X^(s) so that L′^(k,s) ≤ M′^(k,s) holds for all (k, s) without any practical restriction. This eases the implementation of the algorithm.

When σ² is known, Corollary 1 and Corollary 2 provide the global analytic solution of the partial problem, where the variables on which {Û^(s′)}_{s′≠s} depend are fixed. Note that they give the global analytic solution for single-term (S = 1) SAMF.

3.3 Mean Update Algorithm

Using Corollaries 1–3 and Lemma 1, we propose an algorithm for SAMF, which we call the mean update (MU) algorithm. We describe its pseudo-code in Algorithm 1, where 0_{(d₁,d₂)} denotes the d₁ × d₂ matrix with all entries equal to zero. Note that, under the empirical Bayesian framework, all unknown parameters are estimated from observation, which allows inference without manual parameter tuning.

The MU algorithm is similar in spirit to the backfitting algorithm (Hastie and Tibshirani, 1986; D'Souza et al., 2004), where each additive term is updated to fit a dummy target. In the MU algorithm, Z^(s) defined in Eq.(21) corresponds to the dummy target in the backfitting algorithm.
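The following is a high-level Python sketch (ours) of Algorithm 1. The per-partition solver, the partitioning map, and its inverse are passed in as placeholders ('solve', 'split', and 'assemble' are hypothetical names, e.g., built from the sketches above); the σ² update is simplified relative to Lemma 1 in that it ignores the posterior-covariance terms.

```python
import numpy as np

def mean_update(V, terms, n_iter=50):
    """Sketch of Algorithm 1 (mean update) for VB SAMF.

    V     : (L, M) observed matrix
    terms : list of dicts, one per SMF term, each providing
            'split'    -- maps an (L, M) matrix to its list of PR matrices
            'assemble' -- inverse map from PR estimates back to an (L, M) matrix
            'solve'    -- per-partition (empirical) VB solver, e.g. Corollary 1/2
    """
    L, M = V.shape
    U_hat = [np.zeros((L, M)) for _ in terms]          # step 1: initialization
    sigma2 = np.sum(V ** 2) / (L * M)
    for _ in range(n_iter):
        for s, term in enumerate(terms):               # steps 2-5
            Z = V - sum(U_hat[:s] + U_hat[s + 1:])     # dummy target, Eq. (21)
            pr_estimates = [term['solve'](Zk, sigma2) for Zk in term['split'](Z)]
            U_hat[s] = term['assemble'](pr_estimates)
        residual = V - sum(U_hat)                      # step 6 (simplified: the
        sigma2 = np.sum(residual ** 2) / (L * M)       # covariance terms of Eq. 22 are dropped)
    return U_hat, sigma2
```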

Although each of the corollaries and the lemma above guarantees the global optimality for each step, the MU algorithm does not generally guarantee the simultaneous global optimality over the entire parameter space. Nevertheless, experimental results in Section 5 show that the MU algorithm performs very well in practice.

When Corollary 1 or Corollary 2 is applied in Step 3 of Algorithm 1, a singular value decomposition (23) of Z′^(k,s), defined in Eq.(21), is required. However, for many practical SMF terms, including the row-wise, the column-wise, and the element-wise terms as well as the segment-wise term (which will be defined in Section 5.4), Z′^(k,s) ∈ R^{L′^(k,s)×M′^(k,s)} is a vector or a scalar, i.e., L′^(k,s) = 1 or M′^(k,s) = 1. In such cases, the singular value and the singular vectors are given simply by

    γ_1^(k,s) = ‖Z′^(k,s)‖,  ω_{a_1}^(k,s) = Z′^(k,s)⊤ / ‖Z′^(k,s)‖,  ω_{b_1}^(k,s) = 1,   if L′^(k,s) = 1,
    γ_1^(k,s) = ‖Z′^(k,s)‖,  ω_{a_1}^(k,s) = 1,  ω_{b_1}^(k,s) = Z′^(k,s) / ‖Z′^(k,s)‖,   if M′^(k,s) = 1.

4 Discussion

In this section, we first relate MIR to ARD. Then, we introduce the standard VB iteration for SAMF, which is used as a baseline in the experiments. After that, we discuss related previous work, and the limitations of the current work.

4.1 Relation between MIR and ARD

The MIR effect (Nakajima and Sugiyama, 2011) induced by factorization actually has a close connection to the automatic relevance determination (ARD) effect (Neal, 1996). Assume C_A = I_H, where I_d denotes the d-dimensional identity matrix, in the plain MF model (18)–(20) (here we omit the suffixes k and s for brevity), and consider the following transformation: BA⊤ ↦ U ∈ R^{L×M}. Then the likelihood (18) and the prior (19) on A are rewritten as

    p(Z′ | U) ∝ exp( −(1/(2σ²)) ‖Z′ − U‖²_Fro ),    (31)

    p(U | B) ∝ exp( −(1/2) tr( U⊤ (BB⊤)† U ) ),    (32)

where † denotes the Moore–Penrose generalized inverse of a matrix. The prior (20) on B is kept unchanged. p(U | B) in Eq.(32) is the so-called ARD prior with the covariance hyperparameter BB⊤ ∈ R^{L×L}. It is known that this induces the ARD effect, i.e., the empirical Bayesian procedure, where the prior covariance hyperparameter BB⊤ is also estimated from observation, induces strong regularization and sparsity (Neal, 1996); see also Efron and Morris (1973) for a simple Gaussian case.

In the current context, Eq.(32) induces low-rank sparsity on U if no restriction on BB⊤ is imposed. Similarly, we can show that (γ_l^e)² in Eq.(2), (γ_m^d)² in Eq.(3), and E_{l,m}² in Eq.(4) act as prior variances shared by the entries in ũ_l ∈ R^M, u_m ∈ R^L, and U_{l,m} ∈ R, respectively. This explains the mechanism by which the factorization forms in Eqs.(2)–(4) induce row-wise, column-wise, and element-wise sparsity, respectively.

When we employ the SMF-term expression (5), MIR occurs in each partition. Therefore, partition-wise sparsity and low-rank sparsity within each partition are observed. Corollaries 1 and 2 theoretically support this fact: small singular values are discarded by thresholding in Eqs.(26) and (27).

4.2 Standard VB Iteration

Following the standard procedure for the VB approximation (Bishop, 1999; Lim and Teh, 2007; Babacan et al., 2012), we can derive the following algorithm, which we call the standard VB iteration:

    Â^(k,s) = σ⁻² Z′^(k,s)⊤ B̂^(k,s) Σ_A^(k,s),    (33)
    Σ_A^(k,s) = σ² ( B̂^(k,s)⊤ B̂^(k,s) + L′^(k,s) Σ_B^(k,s) + σ² C_A^(k,s)⁻¹ )⁻¹,    (34)
    B̂^(k,s) = σ⁻² Z′^(k,s) Â^(k,s) Σ_B^(k,s),    (35)
    Σ_B^(k,s) = σ² ( Â^(k,s)⊤ Â^(k,s) + M′^(k,s) Σ_A^(k,s) + σ² C_B^(k,s)⁻¹ )⁻¹.    (36)

Iterating Eqs.(33)–(36) for each (k, s) in turn until convergence gives a local minimum of the free energy (14).

In the empirical Bayesian scenario, the hyperparameters {C_A^(k,s), C_B^(k,s)}_{k=1, s=1}^{K^(s), S} are also estimated from observations. The following update rules give a local minimum of the free energy:

    c_{a_h}^(k,s)² = ‖â_h^(k,s)‖² / M′^(k,s) + (Σ_A^(k,s))_{hh},    (37)
    c_{b_h}^(k,s)² = ‖b̂_h^(k,s)‖² / L′^(k,s) + (Σ_B^(k,s))_{hh}.    (38)

When the noise variance σ² is unknown, it is estimated by Eq.(22) in each iteration.

The standard VB iteration is computationally efficient since only a single parameter in {Â^(k,s), Σ_A^(k,s), B̂^(k,s), Σ_B^(k,s), c_{a_h}^(k,s)², c_{b_h}^(k,s)²}_{k=1, s=1}^{K^(s), S} is updated in each step. However, it is known that the standard VB iteration is prone to suffer from the local minima problem (Nakajima et al., 2013). On the other hand, although the MU algorithm also does not guarantee the global optimality as a whole, it simultaneously gives the global optimal solution for the set {Â^(k,s), Σ_A^(k,s), B̂^(k,s), Σ_B^(k,s), c_{a_h}^(k,s)², c_{b_h}^(k,s)²}_{k=1}^{K^(s)} for each s in each step. In Section 5, we will experimentally show that the MU algorithm tends to give a better solution (i.e., one with a smaller free energy) than the standard VB iteration.
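For comparison with the MU algorithm, here is a sketch (ours) of one sweep of the standard VB iteration (33)–(38) for a single PR matrix; the covariances are updated before the corresponding means, which is one valid ordering of the fixed-point updates.

```python
import numpy as np

def standard_vb_sweep(Z, A_hat, B_hat, Sigma_A, Sigma_B, c2_a, c2_b, sigma2):
    """One pass of Eqs. (33)-(38) for a single PR matrix Z of size (Lp, Mp)."""
    Lp, Mp = Z.shape
    # Eqs. (33)-(34): update the posterior of A.
    Sigma_A = sigma2 * np.linalg.inv(
        B_hat.T @ B_hat + Lp * Sigma_B + sigma2 * np.diag(1.0 / c2_a))
    A_hat = (Z.T @ B_hat @ Sigma_A) / sigma2
    # Eqs. (35)-(36): update the posterior of B.
    Sigma_B = sigma2 * np.linalg.inv(
        A_hat.T @ A_hat + Mp * Sigma_A + sigma2 * np.diag(1.0 / c2_b))
    B_hat = (Z @ A_hat @ Sigma_B) / sigma2
    # Eqs. (37)-(38): empirical Bayes update of the prior variances.
    c2_a = np.sum(A_hat ** 2, axis=0) / Mp + np.diag(Sigma_A)
    c2_b = np.sum(B_hat ** 2, axis=0) / Lp + np.diag(Sigma_B)
    return A_hat, B_hat, Sigma_A, Sigma_B, c2_a, c2_b
```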

4.3 Related Work

As is widely known, traditional PCA is sensitive to outliers in data and generally fails in their presence. Robust PCA (Candès et al., 2011) was developed to cope with large outliers that are not modeled within traditional PCA. Unlike methods based on robust statistics (Huber and Ronchetti, 2009; Fischler and Bolles, 1981; Torre and Black, 2003; Ke and Kanade, 2005; Gao, 2008; Luttinen et al., 2009; Lakshminarayanan et al., 2011), Candès et al. (2011) explicitly modeled the spiky noise with an additional element-wise sparse term (see Eq.(9)). This model can also be applied to applications where the task is to estimate the element-wise sparse term itself (as opposed to discarding it as noise). A typical such application is foreground/background video separation (Figure 3).

The original formulation of robust PCA is non-Bayesian, and the sparsity is induced by ℓ1-norm regularization. Although its solution can be efficiently obtained via the augmented Lagrange multiplier (ALM) method (Lin et al., 2009), there are unknown algorithmic parameters that should be carefully tuned to obtain its best performance. Employing a Bayesian formulation addresses this issue: a sampling-bas
