Variational Bayesian Image Restoration With Group-Sparse Modeling of Wavelet Coefficients


Variational Bayesian Image Restoration with Group-Sparse Modeling of Wavelet Coefficients

Ganchi Zhang, Nick Kingsbury
Signal Processing Group, Dept. of Engineering, University of Cambridge

Abstract

In this work, we present a recent wavelet-based image restoration framework based on a group-sparse Gaussian scale mixture model. A hierarchical Bayesian estimation is derived using a combination of variational Bayesian inference and a subband-adaptive majorization-minimization method that simplifies computation of the posterior distribution. We show that both of these iterative methods can converge together without needing nested loops, and thus good solutions can be found rapidly in the non-convex search space. We also integrate our method, variational Bayesian with majorization minimization (VBMM), with tree-structured modeling of the wavelet coefficients. This extension achieves significant gains in performance over the coefficient-sparse version of the algorithm. The experimental results demonstrate that the proposed method and its tree-structured extensions are effective for various imaging applications such as image deconvolution, image superresolution and compressive sensing magnetic resonance imaging (MRI) reconstruction, and that they outperform more conventional sparsity-inducing methods based on the l1-norm.

Keywords: image restoration, wavelet group-sparse modeling, variational Bayesian inference, majorization minimization, dual-tree complex wavelets

Corresponding author. Email addresses: gz243@cam.ac.uk (Ganchi Zhang), ngk@eng.cam.ac.uk (Nick Kingsbury)

Preprint submitted to Digital Signal Processing, May 12, 2015

1. Introduction

Linear inverse problems appear in many applications of image processing such as restoration, motion estimation, reconstruction and segmentation, where a noisy indirect observation y of an original image x is modeled as [1, 2]

    y = Bx + n    (1)

where B, of size M × N, is the matrix representation of a direct linear operator and n is usually additive Gaussian noise with variance ν².

In many scenarios this inverse problem is highly ill-posed, i.e. the direct operator does not have an inverse or it is nearly singular, so that its inverse is very sensitive to noise [3]. Thus it can only be solved satisfactorily by incorporating some regularization technique, often using Bayesian inference with prior information [4]. Previous works have found that wavelet-based tools, such as the Discrete Wavelet Transform (DWT), are powerful for modeling this prior knowledge [4, 5, 6].

In the past two decades, the DWT has been exploited for a wide range of signal processing applications such as denoising, deconvolution, superresolution, compression and classification (see, e.g., [7, 8, 9, 10, 11]). The DWT provides an efficient implementation based on a filter bank structure utilizing decimation and two discrete filters, a low-pass and a high-pass filter [12]. Wavelet-based regularization methods are well suited to image restoration problems because wavelet coefficients tend to be sparse for most image types.

Although the DWT is compact, it suffers from shift dependency, lack of directionality, oscillation and aliasing [13], which significantly constrain the performance of a DWT-based signal processing system. To overcome these shortcomings, the dual-tree complex wavelet transform (DT CWT), first proposed by Kingsbury, offers a simple and efficient redundant transform that has been widely used in solving diverse signal processing problems. The DT CWT is better suited than the DWT to image restoration problems because its directional filters encourage greater sparsity and its complex coefficients show more consistent persistence across scale. Other recent extensions of the DWT, such as curvelets [14] and contourlets [15], would also work in this context but few, if any, combine the efficiency and good performance of the dual-tree approach.

It is known that the wavelet coefficients of natural images display non-Gaussian statistics, and their marginal distributions typically show a large peak at zero with long heavy tails [16, 17].

Figure 1: (a) 8 × 8 image with 3-level 2D DWT decomposition. (b) Quadtree structure of wavelet coefficients.

To account for this non-Gaussian behavior, many univariate parametric models such as generalized Laplacian distributions [17] and Bessel K form density models [18] have previously been used to model the wavelet coefficients. However, these models do not consider the persistence across scales of wavelet coefficients [19]. In fact, the energies of wavelet coefficients of natural images exhibit a strong characteristic signal-dependent structure. Fig. 1 depicts an example of the quadtree structure that corresponds to an 8 × 8 image with a 3-level 2D DWT decomposition. To capture these statistical dependencies well, bivariate shrinkage [20], Hidden Markov Tree models (HMM) [21, 22] and Gaussian Scale Mixture models (GSM) [16, 23] have been widely applied to model wavelet coefficients whose energies are not randomly distributed. Among these methods, it is acknowledged that the GSM model can be used in the framework of sparse Bayesian learning (SBL), where the sparsity is obtained by reweighting the Gaussian prior [24, 25]. Based on this connection, several researchers have shown that Bayesian methods are applicable to wavelet-based regularization problems [4, 5, 16].

Recently, Bayesian group-sparse (or block-sparse) modeling has emerged, where the sparsity is imposed on groups instead of individual components [27, 28]. In [28], variational Bayesian (VB) inference is used for group-sparse modeling and has been shown to find sparse solutions effectively. These approaches can potentially be used in the wavelet domain, since a pair of coefficients at a given location and adjacent scales are typically both large or both small in amplitude [29]. The tree structure existing in the wavelet domain allows group-sparse models to be easily constructed and used. One of the major contributions of our work is to investigate the use of Bayesian group-sparse modeling for wavelet-based regularization problems.

In [26], we proposed a hierarchical Bayesian modeling of wavelet coefficients derived from a group-sparse GSM model. Based on a combination of VB inference with a subband-adaptive majorization minimization (MM) method, the VBMM method in [26] effectively simplifies computation of the posterior distribution and finds good solutions in the non-convex search space. In addition, the VBMM method has also shown good potential with group-sparse modeling. In [30], we incorporate the VBMM method with a wavelet tree structure based on overlapped groups, which leads to an improved solution compared with unstructured coefficient-sparse modeling.

In this paper, we extend the ideas from [26] to generalize the VBMM method and discuss the theoretical foundations in some detail. Differently from [26] and [30], we also include results for image superresolution and MRI image reconstruction. The proposed method can handle very large data sets with good performance and low computation cost. The paper is organized as follows. Section 2 describes our proposed VBMM image restoration framework. Section 3 discusses the tree-structured extensions of VBMM. Experimental results are shown in Section 4. Conclusions are provided in Section 5.

2. VBMM Image Restoration

In this section, we describe our proposed VBMM image restoration framework and its tree-structured extensions.

2.1. Model Formulations

To obtain a wavelet-based formulation, we note that the image x can be represented by the wavelet expansion x = Mw, where w is an N × 1 vector representing all wavelet coefficients, and M is the inverse wavelet transform whose columns are the wavelet basis functions. In the case of an orthogonal basis, M is a square orthogonal matrix, whereas for an over-complete dictionary (e.g. a tight frame), M has N columns and M rows, with N ≥ M [6]. The linear model in (1) then becomes

    y = BMw + n    (2)

and the resulting likelihood of the data, assuming Gaussian noise n, can be shown to be

    p(y | w, ν²) = (2πν²)^{-M/2} exp{ -||y - BMw||² / (2ν²) }    (3)
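As a concrete illustration of the synthesis model y = BMw + n and the Gaussian likelihood (3), the sketch below uses an orthogonal 2D DWT from PyWavelets as a stand-in for the synthesis operator M and a Gaussian blur as a stand-in for B; the operators, sizes and parameter values are illustrative assumptions only, not the paper's implementation (which uses the DT CWT).

```python
import numpy as np
import pywt
from scipy.ndimage import gaussian_filter

# Wavelet-domain forward model y = B M w + n: an orthogonal DWT stands in for M
# (so "M w" is just the inverse DWT) and a Gaussian blur stands in for B.
rng = np.random.default_rng(0)
x = rng.random((64, 64))                                   # stand-in image

coeffs = pywt.wavedec2(x, "db4", level=3, mode="periodization")
w, slices = pywt.coeffs_to_array(coeffs)                   # wavelet coefficients w

def M(w_arr):                                              # synthesis: x = M w
    c = pywt.array_to_coeffs(w_arr, slices, output_format="wavedec2")
    return pywt.waverec2(c, "db4", mode="periodization")

def B(img):                                                # direct linear operator B
    return gaussian_filter(img, sigma=1.5, mode="wrap")

nu2 = 0.01                                                 # noise variance nu^2
y = B(M(w)) + np.sqrt(nu2) * rng.standard_normal(x.shape)  # y = B M w + n

# Gaussian log-likelihood (3), up to the constant -(M/2) log(2 pi nu^2)
loglik = -np.sum((y - B(M(w))) ** 2) / (2 * nu2)
```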

A GSM model is now employed to model the wavelet coefficients. Inspired by [28], we adopt a model which incorporates group sparsity such that w_i, the ith group of w, follows a zero-mean Gaussian distribution with an (as yet) unknown variance of σ_i² per element. Therefore the conditional prior of w can be expressed as

    p(w | S) = ∏_{i=1}^{G} N(w_i | 0, σ_i² I) = N(w | 0, S^{-1})    (4)

where w_i is a vector of coefficients comprising the ith group of size g_i, S is a diagonal matrix of size N × N formed from the vector s of size G whose ith entry is s_i = 1/σ_i², and G denotes the number of groups. The case G = N corresponds to independent sparse modeling of the wavelet coefficients [28]; whereas the case G = N/2 with g_i = 2 for all i can be used to model the real and imaginary parts of G complex coefficients, each with a 2-D circularly symmetric pdf. To be consistent with the following algebra, S needs to be of size N × N and, when N > G, its diagonal must be an expanded form of s in which each s_i appears g_i times for the elements of group i, with N = ∑_{i=1}^{G} g_i.

To proceed with Bayesian inference, the posterior distribution can be calculated via:

    p(w | y, S, ν²) = p(y | w, ν²) p(w | S) / p(y | S, ν²)    (5)

Because both p(y | w, ν²) and p(w | S) are Gaussian functions of w, the posterior distribution can be rearranged into a Gaussian form as

    p(w | y, S, ν²) = N(w | µ, Σ)    (6)

where, from (3) and (4):

    Σ = (ν^{-2} M^T B^T B M + S)^{-1}    (7)
    µ = ν^{-2} Σ M^T B^T y    (8)

The computation of the posterior covariance Σ requires inversion of the N × N square matrix (ν^{-2} M^T B^T B M + S). This operation is not computationally feasible for large images and 3D datasets, as N is often 10^7 or more. Here we adopt the MM technique from [31], together with the recent subband-adaptive MM from [32, 33], to derive our fast algorithm.
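The following toy sketch evaluates (7) and (8) directly with dense matrices; the sizes and random operators are illustrative assumptions. At these tiny sizes the N × N inverse is trivial, but its cost grows cubically with N, which is exactly why the MM approximation introduced next replaces M^T B^T B M with a diagonal matrix.

```python
import numpy as np

# Dense, toy-sized evaluation of the exact Gaussian posterior (6)-(8):
#   Sigma = (nu^-2 M^T B^T B M + S)^-1,   mu = nu^-2 Sigma M^T B^T y
rng = np.random.default_rng(1)
M_obs, N = 20, 30                                   # observations, coefficients
B = rng.standard_normal((M_obs, N))                 # stand-in sensing/blur operator
M = np.linalg.qr(rng.standard_normal((N, N)))[0]    # orthogonal synthesis matrix
nu2 = 0.1
s = rng.gamma(2.0, 1.0, size=N)                     # inverse signal variances 1/sigma_i^2
S = np.diag(s)

w_true = rng.standard_normal(N) / np.sqrt(s)        # draw w from N(0, S^-1)
y = B @ M @ w_true + np.sqrt(nu2) * rng.standard_normal(M_obs)

A = B @ M
Sigma = np.linalg.inv(A.T @ A / nu2 + S)            # (7): N x N inverse
mu = Sigma @ A.T @ y / nu2                          # (8): posterior mean
```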

The aim is to replace the troublesome (M^T B^T B M) term in (7) with the purely diagonal Λ_α, as in (16).

To derive MM from a Bayesian viewpoint, we now introduce the following approximation model for the posterior distribution of w and the new hidden variable z:

    p(w, z | y, S, ν²) = p(z | w) p(w | y, S, ν²)    (9)

where

    p(z | w) ∝ exp{ -(w - z)^T (Λ_α - M^T B^T B M) (w - z) / (2ν²) }    (10)

Note that taking the negative logarithm of both sides of (9) gives a surrogate function similar to that proposed in [32, 33]:

    J^α(w, z) = J(w) + (w - z)^T (Λ_α - M^T B^T B M) (w - z) / (2ν²)    (11)

where J^α(w, z) = -ln p(w, z | y, S, ν²) and J(w) = -ln p(w | y, S, ν²) from (6). Λ_α is a diagonal matrix formed from a vector α whose elements α_j may be minimized independently for each subspace/subband j of M, such that Λ_α - M^T B^T B M is just positive definite. This property ensures that J^α(w, z) > J(w) for any w ≠ z, and J^α(w, z) = J(w) for w = z [33], and hence produces monotonicity of the decay of J(w), since for the tth iteration [6]

    J(z^{(t+1)}) = J^α(z^{(t+1)}, z^{(t+1)}) ≤ J^α(z^{(t+1)}, z^{(t)}) ≤ J^α(z^{(t)}, z^{(t)}) = J(z^{(t)})    (12)

The first inequality results from J^α(w, z) ≥ J^α(w, w) = J(w) for any w and z. The second inequality comes from the fact that J^α(w, z^{(t)}) attains its minimum for w = z^{(t+1)}, according to the definition of an MM iterative algorithm [6]:

    z^{(t+1)} = arg min_w J^α(w, z^{(t)})    (13)

The use of (11) is known as the subband-adaptive MM technique.
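A quick numerical check of the majorization property behind (11) and (12) is given below. For simplicity the bound is the crude scalar choice Λ_α = λ_max(A^T A) I rather than a subband-adaptive choice; all names and sizes are illustrative assumptions.

```python
import numpy as np

# Verify that the surrogate J_alpha(w, z) of (11) majorizes J(w) when
# Lambda_alpha - A^T A is positive semidefinite (here A stands in for B M).
rng = np.random.default_rng(2)
M_obs, N = 15, 25
A = rng.standard_normal((M_obs, N))
nu2 = 0.5
s = rng.gamma(2.0, 1.0, size=N)            # diagonal of S
y = rng.standard_normal(M_obs)

lam = np.linalg.eigvalsh(A.T @ A).max()    # scalar bound: Lambda_alpha = lam * I

def J(w):                                  # negative log posterior from (6), up to a constant
    return np.sum((y - A @ w) ** 2) / (2 * nu2) + 0.5 * np.sum(s * w**2)

def J_alpha(w, z):                         # surrogate function (11)
    d = w - z
    return J(w) + (lam * (d @ d) - d @ (A.T @ (A @ d))) / (2 * nu2)

w, z = rng.standard_normal(N), rng.standard_normal(N)
assert J_alpha(w, z) >= J(w) - 1e-9        # majorization everywhere
assert abs(J_alpha(z, z) - J(z)) < 1e-9    # equality at w = z
```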

Related algorithms are called subband-adaptive iterative shrinkage/thresholding (SIST) algorithms [32, 33, 34]. In [33] and [34], fast algorithms for computing Λ_α are proposed. We refer the reader to Appendix 6.1 for a more detailed discussion, where we summarize ways of selecting Λ_α for both convolutional and non-convolutional kernels.

Because p(z | w) = N(z | w, ν² (Λ_α - M^T B^T B M)^{-1}) and p(w | y, S, ν²), given by (6), (7) and (8), are Gaussian functions of w, the approximation model p(w, z | y, S, ν²) is also a Gaussian distribution and, when z is given, we can rearrange (9) into the Gaussian form:

    p(w | y, z, S, ν²) = N(w | µ, Σ)    (14)

with

    µ = ν^{-2} Σ [(Λ_α - M^T B^T B M) z + M^T B^T y]    (15)
    Σ = (ν^{-2} Λ_α + S)^{-1}    (16)

where Σ is now purely diagonal and easy to invert. This gives the subband-adaptive MM technique in a Bayesian framework as introduced in [26], whose convergence rate is improved by keeping the spectral radius of the matrix (Λ_α - M^T B^T B M) small.

In the above approximation model for p(w | y, z, S, ν²), we still need to estimate the inverse signal variance S, since we do not know it yet. In [35], maximum a posteriori (MAP) estimation with an independent prior is used to determine S. However, it is known that point estimates do not define much of the available signal space, and better convergence is achieved if approximate distributions of the posterior density are used. In fact, VB inference possesses this property by providing a distribution that approximates the posterior distribution of the hidden variables [36], and it has been shown in [37] that VB inference can effectively smooth out local minima and help to ensure that a near-global minimum solution is found. Compared with MAP, VB inference can be seen as a more principled approach, which should find improved solutions to inverse problems.

2.2. VB Continuation Strategy

In this section, we apply the VB approximation to derive the continuation strategies of our model. To update the variables appearing in (9), we construct a 3-layer hierarchical prior as described in [26]. To be more specific, we impose a multivariate Gamma distribution for the inverse signal variance vector s (which is expanded into S for (4)) using shape parameter a and rate vector b:

    p(s | a, b) = ∏_{i=1}^{G} (b_i^a / Γ(a)) s_i^{a-1} exp(-b_i s_i)    (17)

and a further Gamma distribution for b:

    p(b | k, θ) = ∏_{i=1}^{G} (θ^k / Γ(k)) b_i^{k-1} exp(-θ b_i)    (18)

When shape k and rate θ both tend towards zero, the Gamma prior p(b) tends to a noninformative Jeffreys prior [38], and therefore the mean of the signal variance σ² approximately follows a noninformative prior. This means that the posterior depends only on the data and not on the prior, which is known to strongly promote sparse estimates [39]. If prior knowledge is available for the model, we can tune the hyperparameters so that the prior becomes more informative. As a result, we can move flexibly between informative and noninformative priors using the 3-layer Gamma GSM model. Because it often leads to simple closed-form solutions, we will use it in this paper for encouraging sparsity. However, there are many alternative choices of sparsity-favouring prior that can be used for the GSM, such as multivariate Laplace, inverse Gaussian and Bessel function distributions [28].

Here we keep the noise variance ν² = β^{-1} as a user parameter in order to be able to adjust the regularization strength. Note that although ν² can be estimated via Bayesian inference as discussed in Section 2.3, its estimate can be inaccurate because of the difficulty of accurately separating broadband signal components from noise. The complete graphical model is shown in Fig. 2. As a result, the posterior of the hidden variables now becomes

    p(w, z, s, b | y, β) = p(w, z, s, b, y | β) / p(y) = p(y | w, β) p(w | S) p(z | w) p(s | a, b) p(b | k, θ) / p(y)    (19)

where β = ν^{-2}. Note that the exact Bayesian posterior of (19) cannot be calculated, as the marginal likelihood p(y) is intractable [36]. To approximate the posterior p(ξ | y, β), where ξ = {w, z, s, b}, we adopt the VB approximation, which provides a distribution q(ξ) that approximates p(ξ | y, β) [28, 36].
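To see how the 3-layer Gamma GSM hierarchy of (17) and (18) favours sparsity, the short sketch below simply samples coefficients from the prior with small shape and rate values; the hyperparameter values are illustrative assumptions only.

```python
import numpy as np

# Sample from the hierarchy: b_i ~ Gamma(k, rate=theta), s_i ~ Gamma(a, rate=b_i),
# w_i ~ N(0, 1/s_i).  Small k and theta make the variance prior nearly
# noninformative, so the draws are heavy-tailed: mostly tiny, occasionally large.
rng = np.random.default_rng(3)
G = 100_000
a, k, theta = 0.5, 0.1, 0.1                          # illustrative hyperparameters

b = rng.gamma(shape=k, scale=1.0 / theta, size=G)    # numpy uses scale = 1/rate
s = rng.gamma(shape=a, scale=1.0 / b)
w = rng.standard_normal(G) / np.sqrt(s)

print("fraction of |w| below 0.01:", np.mean(np.abs(w) < 0.01))
print("largest |w| sampled:       ", np.abs(w).max())
```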

Figure 2: The graphical model of linear regression with hierarchical priors. y and z are Gaussian distributions, w is a GSM, s and b are Gamma distributions.

To be specific, q(ξ) is determined by minimizing the Kullback-Leibler (KL) divergence between q(ξ) and p(ξ | y). From basic probability theory, we can decompose the log marginal probability using [40]

    ln p(y) = L(q(ξ)) + KL(q(ξ) || p(ξ | y))    (20)

with

    L(q(ξ)) = ∫ q(ξ) ln( p(ξ, y) / q(ξ) ) dξ    (21)
    KL(q(ξ) || p(ξ | y)) = -∫ q(ξ) ln( p(ξ | y) / q(ξ) ) dξ    (22)

By rearranging (20), we get

    L(q(ξ)) = ln p(y) - KL(q(ξ) || p(ξ | y))    (23)

When solving for the pdfs of the parameters in ξ, we want to build the joint pdf q(ξ) which most closely matches the shape of p(ξ, y) for the given y, i.e. which maximizes L(q(ξ)). This is equivalent to minimizing the KL divergence [40]. Because ln( p(ξ | y) / q(ξ) ) ≤ p(ξ | y) / q(ξ) - 1 and both q(ξ) and p(ξ | y) are valid pdfs over ξ, we get KL(q(ξ) || p(ξ | y)) ≥ 0. Since ln p(y) is a constant, the maximum of L(q(ξ)) occurs when q(ξ) equals the posterior distribution p(ξ | y), and then KL(q(ξ) || p(ξ | y)) = 0 and L(q(ξ)) = ln p(y).

This can be viewed as a generalization of the Expectation-Maximization (EM) algorithm, such that the VB model only contains hidden variables and no parameters [36]. This methodology is also referred to as nonparametric distribution estimation in statistics [41].

Although we can calculate p(ξ, y) from the model, the true posterior distribution p(ξ | y) is intractable because we do not know p(y). This means that q(ξ) cannot simply be obtained. Therefore we consider a restricted family of distributions q(ξ) instead and minimize the KL divergence for members of this family [40]. By using the mean-field approximation, which assumes posterior independence between different variables [28], we can then factorize q(ξ) into disjoint groups, so that

    q(ξ) = ∏_{i=1}^{D} q_i(ξ_i)    (24)

where D is the total number of disjoint groups. For instance, in VBMM, q(w), q(z), q(s) (and hence S) and q(b) are the pdfs of disjoint groups from q(ξ). The purpose is to find a set of q_i(ξ_i), i = 1 ... D, such that L(q(ξ)) is maximized, so that q(ξ) ≈ p(ξ | y).

Based on this factorization, the distribution of each variable q(ω), ω ∈ ξ, which minimizes (22) can be optimized as

    ln q(ω) = ⟨ln p(ξ | y)⟩_{q(ξ\ω)} = ⟨ln p(ξ, y)⟩_{q(ξ\ω)} + const    (25)

where ⟨·⟩_q denotes expectation over q and ξ\ω means the set ξ with ω removed. The detailed proof is given in Appendix 6.2. By sequentially calculating q(ω) for each ω ∈ ξ in turn, we obtain the following updating rules.

(i) Optimize ln q(w) using (5), (9) and (14):

    ln q(w) = ⟨ln p(ξ, y)⟩_{q(z) q(s) q(b)} + const
            = ⟨ln( p(y | w, β) p(w | S) p(z | w) )⟩ + const
            = ⟨ln p(w | z, y, β, S)⟩ + const
            = -(1/2) w^T Σ^{-1} w + w^T Σ^{-1} µ + const    (26)

This represents a multivariate Gaussian distribution q(w) with mean µ and covariance Σ. Thus the mean of w is µ, and hence

    w^{(t+1)} = ⟨w⟩ = µ^{(t)}    (27)

where µ^{(t)} is computed using the current estimates of S^{(t)} and z^{(t)}, as in (15) and (16):

    (Σ^{(t)})^{-1} = β Λ_α + S^{(t)}    (28)
    µ^{(t)} = β Σ^{(t)} [Λ_α z^{(t)} - M^T B^T (B M z^{(t)} - y)]    (29)

(ii) Optimize ln q(z) using (10):

    ln q(z) = ⟨ln p(ξ, y)⟩_{q(w) q(s) q(b)} + const
            = ⟨ln p(z | w)⟩ + const
            = -(1/2) z^T Σ_z^{-1} z + z^T Σ_z^{-1} ⟨w⟩ + const    (30)

This represents a Gaussian distribution q(z), where Σ_z = ν² (Λ_α - M^T B^T B M)^{-1}. Thus, provided (Λ_α - M^T B^T B M) is positive definite, the mean of z occurs when

    z^{(t+1)} = ⟨w⟩ = w^{(t+1)}    (31)

(iii) Optimize ln q(s) using (4) and (17):

    ln q(s) = ⟨ln p(ξ, y)⟩_{q(w) q(z) q(b)} + const
            = ⟨ln( p(w | S) p(s | a, b) )⟩ + const
            = ∑_{i=1}^{G} ⟨ (g_i/2) ln s_i - (s_i/2) ||w_i||² + (a - 1) ln s_i - b_i s_i ⟩ + const
            = ∑_{i=1}^{G} [ (g_i/2 + a - 1) ln s_i - ( ⟨||w_i||²⟩/2 + ⟨b_i⟩ ) s_i ] + const    (32)

This is the exponent of the product of G Gamma distributions [36]. Thus the mean of s_i, for i = 1 ... G, occurs when

    s_i^{(t+1)} = ⟨s_i⟩ = (g_i + 2a) / ( ||µ_i^{(t)}||² + tr[Σ_i^{(t)}] + 2 b_i^{(t)} )    (33)

where µ_i^{(t)} and Σ_i^{(t)} are the components of µ^{(t)} and Σ^{(t)} corresponding to group w_i, and b_i^{(t)} = ⟨b_i⟩ at iteration t. Note that the mean of a Gamma distribution p(s_i | a, b_i) is given by ⟨s_i⟩ = a / b_i.

Algorithm 1 VBMM-based image restoration algorithm
1: Inputs: parameters for the sensing matrix B, observation y, Λ_α, a, k, θ, β, initial estimates of z^{(0)}, s^{(0)} and b^{(0)}.
2: while iterations t = 0 : t_max, or until w has converged, do
3:    Σ^{(t)} = (β Λ_α + S^{(t)})^{-1}
4:    w^{(t+1)} = β Σ^{(t)} [Λ_α z^{(t)} - M^T B^T (B M z^{(t)} - y)]
5:    z^{(t+1)} = w^{(t+1)}
6:    s_i^{(t+1)} = (g_i + 2a) / ( ||µ_i^{(t)}||² + tr[Σ_i^{(t)}] + 2 b_i^{(t)} )  for i = 1 ... G
7:    b_i^{(t+1)} = (a + k) / ( s_i^{(t+1)} + θ )  for i = 1 ... G
8: end while
9: Output restored image x = Mw

(iv) Optimize ln q(b) using (17) and (18):

    ln q(b) = ⟨ln p(ξ, y)⟩_{q(w) q(z) q(s)} + const
            = ⟨ln( p(s | a, b) p(b | k, θ) )⟩ + const
            = ∑_{i=1}^{G} ⟨ a ln b_i - b_i s_i + (k - 1) ln b_i - θ b_i ⟩ + const
            = ∑_{i=1}^{G} [ (a + k - 1) ln b_i - ( ⟨s_i⟩ + θ ) b_i ] + const    (34)

Thus each q(b_i) is a Gamma distribution, and the mean of b_i, for i = 1 ... G, occurs when

    b_i^{(t+1)} = ⟨b_i⟩ = (a + k) / ( s_i^{(t+1)} + θ )    (35)

The updating procedure can be viewed as minimizing the KL divergence with respect to each of the factors q_i(ξ_i) in turn. Therefore we aim to get a close approximation of q(ξ) to p(ξ | y), within the limitations of the factorized form (24). The key steps of the VBMM-based image restoration algorithm are shown in Algorithm 1.
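The following compact NumPy sketch mirrors Algorithm 1 with one coefficient per group (G = N) so that Σ stays a plain vector; the operator callables, hyperparameter defaults and sizes are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def vbmm_restore(y, A, At, lam_alpha, a=0.5, k=0.5, theta=1e-6, beta=1.0,
                 n_iter=200):
    """Sketch of Algorithm 1 (coefficient-sparse VBMM) with group size g_i = 1.

    A(w)      : applies B M to a coefficient vector (wavelet synthesis then blur)
    At(r)     : applies M^T B^T to an image-domain residual
    lam_alpha : diagonal of Lambda_alpha (one value per coefficient/subband)
    beta      : inverse noise variance 1/nu^2, kept as a user parameter
    """
    N = lam_alpha.size
    z = np.zeros(N)
    s = np.ones(N)                                        # inverse signal variances
    b = np.ones(N)                                        # Gamma rate parameters

    for _ in range(n_iter):
        sigma = 1.0 / (beta * lam_alpha + s)              # step 3: diagonal of Sigma
        w = beta * sigma * (lam_alpha * z - At(A(z) - y)) # step 4: posterior mean
        z = w                                             # step 5
        s = (1 + 2 * a) / (w**2 + sigma + 2 * b)          # step 6, with g_i = 1
        b = (a + k) / (s + theta)                         # step 7
    return z
```

For a deconvolution problem, A and At would wrap the wavelet synthesis/analysis operators together with the blur and its adjoint, and lam_alpha would be set per subband so that Λ_α - M^T B^T B M remains positive definite, as discussed above.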

Suppose that after t iterations b_i converges to a stationary point such that b_i^{(t+1)} = b_i^{(t)}; then (35) can be substituted into (33), producing

    ⟨||w_i^{(t+1)}||²⟩ (s_i^{(t+1)})² + ( θ ⟨||w_i^{(t+1)}||²⟩ + 2k - g_i ) s_i^{(t+1)} - (g_i + 2a) θ = 0    (36)

where ⟨||w_i^{(t+1)}||²⟩ = ||µ_i^{(t)}||² + tr[Σ_i^{(t)}]. When θ = 0, (36) has a unique solution given by

    s_i^{(t+1)} = (g_i - 2k) / ⟨||w_i||²⟩    (37)

It can be seen from (37) that, as long as k < g_i/2, the proposed method approximates an l0 norm by iteratively reweighting an l2 norm, which can be regarded as a relaxation of the Iterative Reweighted Least Squares (IRLS) approach studied in [42]. It is also noted that if θ = 0 the solution does not depend on a. However, in practice θ should be set just above zero in order for p(b | k, θ) in (18) to be a proper pdf. Where additional prior knowledge exists, a, θ and k can be chosen appropriately.

2.3. Estimation of Noise Variance

As discussed, we can keep the inverse noise variance β as a user parameter to adjust the regularization strength. However, for some applications the noise variance is an unknown and potentially variable parameter, which must therefore be estimated. To estimate β, we first impose a Gamma distribution on the inverse noise variance β (β = ν^{-2}), such that

    p(β | c, d) = (d^c / Γ(c)) β^{c-1} exp(-β d)    (38)

If we apply the same VB continuation strategy, we obtain:

    β^{(t+1)} = (M + 2c) / ( ||y - B M µ^{(t)}||² + tr[M^T B^T B M Σ^{(t)}] + 2d )    (39)

Furthermore, if the value of Λ_α is properly set such that Λ_α - M^T B^T B M is close to zero, the following approximation is obtained:

    tr[M^T B^T B M Σ^{(t)}] ≈ tr[Λ_α Σ^{(t)}]    (40)

This assumption is likely to be valid if most of the energy of M^T B^T B M is compressed into the leading diagonal terms [35]. Accordingly, β can then be estimated as:

    β^{(t+1)} = (M + 2c) / ( ||y - B M µ^{(t)}||² + tr[Λ_α Σ^{(t)}] + 2d )    (41)
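A minimal helper for the update (41) is sketched below; the function and argument names are hypothetical, and c and d are simply given small default values so that the Gamma prior (38) is close to noninformative.

```python
import numpy as np

def update_beta(y, A, mu, sigma_diag, lam_alpha, c=1e-6, d=1e-6):
    """Inverse noise variance update (41), using the approximation (40)
    tr[M^T B^T B M Sigma] ~= tr[Lambda_alpha Sigma] for a diagonal Sigma.

    A(mu)      : applies B M to the current posterior mean
    sigma_diag : diagonal of Sigma at the current iteration
    """
    M_obs = y.size
    residual = np.sum((y - A(mu)) ** 2)          # ||y - B M mu||^2
    trace_term = np.sum(lam_alpha * sigma_diag)  # tr[Lambda_alpha Sigma]
    return (M_obs + 2 * c) / (residual + trace_term + 2 * d)
```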

3. Extension for Tree-structured Modeling

To achieve the goal of a fully overlapped group-sparse solution, it is possible to incorporate the VBMM model with a wavelet tree structure, as shown in [30]. Recent works have demonstrated that modeling wavelet parent-children relationships can be viewed as overlapping group regularization [29, 43]. Inspired by [43], we adopt a transformation to a non-overlapping redundant space, ŵ = Dw, which encourages the persistence of large/small coefficients across scales.

Here ŵ is a P × 1 vector that forms groups of wavelet coefficients in the non-overlapping space, and the sparse transformation matrix D indicates the presence or absence of correspondence between the overlapping and non-overlapping spaces. Unlike [30], in this paper we construct the D matrix to be a Parseval tight frame, such that D^T D = I. A simple example of this non-overlapping redundant transformation is shown in Fig. 3.

Figure 3: Simple example of the non-overlapping transformation corresponding only to levels 2 and 3 of the quadtree in Fig. 1(b), using the parent+1child grouping of Fig. 4(a), with δ = 1/√(1 + 4ε²) and ε ∈ (0, 1].

In this case, we set the entries of the D matrix that correspond to the finest-level coefficients to 1, and the entries that represent all parent coefficients to δ. Because the magnitudes of wavelet coefficients tend to decay across scale [22], we further introduce a parameter ε ∈ (0, 1] to adjust the ratio between the magnitudes of the parent coefficient and its replicated copies: 4 copies for "parent+1child" and 1 copy for "parent+4children". The entries that correspond to the replicated parent coefficients are δε. Note that here we keep δ = 1/√(1 + 4ε²) for "parent+1child" and δ = 1/√(1 + ε²) for "parent+4children", to ensure that D is a Parseval tight frame and hence D^T D = I.

As a result, the likelihood of the data in (3) can be extended to

    p(y | ŵ, ν²) = (2πν²)^{-M/2} exp{ -||y - B M D^T ŵ||² / (2ν²) }    (42)

where ŵ = Dw. Then we can model ŵ using a group-sparse GSM model similar to (4):

    p(ŵ | Ŝ) = ∏_{i=1}^{G} N(ŵ_i | 0, σ_i² I) = N(ŵ | 0, Ŝ^{-1})    (43)

Now Ŝ needs to be of size P × P and, when P > G, its diagonal is an expanded form of ŝ in which each ŝ_i is repeated g_i times.

However, how best to group the coefficients is not obvious and becomes an important question. Here we consider two grouping schemes, "parent+1child" and "parent+4children", as illustrated in Fig. 4. In the "parent+1child" scheme, the parent coefficient is grouped separately with each child coefficient, whereas in the "parent+4children" scheme the parent coefficient is grouped with all 4 of its children. Note that in both cases we group the root-level coefficients individually, since they do not have a parent.
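As a sanity check on this construction, the sketch below builds D for a single parent with 4 children under the "parent+1child" grouping and verifies the Parseval property; the coefficient ordering and the treatment of the lone parent as a root-level singleton are illustrative assumptions for this miniature example.

```python
import numpy as np

# Non-overlapping transformation w_hat = D w for one parent and its 4 children
# under "parent+1child" grouping.  Coefficient order in w: [parent, c0, c1, c2, c3].
eps = 1.0                                     # epsilon in (0, 1]
delta = 1.0 / np.sqrt(1.0 + 4.0 * eps**2)     # delta for the parent+1child scheme

rows = [[delta, 0, 0, 0, 0]]                  # parent kept as a singleton (root level here)
for c in range(4):                            # one (parent copy, child) group per child
    parent_copy = [delta * eps, 0, 0, 0, 0]
    child = [0.0] * 5
    child[1 + c] = 1.0
    rows += [parent_copy, child]

D = np.array(rows)                            # shape (9, 5): P = 9, N = 5
assert np.allclose(D.T @ D, np.eye(5))        # Parseval tight frame: D^T D = I
```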

Figure 4: Illustration of different grouping strategies: (a) parent+1child, (b) parent+4children. The root-level coefficients are grouped individually, shown as the red rectangles which represent singleton groups.

Based on Bayes' rule, the posterior distribution can then be calculated via:

    p(ŵ | y, Ŝ, ν²) = p(y | ŵ, ν²) p(ŵ | Ŝ) / p(y | Ŝ, ν²)    (44)

Because both p(y | ŵ, ν²) and p(ŵ | Ŝ) are Gaussian functions of ŵ, the posterior can be rearranged as

    p(ŵ | y, Ŝ, ν²) = N(ŵ | µ̂, Σ̂)    (45)

with

    µ̂ = ν^{-2} Σ̂ D M^T B^T y    (46)
    Σ̂ = (ν^{-2} D M^T B^T B M D^T + Ŝ)^{-1}    (47)

However, now the P × P square matrix (ν^{-2} D M^T B^T B M D^T + Ŝ) needs to be inverted, which is not computationally feasible for big data sets and images. So we again adopt the Bayesian MM framework as in (9). Here we introduce a hidden vector z of size N as before, together with the following approximation model for its posterior distribution:

    p(ŵ, z | y, Ŝ, ν²) = p(z | ŵ) p(ŵ | y, Ŝ, ν²)    (48)

and

    p(z | ŵ) ∝ exp{ -(ŵ - Dz)^T Q (ŵ - Dz) }    (49)

where

    Q = (Λ̂_α - D M^T B^T B M D^T) / (2ν²)    (50)

We find that Q should be positive definite to ensure convergence, and hence:

    Λ̂_α ≻ D M^T B^T B M D^T    (51)

which is equivalent to requiring that:

    D^T Λ̂_α D ≻ D^T D M^T B^T B M D^T D = M^T B^T B M    (52)

where we assume Λ_α = D^T Λ̂_α D. It is known that we need Λ_α ≻ M^T B^T B M in order to fulfill this condition. Λ_α can be found as before, using the methods discussed in Section 2.1 and Appendix 6.1. We can easily compute Λ̂_α once Λ_α is found, as it is equivalent to applying the non-overlapping transformation to Λ_α. Because p(z | ŵ) and p(ŵ | y, Ŝ, ν²) are Gaussian functions of ŵ, when z is given (typically as a previous estimate for w), the approximation model can be rearranged into the Gaussian form

    p(ŵ | y, z, Ŝ, ν²) = N(ŵ | µ̂, Σ̂)    (53)

with

    µ̂ = ν^{-2} Σ̂ [Λ̂_α D z - D M^T B^T (B M z - y)]    (54)
    Σ̂ = (ν^{-2} Λ̂_α + Ŝ)^{-1}    (55)

Now (54) and (55) are computationally tractable, since Σ̂ is purely diagonal. We can then obtain the tree-structured VBMM method summarized in Algorithm 2.

Algorithm 2 Tree-structured VBMM image restoration algorithm
1: Inputs: parameters for the sensing matrix B, observation y, Λ̂_α, a, β, k, θ, initial estimates of z^{(0)}, ŝ^{(0)} and b̂^{(0)}.
2: while iterations t = 0 : t_max, or until z has converged, do
3:    Σ̂^{(t)} = (β Λ̂_α + Ŝ^{(t)})^{-1}
4:    µ̂^{(t)} = β Σ̂^{(t)} [Λ̂_α D z^{(t)} - D M^T B^T (B M z^{(t)} - y)]
5:    ŵ^{(t+1)} = µ̂^{(t)}
6:    z^{(t+1)} = D^T ŵ^{(t+1)}
7:    ŝ_i^{(t+1)} = (g_i + 2a) / ( ||µ̂_i^{(t)}||² + tr[Σ̂_i^{(t)}] + 2 b̂_i^{(t)} )  for i = 1 ... G
8:    b̂_i^{(t+1)} = (a + k) / ( ŝ_i^{(t+1)} + θ )  for i = 1 ... G
9: end while
10: Output restored image x = M z^{(t+1)}
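The loop body of Algorithm 2 differs from Algorithm 1 only through the transformation D, as the sketch of a single iteration below illustrates (again with one entry per group and purely illustrative argument names):

```python
import numpy as np

def tree_vbmm_step(z, y, A, At, D, lam_alpha_hat, s_hat, b_hat,
                   a=0.5, k=0.5, theta=1e-6, beta=1.0):
    """One iteration of the tree-structured VBMM update (Algorithm 2),
    assuming D satisfies D^T D = I and group size 1 per entry of w_hat.

    A(w)/At(r)    : apply B M and M^T B^T respectively
    lam_alpha_hat : diagonal of Lambda_alpha_hat in the non-overlapping space
    """
    sigma_hat = 1.0 / (beta * lam_alpha_hat + s_hat)           # step 3
    mu_hat = beta * sigma_hat * (lam_alpha_hat * (D @ z)
                                 - D @ At(A(z) - y))           # step 4
    w_hat = mu_hat                                             # step 5
    z_new = D.T @ w_hat                                        # step 6
    s_hat = (1 + 2 * a) / (mu_hat**2 + sigma_hat + 2 * b_hat)  # step 7, g_i = 1
    b_hat = (a + k) / (s_hat + theta)                          # step 8
    return z_new, s_hat, b_hat
```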

This uses the same VB continuation strategy as described in Section 2.2, where we impose Gamma distributions on both the inverse signal variance ŝ and its rate parameter b̂:

    p(ŝ | a, b̂) = ∏_{i=1}^{G} (b̂_i^a / Γ(a)) ŝ_i^{a-1} exp(-b̂_i ŝ_i)    (56)

and

    p(b̂ | k, θ) = ∏_{i=1}^{G} (θ^k / Γ(k)) b̂_i^{k-1} exp(-θ b̂_i)    (57)

This extension effectively provides a framework which incorporates a wavelet tree structure into a variational Bayesian derivation. Note that G now represents the number of (overlapping) groups of complex coefficients, and the g_i are in general larger than before, so that now P = ∑_{i=1}^{G} g_i.

4. Results

We present a set of experiments to evaluate the performance of the proposed image restoration algorithms. Three applications, including image deconvolution, image superresolution and compressive sensing (CS) magnetic resonance imaging (MRI) reconstruction, are studied.

Table 1: Blur, noise variance and BSNR (dB)

    Exp.   Blur h_ij                                     ν²       BSNR
    1      9 × 9 uniform                                 31.10    20
    2      9 × 9 uniform                                 0.31     40
    3      9 × 9 uniform                                 0.03     50
    4      h_ij = 1/(1 + i² + j²),  i, j = -7, ..., 7    2        31.85
    5      h_ij = 1/(1 + i² + j²),  i, j = -7, ..., 7    8        25.85
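Table 1 characterizes the degradations by their BSNR, while the deconvolution results are reported later as ISNR (e.g. Fig. 5). For reference, the usual definitions of these two measures can be computed as follows (a generic sketch, not code from the paper):

```python
import numpy as np

def bsnr(blurred, noisy):
    """Blurred SNR in dB: variance of the (mean-removed) blurred image
    relative to the variance of the added noise."""
    noise = noisy - blurred
    return 10 * np.log10(np.var(blurred) / np.var(noise))

def isnr(original, degraded, restored):
    """Improvement in SNR (dB) of the restored image over the degraded input."""
    return 10 * np.log10(np.sum((original - degraded) ** 2)
                         / np.sum((original - restored) ** 2))
```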

We have used both the coefficient-sparse VBMM (VC) of Algorithm 1 and the tree-structured VBMM of Algorithm 2, with "parent+1child" grouping (V1) and "parent+4children" grouping (V4), for all of these experiments. The DT CWT is chosen as our redundant sparsifying transform because it has good sparsity-inducing properties. Since the DT CWT produces complex coefficients with circularly symmetric pdfs, we assume that a pair of real and imaginary coefficients share the same variance and can be clustered into one group. As a result, we have G = N/2 groups for VC, G = P/6 groups for V1 and G = P/10 groups for V4, where P = 4(MN - MN/4^lev) for both V1 and V4. In all experiments we set the level of decomposition lev = 4 and, for simplicity, ε = 1 in the D matrix. However, as stated in Section 3, varying ε may further improve the results.

4.1. Image Deconvolution

Figure 5: Deconvolution results after 200 iterations on Exp. 2, on Cameraman, BSNR: 40 dB. (a) Original. (b) Blurred. (c) Wiener filter, ISNR = 4.512 dB. (d) MLTL, ISNR = 7.594 dB. (e) MSIST, ISNR = 7.648 dB. (f) VBMM-VC, ISNR = 8.127 dB. (g) VBMM-V1, ISNR = 8.221 dB. (h) VBMM-V4, ISNR = 8.318 dB.

For image deconvolution, the linear operator B becomes a convolution matrix. We compare and show that VBMM and its tree-structured extensions outperform recently developed SIST algorithms, including multilevel thresholded Landweber (MLTL) [32] and modified subband-adaptive iterative shrinkage/thresholding (MSIST) [34]. Here we assume B is known, but it is clear that Bayesian inspired schemes can also

