Structured Matrix Estimation Via Approximate Message Passing


Structured Matrix Estimation via Approximate Message Passing
Phil Schniter and Jason Parker
With support from NSF CCF-1218754, NSF CCF-1527162, and an AFOSR Lab Task (under Dr. Arje Nachman).
ITA Workshop (San Diego), Feb. 5, 2016

The Generalized Bilinear Model

Bilinear model: Infer $b \in \mathbb{R}^{N_b}$ and $c \in \mathbb{R}^{N_c}$ from bilinear measurements
$$y_m = b^T \Phi_m c + w_m, \quad m = 1, \dots, M,$$
where $\{\Phi_m\}$ are known matrices and $\{w_m\}$ are independent noise samples.

Generalized bilinear model: Infer $b$ and $c$ from
$$y_m = f(b^T \Phi_m c + w_m), \quad m = 1, \dots, M,$$
where $f(\cdot)$ is possibly non-linear (e.g., quantization, loss of phase).
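To make the model concrete, here is a minimal NumPy sketch (ours, not from the talk; the dimensions, noise level, and choice of $f$ are illustrative assumptions) that generates measurements under both the bilinear model and a generalized bilinear model with 1-bit quantization as the nonlinearity:

```python
# Minimal sketch of the (generalized) bilinear measurement model.
# Assumptions: illustrative sizes, Gaussian noise, f = sign (1-bit quantization).
import numpy as np

rng = np.random.default_rng(0)
Nb, Nc, M = 20, 30, 200

b = rng.standard_normal(Nb)              # unknown vector b
c = rng.standard_normal(Nc)              # unknown vector c
Phi = rng.standard_normal((M, Nb, Nc))   # known matrices {Phi_m}
w = 0.01 * rng.standard_normal(M)        # independent noise samples

z = np.einsum('i,mij,j->m', b, Phi, c)   # z_m = b^T Phi_m c
y_bilinear = z + w                       # bilinear model
y_general = np.sign(z + w)               # generalized model: f(.) = sign(.)
```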

Some Applications of the Generalized Bilinear Model

1. Self-calibration: Observe $Y = \mathrm{Diag}(b)\Psi C + W$ with known dictionary $\Psi$. Recover calibration parameters $b$ and sparse signal coefficients $C$. (A sketch of this model appears after the list.)
2. Blind deconvolution: Observe $Y = \mathrm{Conv}(\Phi b)\Psi C + W$ with known dictionaries $\Phi$ and $\Psi$. Recover filter parameters $b$ and sparse signal coefficients $C$.
3. Joint channel-symbol estimation: Observe $Y = \mathrm{Conv}(b)C + W$. Recover sparse channel coefficients $b$ and coded finite-alphabet symbols $C$.
4. Recovery of a low-rank plus sparse matrix: Observe $y_m = \mathrm{tr}\{\Phi_m^T(L + S)\} + w_m$ with known $\Phi_m$ for $m = 1, \dots, M$. Recover low-rank matrix $L$ and sparse matrix $S$.
5. Nonlinear compressed sensing with structured matrix uncertainty: Observe $y = f\big((\sum_i b_i \Phi_i)c + w\big)$ with known $\{\Phi_i\}$ and componentwise $f(\cdot)$. Recover sparse vector $c$.
6. ...and many more.
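As one concrete instance, this hypothetical sketch (NumPy; the sizes and sparsity are our assumptions) builds the self-calibration observation $Y = \mathrm{Diag}(b)\Psi C + W$ from application 1:

```python
# Hypothetical self-calibration forward model: Y = Diag(b) Psi C + W.
import numpy as np

rng = np.random.default_rng(1)
Nb, N, T, K = 64, 128, 10, 5            # sensors, dictionary atoms, snapshots, sparsity

Psi = rng.standard_normal((Nb, N))      # known dictionary
b = 1 + 0.1 * rng.standard_normal(Nb)   # unknown calibration gains (near 1)
C = np.zeros((N, T))                    # K-sparse coefficient columns
for t in range(T):
    C[rng.choice(N, size=K, replace=False), t] = rng.standard_normal(K)

Y = np.diag(b) @ Psi @ C + 0.01 * rng.standard_normal((Nb, T))
```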

Extends Earlier "BiG-AMP" for Matrix Factorization

1. Matrix completion: Recover the low-rank matrix $AX$ from noise-corrupted, incomplete observations $Y = P_\Omega(AX + W)$.
2. Robust PCA: Recover the low-rank matrix $AX$ and sparse matrix $S$ from noise-corrupted observations $Y = AX + (S + W) = [A \;\; I]\begin{bmatrix} X \\ S \end{bmatrix} + W$.
3. Dictionary learning: Recover the dictionary $A$ and sparse matrix $X$ from noise-corrupted observations $Y = AX + W$.
4. Non-negative matrix factorization: Recover non-negative matrices $A$ and $X$ from noise-corrupted observations $Y = AX + W$.

A detailed numerical comparison against state-of-the-art algorithms [Parker/Schniter/Cevher, IEEE-TSP'14] suggests that BiG-AMP gives excellent phase transitions and competitive runtimes.

Parametric Bilinear Generalized AMP (PBiG-AMP)

Separable probabilistic model: Recall $y_m = f(\underbrace{b^T \Phi_m c}_{\triangleq\, z_m} + w_m)$, $m = 1, \dots, M$.
- Treat $b \sim \prod_i p_b(b_i)$, $c \sim \prod_j p_c(c_j)$, and $y \sim \prod_m p_{y|z}(y_m \mid z_m)$.
- Treat $\Phi \triangleq \{\Phi_1, \dots, \Phi_M\}$ as i.i.d. Gaussian and consider the large-system limit: $M, N_b, N_c \to \infty$ with $N_b/M$ and $N_c/M$ converging to fixed constants.

The sum-product algorithm then simplifies:
- Messages become Gaussian.
- The number of messages reduces from $2M(N_b + N_c)$ to $2(M + N_b + N_c)$.
- This is "Approximate Message Passing" [Donoho/Maleki/Montanari'09], [Rangan'10].
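The practical payoff of this simplification is that message updates reduce to scalar posterior computations. As an illustration (a sketch under our own assumptions, not the authors' code), here is the scalar Bernoulli-Gaussian denoiser: the posterior mean and variance of $x \sim \mathrm{BG}(\rho, 0, \sigma^2)$ given a Gaussian pseudo-measurement $r = x + \mathcal{N}(0, \nu)$:

```python
# Scalar Bernoulli-Gaussian denoiser: posterior mean/variance of x from r = x + N(0, nu),
# where x = 0 w.p. 1-rho and x ~ N(0, sigma2) w.p. rho. (Illustrative sketch only.)
import numpy as np

def bg_denoise(r, nu, rho=0.1, sigma2=1.0):
    def gauss(x, v):                         # Gaussian density N(x; 0, v)
        return np.exp(-x**2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    p_on = rho * gauss(r, nu + sigma2)       # evidence under the "active" hypothesis
    p_off = (1 - rho) * gauss(r, nu)         # evidence under the "inactive" hypothesis
    pi = p_on / (p_on + p_off)               # posterior activity probability
    gamma = r * sigma2 / (sigma2 + nu)       # E[x | r, active]
    vg = sigma2 * nu / (sigma2 + nu)         # var[x | r, active]
    xhat = pi * gamma                        # posterior mean E[x | r]
    vx = pi * (vg + gamma**2) - xhat**2      # posterior variance var[x | r]
    return xhat, vx
```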

PBiG-AMP: Density Evolution

Collaborators Christophe Schülke and Lenka Zdeborová showed that, for i.i.d. Gaussian $\Phi$ in the large-system limit, PBiG-AMP is characterized by a scalar density evolution.

Example: Gaussian $p_{y|z}$ and Bernoulli-Gaussian $p_b = p_c$, with $N \triangleq N_b = N_c$.

[Figure: density-evolution prediction of the phase transition, where $\rho$ is the sparsity ratio $\triangleq K/N$, $\alpha$ is the sampling rate $\triangleq M/(2N)$, and a reference line marks the counting bound.]

PBiG-AMP: Implementation & Extensions

Our implementation (https://sourceforge.net/projects/gampmatlab/) allows:
- non-identical priors $\{p_{b_i}\}$, $\{p_{c_j}\}$ and likelihoods $\{p_{y_m|z_m}\}$,
- complex-valued quantities,
- fast implementations of generic $\Phi_m$ (e.g., FFTs, sparse matrices, etc.).

Prior/likelihood parameters (e.g., sparsity, noise variance) can be tuned online using the same expectation-maximization (EM) methods proposed for AMP [Schniter/Vila'11].

The separability assumptions $b \sim \prod_i p_{b_i}(b_i)$, $c \sim \prod_j p_{c_j}(c_j)$, and $y|z \sim \prod_m p_{y_m|z_m}(y_m \mid z_m)$ can be relaxed by embedding PBiG-AMP in a larger factor graph (i.e., "turbo-AMP") [Schniter'10].
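To give a flavor of the EM tuning mentioned above, here is a schematic sketch (a simplified assumption on our part, not the toolbox's actual code) of a noise-variance update under an AWGN likelihood, driven by AMP's posterior mean zhat and variance vz of the noiseless outputs $z$:

```python
# Schematic EM-style update of the AWGN noise variance nu_w.
# Assumption: nu_w <- (1/M) sum_m E[(y_m - z_m)^2] under the posterior on z_m.
import numpy as np

def em_update_noise_var(y, zhat, vz):
    return np.mean((y - zhat) ** 2 + vz)
```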

Example 1: Empirical Phase Transition under i.i.d. Gaussian $\Phi$

[Figure: empirical phase-transition plots for PBiG-AMP and EM-PBiG-AMP; sparsity $K$ vs. measurements $M$.]

- Measure noiseless $y_m = b^T \Phi_m c$ for $m = 1, \dots, M$, with $b, c \in \mathbb{R}^{100}$.
- Recover i.i.d. $b_i, c_j \sim \mathrm{BG}(K/100, 0, 1)$; the i.i.d. Gaussian $\Phi_m$ are known.
- The phase transition (NMSE $< 10^{-6}$) is close to the counting bound (red line).
- Due to finite-size effects, empirical performance beats the density evolution!

Example 2: Self-Calibration

[Figure: empirical phase-transition plots for EM-PBiG-AMP and SparseLift; number of parameters $N_b$ vs. signal sparsity $K$.]

- Measure noiseless $y = \mathrm{Diag}(Fb)\Phi c$.
- Recover unknown Gaussian $b \in \mathbb{R}^{N_b}$ and $K$-sparse $c \in \mathbb{R}^{256}$.
- The DFT matrix $F \in \mathbb{C}^{128 \times N_b}$ and i.i.d. Gaussian $\Phi \in \mathbb{R}^{128 \times 256}$ are known.
- $M \gtrsim O(N_b + K)$ measurements suffice for EM-PBiG-AMP, whereas $M \gtrsim O(N_b K)$ are needed for SparseLift [Ling/Strohmer'14].

Example 3: Matrix Compressive Sensing

[Figure: empirical phase transitions for EM-PBiG-AMP vs. CPCP-via-TFOCS; sparsity rate $\xi$ vs. rank $R$, at $M = 5000$, $8000$, and $10000$ measurements.]

- For $m = 1, \dots, M$, measure $y_m = \mathrm{tr}\{\Phi_m^T(L + S)\}$ with 50-sparse i.i.d. $\Phi_m$.
- Recover sparse $S$ and rank-$R$ $L \in \mathbb{R}^{100 \times 100}$.
- EM-PBiG-AMP outperforms the convex relaxation known as Compressive Principal Components Pursuit (CPCP) [Wright/Ganesh/Min/Ma'13].
- PBiG-AMP runs significantly faster than CPCP via TFOCS.

Example 4: Totally Blind Deconvolution

[Figure: Gaussian-channel NMSE($\hat{b}$) [dB], Gaussian-symbol NMSE($\hat{c}$) [dB], and QPSK-symbol SER($\hat{c}$) vs. SNR [dB]; curves for cross-relation, PBiG-AMP, and oracle.]

- Measure linear convolution outputs $y_m = b_m * c_m + w_m$ and recover both $b_m$ and $c_m$.
- Zero-valued guards ensure identifiability [Manton'03].
- PBiG-AMP outperforms the blind cross-relation method [Hua'96].
- With QPSK at high SNR, PBiG-AMP achieves oracle performance.
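For concreteness, here is a hypothetical sketch (NumPy; channel length, block size, and guard placement are our assumptions) of the measurement model in this example, with QPSK symbols and zero-valued guard symbols:

```python
# Hypothetical totally-blind-deconvolution measurements: y = b * c + w (linear convolution),
# with zero guards in the symbol block to ensure identifiability.
import numpy as np

rng = np.random.default_rng(2)
Lb, Nc = 4, 64                           # channel length, symbol-block length

b = rng.standard_normal(Lb)              # unknown channel taps
c = ((rng.integers(2, size=Nc) * 2 - 1)
     + 1j * (rng.integers(2, size=Nc) * 2 - 1)) / np.sqrt(2)   # QPSK symbols
c[:Lb - 1] = 0                           # zero-valued guard interval

noise = 0.01 * (rng.standard_normal(Nc + Lb - 1)
                + 1j * rng.standard_normal(Nc + Lb - 1))
y = np.convolve(b, c) + noise            # full convolution, length Nc + Lb - 1
```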

Example 5: CS with Structured Matrix Uncertainty

[Figure: NMSE($b$) and NMSE($c$) in dB vs. $M/N_c$; curves for PBiG-AMP, EM-PBiG-AMP, WSS-TLS, and oracles.]

- Measure $y = (\Phi_0 + \sum_{i=1}^{10} b_i \Phi_i)c + w$ with $N_c = 256$ and SNR = 40 dB.
- Recover i.i.d. $w_m \sim \mathcal{N}(0, \nu_w)$, $b_i \sim \mathcal{N}(0, 1)$, $c_j \sim \mathrm{BG}(0.04, 0, 1)$.
- The i.i.d. $[\Phi_0]_{mj} \sim \mathcal{N}(0, 10)$ and $[\Phi_i]_{mj} \sim \mathcal{N}(0, 1)$ are known.
- EM-PBiG-AMP outperforms oracle-tuned WSS-TLS [Zhu/Leus/Giannakis'11].

Summary

- Proposed an AMP algorithm for the generalized bilinear model.
- Assumes unknown independent random vectors $b$ and $c$ are related to the observations $\{y_m\}$ through a conditionally independent likelihood of the form $p(y_m \mid b^T \Phi_m c)$ with large i.i.d. Gaussian $\Phi_m$.
- Behavior is characterized by a scalar density evolution.
- Generalizes previous work on matrix-factorization AMP.
- Numerical experiments demonstrate performance near oracle bounds for various interesting $\Phi$ that are not i.i.d. Gaussian.
- The full paper can be found at http://arxiv.org/abs/1508.07575.

References

1. D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," PNAS, 2009.
2. S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," ISIT, 2011. (See also arXiv:1010.5141.)
3. J. T. Parker, P. Schniter, and V. Cevher, "Bilinear generalized approximate message passing—Part I: Derivation," IEEE Trans. Signal Process., 2014.
4. J. T. Parker, P. Schniter, and V. Cevher, "Bilinear generalized approximate message passing—Part II: Applications," IEEE Trans. Signal Process., 2014.
5. J. T. Parker and P. Schniter, "Parametric bilinear generalized approximate message passing," http://arxiv.org/abs/1508.07575, 2015.
6. J. P. Vila and P. Schniter, "Expectation-maximization Gaussian-mixture approximate message passing," IEEE Trans. Signal Process., 2013.
7. P. Schniter, "Turbo reconstruction of structured sparse signals," Proc. Conf. Inform. Science & Syst., 2010.
8. S. Ling and T. Strohmer, "Self-calibration and biconvex compressive sensing," Inverse Problems, 2015.
9. H. Zhu, G. Leus, and G. Giannakis, "Sparsity-cognizant total least-squares for perturbed compressive sampling," IEEE Trans. Signal Process., 2011.
10. J. Wright, A. Ganesh, K. Min, and Y. Ma, "Compressive principal component pursuit," Inform. Inference, 2013.
11. J. H. Manton and W. D. Neumann, "Totally blind channel identification by exploiting guard intervals," Syst. Control Lett., 2003.
12. Y. Hua, "Fast maximum likelihood for blind identification of multiple FIR channels," IEEE Trans. Signal Process., 1996.

Thanks for listening!

Matrix Completion: Phase Transitions

[Figure: empirical phase transitions for IALM, VSBL, GROUSE, LMaFit, BiG-AMP-Lite, and EM-BiG-AMP; rank $N$ vs. sampling ratio $|\Omega|/ML$.]

The plots show the empirical probability that NMSE $< -100$ dB (over 10 realizations) for noiseless completion of an $M \times L$ matrix with $M = L = 1000$.

Note that BiG-AMP-Lite and EM-BiG-AMP have the best phase transitions.
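For reference, here is a minimal sketch (NumPy; the rank and sampling ratio are illustrative assumptions) of the noiseless matrix-completion observation model behind these plots:

```python
# Matrix-completion observations: Y = P_Omega(A X), with A X of rank N.
import numpy as np

rng = np.random.default_rng(3)
M, L, N = 1000, 1000, 50                 # matrix is M x L with rank N

A = rng.standard_normal((M, N))
X = rng.standard_normal((N, L))
Omega = rng.random((M, L)) < 0.2         # observed entries; sampling ratio ~ 0.2
Y = np.where(Omega, A @ X, 0.0)          # keep observed entries; add W for noisy variants
```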

Matrix Completion: Runtime to NMSE = -100 dB

[Figure: runtime (sec) vs. rank $N$ at sampling ratios $|\Omega|/ML = 0.05$ and $0.2$, for Matrix-ALPS, GROUSE, VSBL, LMaFit, IALM, BiG-AMP-Lite, BiG-AMP, and EM-BiG-AMP.]

- Although LMaFit is the fastest algorithm at small rank $N$, BiG-AMP-Lite's superior complexity scaling with $N$ eventually wins out.
- BiG-AMP runs 1 to 2 orders of magnitude faster than IALM and VSBL.

Robust PCA: Phase Transitions

[Figure: empirical phase transitions for VSBL, GRASTA, IALM, LMaFit, BiG-AMP-2, and EM-BiG-AMP; rank $N$ vs. outlier fraction $\lambda$.]

The plots show the empirical probability of NMSE $< -80$ dB over 10 realizations for noiseless recovery of the low-rank component of a $200 \times 200$ outlier-corrupted matrix.

As before, the BiG-AMP methods yield the best phase transitions.

Overcomplete Dictionary Recovery: Phase Transitions

[Figure: mean NMSE for K-SVD, SPAMS, and EM-BiG-AMP in noiseless and noisy settings; sparsity $K$ vs. dictionary rows $M$.]

The plots show the mean NMSE over 50 realizations for recovery of an $M \times 2M$ dictionary from $L = 10M\log(2M)$ examples with sparsity $K$.

As before, the BiG-AMP methods yield the best phase transitions.
