2m ago

1 Views

0 Downloads

6.08 MB

80 Pages

Transcription

Bilinear Prediction Using Low-Rank ModelsInderjit S. DhillonDept of Computer ScienceUT Austin26th International Conference on Algorithmic Learning TheoryBanff, CanadaOct 6, 2015Joint work with C-J. Hsieh, P. Jain, N. Natarajan, H. Yu and K. ZhongInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

OutlineMulti-Target PredictionFeatures on Targets: Bilinear PredictionInductive Matrix Completion1Algorithms2Positive-Unlabeled Matrix Completion3Recovery GuaranteesExperimental ResultsInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Sample Prediction ProblemsPredicting stock pricesPredicting risk factors in healthcareInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

RegressionReal-valued responses (target) tPredict response for given input data (features) aInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Linear RegressionEstimate target by a linear function of given data a, i.e. t t̂ aT x.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Linear Regression: Least SquaresChoose x that minimizesnJx 1X T(ai x ti )22i 1Closed-form solution:x (AT A) 1 AT t.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Prediction Problems: ClassificationSpam detectionCharacter RecognitionInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Binary ClassificationCategorical responses (target) tPredict response for given input data (features) aLinear methods — decision boundary is a linear surface or hyperplaneInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Linear Methods for Prediction ProblemsRegression:P22Ridge Regression: Jx 12 ni 1 (aTi x ti ) λkxk2 .P2Lasso: Jx 12 ni 1 (aTi x ti ) λkxk1 .Classification:Linear Support Vector MachinesnJx 1X2max(0, 1 ti aTi x) λkxk2 .2i 1Logistic RegressionnJx 1X2log(1 exp( ti aTi x)) λkxk2 .2i 1Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Linear PredictionInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Linear PredictionInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-Target PredictionInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Modern Prediction Problems in Machine LearningAd-word RecommendationInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Modern Prediction Problems in Machine LearningAd-word Recommendationgeico auto insurancegeico car insurancecar insurancegeico insuranceneed cheap auto insurancegeico comcar insurance coupon codeInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Modern Prediction Problems in Machine LearningWikipedia Tag RecommendationLearning in computervisionMachine learningLearningCyberneticsInderjit S. Dhillon Dept of Computer Science UT Austin.Low-Rank Bilinear Prediction

Modern Prediction Problems in Machine LearningPredicting causal disease genesInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Prediction with Multiple TargetsIn many domains, goal is to simultaneously predict multiple targetvariablesMulti-target regression: targets are real-valuedMulti-label classification:targets are binaryInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Prediction with Multiple TargetsApplicationsBid word recommendationTag recommendationDisease-gene linkage predictionMedical diagnosesEcological modeling.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Prediction with Multiple Targets(1)(2)(m)Input data ai is associated with m targets, ti (ti , ti , . . . , tiInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction)

Multi-target Linear PredictionBasic model: Treat targets independentlyEstimate regression coefficients xj for each target jInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Linear PredictionAssume targets t(j) are independentLinear predictive model: ti aTi XInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Linear PredictionAssume targets t(j) are independentLinear predictive model: ti aTi XMulti-target regression problem has a closed-form solution: 2VA Σ 1A UA T arg min kT AX kFXwhere A UA ΣA VAT is the thin SVD of AInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Linear PredictionAssume targets t(j) are independentLinear predictive model: ti aTi XMulti-target regression problem has a closed-form solution: 2VA Σ 1A UA T arg min kT AX kFXwhere A UA ΣA VAT is the thin SVD of AIn multi-label classification: Binary Relevance (independent binary classifierfor each label)Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Linear Prediction: Low-rank ModelExploit correlations between targets T , where T AXReduced-Rank Regression [A.J. Izenman, 1974] — model thecoefficient matrix X as low-rankA. J. Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis 5.2 (1975): 248-264.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Linear Prediction: Low-rank ModelX is rank-kLinear predictive model: ti aTi XInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Linear Prediction: Low-rank ModelX is rank-kLinear predictive model: ti aTi XLow-rank multi-target regression problem has a closed-form solution:X minX :rank(X ) kkT AX k2F( VA Σ 1A UA Tk VA Σ 1A Mkif A is full row rank,otherwise,where A UA ΣA VAT is the thin SVD of A, M UA T , and Tk , Mk arethe best rank-k approximations of T and M respectively.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Modern ChallengesInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Prediction with Missing ValuesIn many applications, several observations (targets) may be missingE.g. Recommending tags for images and wikipedia articlesInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Modern Prediction Problems in Machine LearningAd-word Recommendationgeico auto insurancegeico car insurancecar insurancegeico insuranceneed cheap auto insurancegeico comcar insurance coupon codeInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Prediction with Missing ValuesInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Prediction with Missing ValuesLow-rank model: ti aTi X where X is low-rankInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Canonical Correlation AnalysisInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear PredictionInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear PredictionAugment multi-target prediction with features on targets as wellMotivated by modern applications of machine learning —bioinformatics, auto-tagging articlesNeed to model dyadic or pairwise interactionsMove from linear models to bilinear models — linear in input featuresas well as target featuresInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear PredictionInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear PredictionInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear PredictionBilinear predictive model: Tij aTi X bjInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear PredictionBilinear predictive model: Tij aTi X bjCorresponding regression problem has a closed-form solution: 1 T 2VA Σ 1A UA TUB ΣB VB arg min kT AXB kFXwhere A UA ΣA VA , B UB ΣB VB are the thin SVDs of A and BInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear Prediction: Low-rank ModelX is rank-kBilinear predictive model: Tij aTi X bjInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear Prediction: Low-rank ModelX is rank-kBilinear predictive model: Tij aTi X bjCorresponding regression problem has a closed-form solution:X minX :rank(X ) kkT AXB k2F( 1 T VA Σ 1A UA Tk UB ΣB VB 1 TVA Σ 1A Mk ΣB VBif A, B are full row rank,otherwise,where A UA ΣA VA , B UB ΣB VB are the thin SVDs of A and B,M UA TUB , and Tk , Mk are the best rank-k approximations of Tand MInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Modern Challenges in Multi-Target PredictionMillions of targetsCorrelations among targetsMissing valuesInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Modern Challenges in Multi-Target PredictionMillions of targets (Scalable)Correlations among targets (Low-rank)Missing values (Inductive Matrix Completion)Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear Prediction with Missing ValuesInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Matrix CompletionMissing Value Estimation ProblemMatrix Completion: Recover a low-rank matrix from observed entriesMatrix Completion: exact recovery requires O(kn log2 (n)) samples,under the assumptions of:12Uniform samplingIncoherenceInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Inductive Matrix CompletionInductive Matrix Completion: Bilinear low-rank prediction with missingvaluesDegrees of freedom in X are O(kd)Can we get better sample complexity (than O(kn))?Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 1: Convex Relaxation1Nuclear-norm Minimization:min kX k s.t. aTi X bj Tij , (i, j) ΩComputationally expensiveSample complexity for exact recovery: O(kd log d log n)Conditions for exact recovery:C1. Incoherence on A, B.C2. Incoherence on AU and BV ,where X U Σ V T is the SVD ofthe ground truth X C1 and C2 are satisfied with high probability when A, B are GaussianInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 1: Convex RelaxationTheorem (Recovery Guarantees for Nuclear-norm Minimization)Let X U Σ V T Rd d be the SVD of X with rank k. Assume A, B areorthonormal matrices w.l.o.g., satisfying the incoherence conditions. Then ifΩ is uniformly observed with Ω O(kd log d log n),the solution of nuclear-norm minimization problem is unique and equal toX with high probability.The incoherence conditions areµdµd, max kbj k22 n j [n]nµ0 kµ0 k2TTC2. max kU ai k2 , max kV bj k22 j [n]i [n]nnC1. max kai k22 i [n]Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 2: Alternating Least SquaresAlternating Least Squares (ALS):minY Rd1 k Z Rd2 kXT2(aTi YZ bj Tij )(i,j) ΩNon-convex optimizationAlternately minimize w.r.t. Y and ZInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 2: Alternating Least SquaresComputational complexity of ALS.At h-th iteration, fixing Yh , solve the least squares problem for Zh 1 :XXTT(ãTTij bj ãTi Zh 1 bj )bj ãi i(i,j) Ω(i,j) Ωwhere ãi YhT ai . Similarly solve for Yh when fixing Zh .123Closed form: O( Ω k 2 d (nnz(A) nnz(B))/n k 3 d 3 ).Vanilla conjugate gradient: O( Ω k (nnz(A) nnz(B))/n) per iteration.Exploit the structure for conjugate gradient:X T T(ãi Z bj )bj ãTi B T D Ã(i,j) Ωwhere D is a sparse matrix with Dji ãTi Z T bj , (i, j) Ω, and Ã AYh .Only O((nnz(A) nnz(B) Ω ) k) per iteration.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 2: Alternating Least SquaresTheorem (Convergence Guarantees for ALS )Let X be a rank-k matrix with condition number β, and T AX B T .Assume A, B are orthogonal w.l.o.g. and satisfy the incoherence conditions.Then if Ω is uniformly sampled with Ω O(k 4 β 2 d log d),Tthen after H iterations of ALS, kYH ZH 1 X k2 , whereH O(log(kX kF / )).The incoherence conditions are:µdµd, max kb j k22 n j [n]nµ0 kµ0 k2TTC2’. max kYh ai k2 , max kZh b j k22 ,j [n]i [n]nnC1. max kai k22 i [n]for all the Yh ’s and Zh ’s generated from ALS.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 2: Alternating Least SquaresProof sketch for ALSConsider the case when the rank k 1:XT2min(aTi y z bj Tij )y Rd1 ,z Rd2Inderjit S. Dhillon Dept of Computer Science UT Austin(i,j) ΩLow-Rank Bilinear Prediction

Algorithm 2: Alternating Least SquaresProof sketch for rank-1 ALSminy Rd1 ,z Rd2XT2(aTi y z bj Tij )(i,j) Ω(a) Let X σ y z T be the thin SVD of X and assume A and B areorthogonal w.l.o.g.(b) In the absence of missing values, ALS Power method. kAy h z T B T T k2FTT2T T 2B T (Bzy Th A T )Ay h 2(zky h k B T Ay h ) zz h 1 (AT TB)T y h ; normalize z h 1y h 1 (AT TB)z h 1 ; normalize y h 1Note that AT TB AT AX B T B X and the power method convergesto the optimal.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 2: Alternating Least SquaresProof sketch for rank-1 ALSminXy Rd1 ,z Rd2T2(aTi y z bj Tij )(i,j) Ω(c) With missing values, ALS is a variant of power method with noise ineach iterationz h 1 QR( X T y h {z }power methodwhere N P(i,j) Ω σ N 1 ((y T y h )N Ñ)z ) {z}noise term gTTbj aTi y h y h ai bj , Ñ TTT(i,j) Ω bj ai y h y ai bj .σ N 1 ((y T y h )N Ñ)z P(d) Given C1 and C2’, the noise term g becomes smaller as the iterate gets close to the optimal:q121 (y Tkgk2 h y )99Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 2: Alternating Least SquaresProof sketch for rank-1 ALSminy Rd1 ,z Rd2XT2(aTi y z bj Tij )(i,j) Ω(e) Given C1 and C2’, the first iterate y 0 is well initialized, i.e. y T0 y 0.9,which guarantees the initial noise is small enough(f) The iterates can then be shown to linearly converge to the optimal:12(1 (y Th z ) )21T221 (y Th 1 y ) (1 (z h 1 y ) )221 (z Th 1 z ) Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 2: Alternating Least SquaresProof sketch for rank-1 ALSminy Rd1 ,z Rd2XT2(aTi y z bj Tij )(i,j) Ω(e) Given C1 and C2’, the first iterate y 0 is well initialized, i.e. y T0 y 0.9,which guarantees the initial noise is small enough(f) The iterates can then be shown to linearly converge to the optimal:12(1 (y Th z ) )21T221 (y Th 1 y ) (1 (z h 1 y ) )221 (z Th 1 z ) Similarly, the rank-k case can be proved.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Inductive Matrix Completion: Sample ComplexitySample complexity of Inductive Matrix Completion (IMC) and MatrixCompletion (MC).methodsNuclear-normALSIMCO(kd log n log d)O(k 4 β 2 d log d)MCkn log n (Recht, 2011)k 3 β 2 n log n (Hardt, 2014)2where β is the condition number of XIn most cases, n dIncoherence conditions on A, B are requiredSatisfied e.g. when A, B are Gaussian (no assumption on X needed)B. Recht. A simpler approach to matrix completion. The Journal of Machine Learning Research 12 : 3413-3430 (2011).M. Hardt. Understanding alternating minimization for matrix completion. Foundations of Computer Science (FOCS), IEEE 55thAnnual Symposium, pp. 651-660 (2014).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Inductive Matrix Completion: Sample Complexity ResultsAll matrices are sampled from Gaussian random distribution.Left two figures: fix k 5, n 1000 and change d.Right two figures: fix k 5, d 50 and change n.Darkness of the shading is proportional to the number of failures(repeated 10 times).20000.615000.410000.2500050100d150 Ω vs. d 25007000.66000.45004000.2300050100d1500 Ω vs. d (Nuclear)200019000.8800Number of MeasurementsNumber of Measurements0.8Number of Measurements12500Number of 00n60080010000 Ω vs. n (ALS)2000200400n6008001000 Ω vs. n (Nuclear)Sample complexity is proportional to d while almost independent of nfor both Nuclear-norm and ALS methods.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction0

Positive-Unlabeled LearningInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Modern Prediction Problems in Machine LearningPredicting causal disease genesInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Bilinear Prediction: PU LearningIn many applications, only “positive” labels are observedInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

PU LearningNo observations of the “negative” class availableInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

PU Inductive Matrix CompletionGuarantees so far assume observations are sampled uniformlyWhat can we say about the case when observations are all 1’s(“positives”)?Typically, 99% entries are missing (“unlabeled”)Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

PU Inductive Matrix CompletionInductive Matrix Completion:minX :kX k tX2(aTi X bj Tij )(i,j) ΩCommonly used PU strategy: Biased Matrix CompletionXX22min α(aT(aTi X bj Tij ) (1 α)i X bj 0)X :kX k t(i,j) Ω(i,j)6 ΩTypically, α 1 α (α 0.9).V. Sindhwani, S. S. Bucak, J. Hu, A. Mojsilovic. One-class matrix completion with low-density factorizations. ICDM, pp.1055-1060. 2010.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

PU Inductive Matrix CompletionInductive Matrix Completion:minX :kX k tX2(aTi X bj Tij )(i,j) ΩCommonly used PU strategy: Biased Matrix CompletionXX22min α(aT(aTi X bj Tij ) (1 α)i X bj 0)X :kX k t(i,j) Ω(i,j)6 ΩTypically, α 1 α (α 0.9).We can show guarantees for the biased formulationV. Sindhwani, S. S. Bucak, J. Hu, A. Mojsilovic. One-class matrix completion with low-density factorizations. ICDM, pp.1055-1060. 2010.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

PU Learning: Random Noise ModelCan be formulated as learning with “class-conditional” noiseN. Natarajan, I. S. Dhillon, P. Ravikumar, and A.Tewari. Learning with Noisy Labels. In Advances in Neural InformationProcessing Systems, pp. 1196-1204. 2013.Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

PU Inductive Matrix CompletionA deterministic PU learning model(1Tij 0Inderjit S. Dhillon Dept of Computer Science UT Austinif Mij 0.5,if Mij 0.5Low-Rank Bilinear Prediction

PU Inductive Matrix CompletionA deterministic PU learning modelP(T̃ij 0 Tij 1) ρ and P(T̃ij 0 Tij 0) 1.We are given only T̃ but not T or MGoal: Recover T given T̃ (recovering M is not possible!)Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 1: Biased Inductive Matrix CompletionXb minX :kX k tαX2(aTi X bj 1) (1 α)X2(aTi X bj 0)(i,j)6 Ω(i,j) ΩRationale: (a) Fix α (1 ρ)/2 and define Tbij I (AXb B T )ij 0.5(b) The above problem is equivalent to:XXb min α ((AXB T )ij , T̃ij )X :kX k ti,jwhere α (x, T̃ij ) αT̃ij (x 1)2 (1 α)(1 T̃ij )x 2(c) Minimizing α loss is equivalent to minimizing the true error, i.e.1 X1 b α ((AXB T )ij , T̃ij ) C1kT T k2F C2mnmnijInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Algorithm 1: Biased Inductive Matrix CompletionTheorem (Error Bound for Biased IMC)Assume ground-truth X satisfies kX k t (where M AXB T ). DefineTbij I (AXb B T )ij 0.5 , A maxi kai k and B maxi kbi k. If α 1 ρ2 ,then with probability at least 1 δ, pη log(2/δ) η tAB log 2d12b kT T kF On2n(1 ρ)(1 ρ)n3/2where η 4(1 2ρ).C-J. Hsieh, N. Natarajan, and I. S. Dhillon. PU Learning for Matrix Completion. In Proceedings of The 32nd InternationalConference on Machine Learning, pp. 2445-2453 (2015).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Experimental ResultsInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Prediction: Image Tag RecommendationNUS-Wide Image Dataset161,780 training images107,879 test images1,134 features1,000 tagsInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Prediction: Image Tag RecommendationH. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale Multi-label Learning with Missing Labels. In Proceedings of The 31stInternational Conference on Machine Learning, pp. 593-601 (2014).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Prediction: Image Tag RecommendationLow-rank Model with k [email protected] Model with k [email protected] F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale Multi-label Learning with Missing Labels. In Proceedings of The 31stInternational Conference on Machine Learning, pp. 593-601 (2014).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Prediction: Wikipedia Tag RecommendationWikipedia Dataset881,805 training wiki pages10,000 test wiki pages366,932 features213,707 tagsInderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

Multi-target Prediction: Wikipedia Tag RecommendationLow-rank Model with k [email protected] Model with k 500:time(s)LEML(ALS) 6AUC0.93740.9058H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale Multi-label Learning with Missing Labels. In Proceedings of The 31stInternational Conference on Machine Learning, pp. 593-601 (2014).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

PU Inductive Matrix Completion: Gene-Disease PredictionN. Natarajan, and I. S. Dhillon. Inductive matrix completion for predicting gene disease associations. Bioinformatics, 30(12),i60-i68 (2014).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

PU Inductive Matrix Completion: Gene-Disease Prediction2.5CatapultKatzIMC2LEMLMatrix CompletionPrecision(%)P(hidden gene among genes looked at)Matrix Completion on C0.20.11.510.500204060Number of genes looked at801000051015Recall(%)2025Predicting gene-disease associations in the OMIM data set(www.omim.org).N. Natarajan, and I. S. Dhillon. Inductive matrix completion for predicting gene disease associations. Bioinformatics, 30(12),i60-i68 (2014).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction30

PU Inductive Matrix Completion: Gene-Disease PredictionCatapultMatrix Completion on CP(hidden gene among genes looked at)P(hidden gene among genes looked at)KatzIMC0.100LEML204060Number of genes looked at801000.20.100204060Number of genes looked at80Predicting genes for diseases with no training associations.N. Natarajan, and I. S. Dhillon. Inductive matrix completion for predicting gene disease associations. Bioinformatics, 30(12),i60-i68 (2014).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction100

Conclusions and Future WorkInductive Matrix Completion:Scales to millions of targetsCaptures correlations among targetsOvercomes missing valuesExtension to PU learningMuch work to do:Other structures: low-rank sparse, low-rank column-sparse (outliers)?Different loss functions?Handling “time” as one of the dimensions — incorporating smoothnessthrough graph regularization?Incorporating non-linearities?Efficient (parallel) implementations?Improved recovery guarantees?Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

References[1] P. Jain, and I. S. Dhillon. Provable inductive matrix completion. arXiv preprintarXiv:1306.0626 (2013).[2] K. Zhong, P. Jain, I. S. Dhillon. Efficient Matrix Sensing Using Rank-1 GaussianMeasurements. In Proceedings of The 26th Conference on Algorithmic Learning Theory(2015).[3] N. Natarajan, and I. S. Dhillon. Inductive matrix completion for predicting genedisease associations. Bioinformatics, 30(12), i60-i68 (2014).[4] H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale Multi-label Learning withMissing Labels. In Proceedings of The 31st International Conference on MachineLearning, pp. 593-601 (2014).[5] C-J. Hsieh, N. Natarajan, and I. S. Dhillon. PU Learning for Matrix Completion. InProceedings of The 32nd International Conference on Machine Learning, pp. 2445-2453(2015).Inderjit S. Dhillon Dept of Computer Science UT AustinLow-Rank Bilinear Prediction

car insurance geico insurance need cheap auto insurance geico com car insurance coupon code Inderjit S. Dhillon Dept of Computer Science UT Austin Low-Rank Bilinear Prediction. Modern Prediction Problems in Machine Learning Wikipedia Tag Recommendation Learning in computer vision

Related Documents: