Linear Algebra and Optimization for Machine Learning: A Textbook

Linear Algebra and Optimization for Machine Learning: A Textbook
Charu C. Aggarwal

A frequent challenge faced by beginners in machine learning is the extensive background required in linear algebra and optimization, which makes the learning curve very steep. This book therefore reverses the focus: it teaches linear algebra and optimization as the primary topics of interest, and presents solutions to machine learning problems as applications of these methods. In doing so, the book also provides significant exposure to machine learning. Its chapters fall into two categories:

• Linear algebra and its applications: These chapters cover the basics of linear algebra together with their common applications to singular value decomposition, similarity matrices (kernel methods), and graph analysis. Numerous machine learning applications are used as examples, such as spectral clustering, kernel-based classification, and outlier detection.

• Optimization and its applications: Basic methods in optimization, such as gradient descent, Newton's method, and coordinate descent, are discussed, and constrained optimization methods are introduced as well. Machine learning applications such as linear regression, SVMs, logistic regression, matrix factorization, recommender systems, and K-means clustering are discussed in detail. A general view of optimization in computational graphs is presented together with its applications to backpropagation in neural networks.

Exercises are included both within the text of the chapters and at the end of the chapters. The book is written for a diverse audience, including graduate students, researchers, and practitioners.

About the Author
Charu C. Aggarwal is a Distinguished Research Staff Member (DRSM) at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He completed his undergraduate degree in Computer Science from the Indian Institute of Technology at Kanpur in 1993 and his Ph.D. in Operations Research from the Massachusetts Institute of Technology in 1996. He has published more than 400 papers in refereed conferences and journals, and has applied for or been granted more than 80 patents. He is author or editor of 19 books, including textbooks on data mining, neural networks, machine learning (for text), recommender systems, and outlier analysis. Because of the commercial value of his patents, he has thrice been designated a Master Inventor at IBM. He has received several internal and external awards, including the EDBT Test-of-Time Award (2014), the IEEE ICDM Research Contributions Award (2015), and the ACM SIGKDD Innovation Award (2019). He has served as editor-in-chief of ACM SIGKDD Explorations and is currently serving as an editor-in-chief of the ACM Transactions on Knowledge Discovery from Data. He is also an editor-in-chief of ACM Books. He is a fellow of the SIAM, ACM, and IEEE, for "contributions to knowledge discovery and data mining algorithms."

ISBN 978-3-030-40343-0

Linear Algebra and Optimization for Machine Learning

Charu C. Aggarwal
Linear Algebra and Optimization for Machine Learning
A Textbook

Charu C. Aggarwal
Distinguished Research Staff Member
IBM T. J. Watson Research Center
Yorktown Heights, NY, USA

ISBN 978-3-030-40343-0    ISBN 978-3-030-40344-7 (eBook)
https://doi.org/10.1007/978-3-030-40344-7

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

To my wife Lata, my daughter Sayani, and all my mathematics teachers

Contents

1 Linear Algebra and Optimization: An Introduction
  1.1 Introduction
  1.2 Scalars, Vectors, and Matrices
    1.2.1 Basic Operations with Scalars and Vectors
    1.2.2 Basic Operations with Vectors and Matrices
    1.2.3 Special Classes of Matrices
    1.2.4 Matrix Powers, Polynomials, and the Inverse
    1.2.5 The Matrix Inversion Lemma: Inverting the Sum of Matrices
    1.2.6 Frobenius Norm, Trace, and Energy
  1.3 Matrix Multiplication as a Decomposable Operator
    1.3.1 Matrix Multiplication as Decomposable Row and Column Operators
    1.3.2 Matrix Multiplication as Decomposable Geometric Operators
  1.4 Basic Problems in Machine Learning
    1.4.1 Matrix Factorization
    1.4.2 Clustering
    1.4.3 Classification and Regression Modeling
    1.4.4 Outlier Detection
  1.5 Optimization for Machine Learning
    1.5.1 The Taylor Expansion for Function Simplification
    1.5.2 Example of Optimization in Machine Learning
    1.5.3 Optimization in Computational Graphs
  1.6 Summary
  1.7 Further Reading
  1.8 Exercises

2 Linear Transformations and Linear Systems
  2.1 Introduction
    2.1.1 What Is a Linear Transform?
  2.2 The Geometry of Matrix Multiplication
  2.3 Vector Spaces and Their Geometry
    2.3.1 Coordinates in a Basis System
    2.3.2 Coordinate Transformations Between Basis Sets
    2.3.3 Span of a Set of Vectors
    2.3.4 Machine Learning Example: Discrete Wavelet Transform
    2.3.5 Relationships Among Subspaces of a Vector Space
  2.4 The Linear Algebra of Matrix Rows and Columns
  2.5 The Row Echelon Form of a Matrix
    2.5.1 LU Decomposition
    2.5.2 Application: Finding a Basis Set
    2.5.3 Application: Matrix Inversion
    2.5.4 Application: Solving a System of Linear Equations
  2.6 The Notion of Matrix Rank
    2.6.1 Effect of Matrix Operations on Rank
  2.7 Generating Orthogonal Basis Sets
    2.7.1 Gram-Schmidt Orthogonalization and QR Decomposition
    2.7.2 QR Decomposition
    2.7.3 The Discrete Cosine Transform
  2.8 An Optimization-Centric View of Linear Systems
    2.8.1 Moore-Penrose Pseudoinverse
    2.8.2 The Projection Matrix
  2.9 Ill-Conditioned Matrices and Systems
  2.10 Inner Products: A Geometric View
  2.11 Complex Vector Spaces
    2.11.1 The Discrete Fourier Transform
  2.12 Summary
  2.13 Further Reading
  2.14 Exercises

3 Eigenvectors and Diagonalizable Matrices
  3.1 Introduction
  3.2 Determinants
  3.3 Diagonalizable Transformations and Eigenvectors
    3.3.1 Complex Eigenvalues
    3.3.2 Left Eigenvectors and Right Eigenvectors
    3.3.3 Existence and Uniqueness of Diagonalization
    3.3.4 Existence and Uniqueness of Triangulization
    3.3.5 Similar Matrix Families Sharing Eigenvalues
    3.3.6 Diagonalizable Matrix Families Sharing Eigenvectors
    3.3.7 Symmetric Matrices
    3.3.8 Positive Semidefinite Matrices
    3.3.9 Cholesky Factorization: Symmetric LU Decomposition
  3.4 Machine Learning and Optimization Applications
    3.4.1 Fast Matrix Operations in Machine Learning
    3.4.2 Examples of Diagonalizable Matrices in Machine Learning
    3.4.3 Symmetric Matrices in Quadratic Optimization
    3.4.4 Diagonalization Application: Variable Separation for Optimization
    3.4.5 Eigenvectors in Norm-Constrained Quadratic …
  3.5 Numerical Algorithms for Finding Eigenvectors
    3.5.1 The QR Method via Schur Decomposition
    3.5.2 The Power Method for Finding Dominant Eigenvectors
  3.6 Summary
  3.7 Further Reading
  3.8 Exercises

4 Optimization Basics: A Machine Learning View
  4.1 Introduction
  4.2 The Basics of Optimization
    4.2.1 Univariate Optimization
      4.2.1.1 Why We Need Gradient Descent
      4.2.1.2 Convergence of Gradient Descent
      4.2.1.3 The Divergence Problem
    4.2.2 Bivariate Optimization
    4.2.3 Multivariate Optimization
  4.3 Convex Objective Functions
  4.4 The Minutiae of Gradient Descent
    4.4.1 Checking Gradient Correctness with Finite Differences
    4.4.2 Learning Rate Decay and Bold Driver
    4.4.3 Line Search
      4.4.3.1 Binary Search
      4.4.3.2 Golden-Section Search
      4.4.3.3 Armijo Rule
    4.4.4 Initialization
  4.5 Properties of Optimization in Machine Learning
    4.5.1 Typical Objective Functions and Additive Separability
    4.5.2 Stochastic Gradient Descent
    4.5.3 How Optimization in Machine Learning Is Different
    4.5.4 Tuning Hyperparameters
    4.5.5 The Importance of Feature Preprocessing
  4.6 Computing Derivatives with Respect to Vectors
    4.6.1 Matrix Calculus Notation
    4.6.2 Useful Matrix Calculus Identities
      4.6.2.1 Application: Unconstrained Quadratic Programming
      4.6.2.2 Application: Derivative of Squared Norm
    4.6.3 The Chain Rule of Calculus for Vectored Derivatives
      4.6.3.1 Useful Examples of Vectored Derivatives
  4.7 Linear Regression: Optimization with Numerical Targets
    4.7.1 Tikhonov Regularization
      4.7.1.1 Pseudoinverse and Connections to Regularization
    4.7.2 Stochastic Gradient Descent
    4.7.3 The Use of Bias
      4.7.3.1 Heuristic Initialization
  4.8 Optimization Models for Binary Targets
    4.8.1 Least-Squares Classification: Regression on Binary Targets
      4.8.1.1 Why Least-Squares Classification Loss Needs …
    4.8.2 The Support Vector Machine
      4.8.2.1 Computing Gradients
      4.8.2.2 Stochastic Gradient Descent
    4.8.3 Logistic Regression
      4.8.3.1 Computing Gradients
      4.8.3.2 Stochastic Gradient Descent
    4.8.4 How Linear Regression Is a Parent Problem in Machine Learning
  4.9 Optimization Models for the MultiClass Setting
    4.9.1 Weston-Watkins Support Vector Machine
      4.9.1.1 Computing Gradients
    4.9.2 Multinomial Logistic Regression
      4.9.2.1 Computing Gradients
      4.9.2.2 Stochastic Gradient Descent
  4.10 Coordinate Descent
    4.10.1 Linear Regression with Coordinate Descent
    4.10.2 Block Coordinate Descent
    4.10.3 K-Means as Block Coordinate Descent
  4.11 Summary
  4.12 Further Reading
  4.13 Exercises

5 Advanced Optimization Solutions
  5.1 Introduction
  5.2 Challenges in Gradient-Based Optimization
    5.2.1 Local Optima and Flat Regions
    5.2.2 Differential Curvature
      5.2.2.1 Revisiting Feature Normalization
    5.2.3 Examples of Difficult Topologies: Cliffs and Valleys
  5.3 Adjusting First-Order Derivatives for Descent
    5.3.1 Momentum-Based Learning
    5.3.2 AdaGrad
    5.3.3 RMSProp
    5.3.4 Adam
  5.4 The Newton Method
    5.4.1 The Basic Form of the Newton Method
    5.4.2 Importance of Line Search for Non-quadratic Functions
    5.4.3 Example: Newton Method in the Quadratic Bowl
    5.4.4 Example: Newton Method in a Non-quadratic Function
  5.5 Newton Methods in Machine Learning
    5.5.1 Newton Method for Linear Regression
    5.5.2 Newton Method for Support-Vector Machines
    5.5.3 Newton Method for Logistic Regression
    5.5.4 Connections Among Different Models and Unified Framework
  5.6 Newton Method: Challenges and Solutions
    5.6.1 Singular and Indefinite Hessian
    5.6.2 The Saddle-Point Problem
    5.6.3 Convergence Problems and Solutions with Non-quadratic Functions
      5.6.3.1 Trust Region Method
  5.7 Computationally Efficient Variations of Newton Method
    5.7.1 Conjugate Gradient Method
    5.7.2 Quasi-Newton Methods and BFGS
  5.8 Non-differentiable Optimization Functions
    5.8.1 The Subgradient Method
      5.8.1.1 Application: L1-Regularization
      5.8.1.2 Combining Subgradients with Coordinate Descent
    5.8.2 Proximal Gradient Method
      5.8.2.1 Application: Alternative for L1-Regularized Regression
    5.8.3 Designing Surrogate Loss Functions for Combinatorial Optimization
      5.8.3.1 Application: Ranking Support Vector Machine
    5.8.4 Dynamic Programming for Optimizing Sequential Decisions
      5.8.4.1 Application: Fast Matrix Multiplication
  5.9 Summary
  5.10 Further Reading
  5.11 Exercises

6 Constrained Optimization and Duality
  6.1 Introduction
  6.2 Primal Gradient Descent Methods
    6.2.1 Linear Equality Constraints
      6.2.1.1 Convex Quadratic Program with Equality Constraints
      6.2.1.2 Application: Linear Regression with Equality Constraints
      6.2.1.3 Application: Newton Method with Equality Constraints
    6.2.2 Linear Inequality Constraints
      6.2.2.1 The Special Case of Box Constraints
      6.2.2.2 General Conditions for Projected Gradient Descent to Work
      6.2.2.3 Sequential Linear Programming
    6.2.3 Sequential Quadratic Programming
  6.3 Primal Coordinate Descent
    6.3.1 Coordinate Descent for Convex Optimization Over Convex Set
    6.3.2 Machine Learning Application: Box Regression
  6.4 Lagrangian Relaxation and Duality
    6.4.1 Kuhn-Tucker Optimality Conditions
    6.4.2 General Procedure for Using Duality
      6.4.2.1 Inferring the Optimal Primal Solution from Optimal Dual Solution
    6.4.3 Application: Formulating the SVM Dual
      6.4.3.1 Inferring the Optimal Primal Solution from Optimal Dual Solution
    6.4.4 Optimization Algorithms for the SVM Dual
      6.4.4.1 Gradient Descent
      6.4.4.2 Coordinate Descent
    6.4.5 Getting the Lagrangian Relaxation of Unconstrained Problems
      6.4.5.1 Machine Learning Application: Dual of Linear Regression
  6.5 Penalty-Based and Primal-Dual Methods
    6.5.1 Penalty Method with Single Constraint
    6.5.2 Penalty Method: General Formulation
    6.5.3 Barrier and Interior Point Methods
  6.6 Norm-Constrained Optimization
  6.7 Primal Versus Dual Methods
  6.8 Summary
  6.9 Further Reading
  6.10 Exercises

7 Singular Value Decomposition
  7.1 Introduction
  7.2 SVD: A Linear Algebra Perspective
    7.2.1 Singular Value Decomposition of a Square Matrix
    7.2.2 Square SVD to Rectangular SVD via Padding
    7.2.3 Several Definitions of Rectangular Singular Value Decomposition
    7.2.4 Truncated Singular Value Decomposition
      7.2.4.1 Relating Truncation Loss to Singular Values
      7.2.4.2 Geometry of Rank-k Truncation
      7.2.4.3 Example of Truncated SVD
    7.2.5 Two Interpretations of SVD
    7.2.6 Is Singular Value Decomposition Unique?
    7.2.7 Two-Way Versus Three-Way Decompositions
  7.3 SVD: An Optimization Perspective
    7.3.1 A Maximization Formulation with Basis Orthogonality
    7.3.2 A Minimization Formulation with Residuals
    7.3.3 Generalization to Matrix Factorization Methods
    7.3.4 Principal Component Analysis
  7.4 Applications of Singular Value Decomposition
    7.4.1 Dimensionality Reduction
    7.4.2 Noise Removal
    7.4.3 Finding the Four Fundamental Subspaces in Linear Algebra
    7.4.4 Moore-Penrose Pseudoinverse
      7.4.4.1 Ill-Conditioned Square Matrices
    7.4.5 Solving Linear Equations and Linear Regression
    7.4.6 Feature Preprocessing and Whitening in Machine Learning
    7.4.7 Outlier Detection
    7.4.8 Feature Engineering
  7.5 Numerical Algorithms for SVD
  7.6 Summary
  7.7 Further Reading
  7.8 Exercises

8 Matrix Factorization
  8.1 Introduction
  8.2 Optimization-Based Matrix Factorization
    8.2.1 Example: K-Means as Constrained Matrix Factorization
  8.3 Unconstrained Matrix Factorization
    8.3.1 Gradient Descent with Fully Specified Matrices
    8.3.2 Application to Recommender Systems
      8.3.2.1 Stochastic Gradient Descent
      8.3.2.2 Coordinate Descent
      8.3.2.3 Block Coordinate Descent: Alternating Least Squares
  8.4 Nonnegative Matrix Factorization
    8.4.1 Optimization Problem with Frobenius Norm
      8.4.1.1 Projected Gradient Descent with Box Constraints
    8.4.2 Solution Using Duality
    8.4.3 Interpretability of Nonnegative Matrix Factorization
    8.4.4 Example of Nonnegative Matrix Factorization
    8.4.5 The I-Divergence Objective Function
  8.5 Weighted Matrix Factorization
    8.5.1 Practical Use Cases of Nonnegative and Sparse Matrices
    8.5.2 Stochastic Gradient Descent
      8.5.2.1 Why Negative Sampling Is Important
    8.5.3 Application: Recommendations with Implicit Feedback Data
    8.5.4 Application: Link Prediction in Adjacency Matrices
    8.5.5 Application: Word-Word Context Embedding with GloVe
  8.6 Nonlinear Matrix Factorizations
    8.6.1 Logistic Matrix Factorization
      8.6.1.1 Gradient Descent Steps for Logistic Matrix Factorization
    8.6.2 Maximum Margin Matrix Factorization
  8.7 Generalized Low-Rank Models
    8.7.1 Handling Categorical Entries
    8.7.2 Handling Ordinal Entries
  8.8 Shared Matrix Factorization
    8.8.1 Gradient Descent Steps for Shared Factorization
    8.8.2 How to Set Up Shared Models in Arbitrary Scenarios
  8.9 Factorization Machines
  8.10 Summary
  8.11 Further Reading
  8.12 Exercises

9 The Linear Algebra of Similarity
  9.1 Introduction
  9.2 Equivalence of Data and Similarity Matrices
    9.2.1 From Data Matrix to Similarity Matrix and Back
    9.2.2 When Is Data Recovery from a Similarity Matrix Useful?
    9.2.3 What Types of Similarity Matrices Are "Valid"?
    9.2.4 Symmetric Matrix Factorization as an Optimization Model
    9.2.5 Kernel Methods: The Machine Learning Terminology
  9.3 Efficient Data Recovery from Similarity Matrices
    9.3.1 Nyström Sampling
    9.3.2 Matrix Factorization with Stochastic Gradient Descent
    9.3.3 Asymmetric Similarity Decompositions
  9.4 Linear Algebra Operations on Similarity Matrices
    9.4.1 Energy of Similarity Matrix and Unit Ball Normalization
    9.4.2 Norm of the Mean and Variance
    9.4.3 Centering a Similarity Matrix
      9.4.3.1 Application: Kernel PCA
    9.4.4 From Similarity Matrix to Distance Matrix and Back
      9.4.4.1 Application: ISOMAP
  9.5 Machine Learning with Similarity Matrices
    9.5.1 Feature Engineering from Similarity Matrix
      9.5.1.1 Kernel Clustering
      9.5.1.2 Kernel Outlier Detection
      9.5.1.3 Kernel Classification
    9.5.2 Direct Use of Similarity Matrix
      9.5.2.1 Kernel K-Means
      9.5.2.2 Kernel SVM
  9.6 The Linear Algebra of the Representer Theorem
  9.7 Similarity Matrices and Linear Separability
    9.7.1 Transformations That Preserve Positive Semi-definiteness
  9.8 Summary
  9.9 Further Reading
  9.10 Exercises

10 The Linear Algebra of Graphs
  10.1 Introduction
  10.2 Graph Basics and Adjacency Matrices
  10.3 Powers of Adjacency Matrices
  10.4 The Perron-Frobenius Theorem
  10.5 The Right Eigenvectors of Graph Matrices
    10.5.1 The Kernel View of Spectral Clustering
      10.5.1.1 Relating Shi-Malik and Ng-Jordan-Weiss Embeddings
    10.5.2 The Laplacian View of Spectral Clustering
      10.5.2.1 Graph Laplaci…
