
Foundations of Machine Learning

Adaptive Computation and Machine Learning
Thomas Dietterich, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors

A complete list of books published in the Adaptive Computation and Machine Learning series appears at the back of this book.

Foundations of Machine Learning

Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar

The MIT Press
Cambridge, Massachusetts
London, England

© 2012 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please email special_sales@mitpress.mit.edu or write to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142.

This book was set in LaTeX by the authors. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Mohri, Mehryar.
Foundations of machine learning / Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar.
p. cm. - (Adaptive computation and machine learning series)
Includes bibliographical references and index.
ISBN 978-0-262-01825-8 (hardcover : alk. paper)
1. Machine learning. 2. Computer algorithms. I. Rostamizadeh, Afshin. II. Talwalkar, Ameet. III. Title.
Q325.5.M64 2012
006.3'1-dc23
2012007249

10 9 8 7 6 5 4 3 2 1

Contents

Preface

1 Introduction
  1.1 Applications and problems
  1.2 Definitions and terminology
  1.3 Cross-validation
  1.4 Learning scenarios
  1.5 Outline

2 The PAC Learning Framework
  2.1 The PAC learning model
  2.2 Guarantees for finite hypothesis sets — consistent case
  2.3 Guarantees for finite hypothesis sets — inconsistent case
  2.4 Generalities
    2.4.1 Deterministic versus stochastic scenarios
    2.4.2 Bayes error and noise
    2.4.3 Estimation and approximation errors
    2.4.4 Model selection
  2.5 Chapter notes
  2.6 Exercises

3 Rademacher Complexity and VC-Dimension
  3.1 Rademacher complexity
  3.2 Growth function
  3.3 VC-dimension
  3.4 Lower bounds
  3.5 Chapter notes
  3.6 Exercises

4 Support Vector Machines
  4.1 Linear classification
  4.2 SVMs — separable case
    4.2.1 Primal optimization problem
    4.2.2 Support vectors
    4.2.3 Dual optimization problem
    4.2.4 Leave-one-out analysis
  4.3 SVMs — non-separable case
    4.3.1 Primal optimization problem
    4.3.2 Support vectors
    4.3.3 Dual optimization problem
  4.4 Margin theory
  4.5 Chapter notes
  4.6 Exercises

5 Kernel Methods
  5.1 Introduction
  5.2 Positive definite symmetric kernels
    5.2.1 Definitions
    5.2.2 Reproducing kernel Hilbert space
    5.2.3 Properties
  5.3 Kernel-based algorithms
    5.3.1 SVMs with PDS kernels
    5.3.2 Representer theorem
    5.3.3 Learning guarantees
  5.4 Negative definite symmetric kernels
  5.5 Sequence kernels
    5.5.1 Weighted transducers
    5.5.2 Rational kernels
  5.6 Chapter notes
  5.7 Exercises

6 Boosting
  6.1 Introduction
  6.2 AdaBoost
    6.2.1 Bound on the empirical error
    6.2.2 Relationship with coordinate descent
    6.2.3 Relationship with logistic regression
    6.2.4 Standard use in practice
  6.3 Theoretical results
    6.3.1 VC-dimension-based analysis
    6.3.2 Margin-based analysis
    6.3.3 Margin maximization
    6.3.4 Game-theoretic interpretation
  6.4 Discussion
  6.5 Chapter notes
  6.6 Exercises

7 On-Line Learning
  7.1 Introduction
  7.2 Prediction with expert advice
    7.2.1 Mistake bounds and Halving algorithm
    7.2.2 Weighted majority algorithm
    7.2.3 Randomized weighted majority algorithm
    7.2.4 Exponential weighted average algorithm
  7.3 Linear classification
    7.3.1 Perceptron algorithm
    7.3.2 Winnow algorithm
  7.4 On-line to batch conversion
  7.5 Game-theoretic connection
  7.6 Chapter notes
  7.7 Exercises

8 Multi-Class Classification
  8.1 Multi-class classification problem
  8.2 Generalization bounds
  8.3 Uncombined multi-class algorithms
    8.3.1 Multi-class SVMs
    8.3.2 Multi-class boosting algorithms
    8.3.3 Decision trees
  8.4 Aggregated multi-class algorithms
    8.4.1 One-versus-all
    8.4.2 One-versus-one
    8.4.3 Error-correction codes
  8.5 Structured prediction algorithms
  8.6 Chapter notes
  8.7 Exercises

9 Ranking
  9.1 The problem of ranking
  9.2 Generalization bound
  9.3 Ranking with SVMs
  9.4 RankBoost
    9.4.1 Bound on the empirical error
    9.4.2 Relationship with coordinate descent
    9.4.3 Margin bound for ensemble methods in ranking
  9.5 Bipartite ranking
    9.5.1 Boosting in bipartite ranking
    9.5.2 Area under the ROC curve
  9.6 Preference-based setting
    9.6.1 Second-stage ranking problem
    9.6.2 Deterministic algorithm
    9.6.3 Randomized algorithm
    9.6.4 Extension to other loss functions
  9.7 Discussion
  9.8 Chapter notes
  9.9 Exercises

10 Regression
  10.1 The problem of regression
  10.2 Generalization bounds
    10.2.1 Finite hypothesis sets
    10.2.2 Rademacher complexity bounds
    10.2.3 Pseudo-dimension bounds
  10.3 Regression algorithms
    10.3.1 Linear regression
    10.3.2 Kernel ridge regression
    10.3.3 Support vector regression
    10.3.4 Lasso
    10.3.5 Group norm regression algorithms
    10.3.6 On-line regression algorithms
  10.4 Chapter notes
  10.5 Exercises

11 Algorithmic Stability
  11.1 Definitions
  11.2 Stability-based generalization guarantee
  11.3 Stability of kernel-based regularization algorithms
    11.3.1 Application to regression algorithms: SVR and KRR
    11.3.2 Application to classification algorithms: SVMs
    11.3.3 Discussion
  11.4 Chapter notes
  11.5 Exercises

12 Dimensionality Reduction
  12.1 Principal Component Analysis
  12.2 Kernel Principal Component Analysis (KPCA)
  12.3 KPCA and manifold learning
    12.3.1 Isomap
    12.3.2 Laplacian eigenmaps
    12.3.3 Locally linear embedding (LLE)
  12.4 Johnson-Lindenstrauss lemma
  12.5 Chapter notes
  12.6 Exercises

13 Learning Automata and Languages
  13.1 Introduction
  13.2 Finite automata
  13.3 Efficient exact learning
    13.3.1 Passive learning
    13.3.2 Learning with queries
    13.3.3 Learning automata with queries
  13.4 Identification in the limit
    13.4.1 Learning reversible automata
  13.5 Chapter notes
  13.6 Exercises

14 Reinforcement Learning
  14.1 Learning scenario
  14.2 Markov decision process model
  14.3 Policy
    14.3.1 Definition
    14.3.2 Policy value
    14.3.3 Policy evaluation
    14.3.4 Optimal policy
  14.4 Planning algorithms
    14.4.1 Value iteration
    14.4.2 Policy iteration
    14.4.3 Linear programming
  14.5 Learning algorithms
    14.5.1 Stochastic approximation
    14.5.2 TD(0) algorithm
    14.5.3 Q-learning algorithm
    14.5.4 SARSA
    14.5.5 TD(λ) algorithm
    14.5.6 Large state space
  14.6 Chapter notes

Conclusion

A Linear Algebra Review
  A.1 Vectors and norms
    A.1.1 Norms
    A.1.2 Dual norms
  A.2 Matrices
    A.2.1 Matrix norms
    A.2.2 Singular value decomposition
    A.2.3 Symmetric positive semidefinite (SPSD) matrices

B Convex Optimization
  B.1 Differentiation and unconstrained optimization
  B.2 Convexity
  B.3 Constrained optimization
  B.4 Chapter notes

C Probability Review
  C.1 Probability
  C.2 Random variables
  C.3 Conditional probability and independence
  C.4 Expectation, Markov's inequality, and moment-generating function
  C.5 Variance and Chebyshev's inequality

D Concentration Inequalities
  D.1 Hoeffding's inequality
  D.2 McDiarmid's inequality
  D.3 Other inequalities
    D.3.1 Binomial distribution: Slud's inequality
    D.3.2 Normal distribution: tail bound
    D.3.3 Khintchine-Kahane inequality
  D.4 Chapter notes
  D.5 Exercises

E Notation

References

Index

Preface

This book is a general introduction to machine learning that can serve as a textbook for students and researchers in the field. It covers fundamental modern topics in machine learning while providing the theoretical basis and conceptual tools needed for the discussion and justification of algorithms. It also describes several key aspects of the application of these algorithms.

We have aimed to present the most novel theoretical tools and concepts while giving concise proofs, even for relatively advanced results. In general, whenever possible, we have chosen to favor succinctness. Nevertheless, we discuss some crucial complex topics arising in machine learning and highlight several open research questions. Certain topics often merged with others or treated with insufficient attention are discussed separately here and with more emphasis: for example, a different chapter is reserved for multi-class classification, ranking, and regression. Although we cover a very wide variety of important topics in machine learning, we have chosen to omit a few important ones, including graphical models and neural networks, both for the sake of brevity and because of the current lack of solid theoretical guarantees for some methods.

The book is intended for students and researchers in machine learning, statistics and other related areas. It can be used as a textbook for both graduate and advanced undergraduate classes in machine learning or as a reference text for a research seminar. The first three chapters of the book lay the theoretical foundation for the subsequent material. Other chapters are mostly self-contained, with the exception of chapter 5, which introduces some concepts that are extensively used in later ones. Each chapter concludes with a series of exercises, with full solutions presented separately.

The reader is assumed to be familiar with basic concepts in linear algebra, probability, and analysis of algorithms. However, to further help him, we present in the appendix a concise linear algebra and a probability review, and a short introduction to convex optimization. We have also collected in the appendix a number of useful tools for concentration bounds used in this book.

To our knowledge, there is no single textbook covering all of the material presented here. The need for a unified presentation has been pointed out to us every year by our machine learning students. There are several good books for various specialized areas, but these books do not include a discussion of other fundamental topics in a general manner. For example, books about kernel methods do not include a discussion of other fundamental topics such as boosting, ranking, reinforcement learning, learning automata or online learning. There also exist more general machine learning books, but the theoretical foundation of our book and our emphasis on proofs make our presentation quite distinct.

Most of the material presented here takes its origins in a machine learning graduate course (Foundations of Machine Learning) taught by the first author at the Courant Institute of Mathematical Sciences in New York University over the last seven years. This book has considerably benefited from the comments and suggestions from students in these classes, along with those of many friends, colleagues and researchers to whom we are deeply indebted.

We are particularly grateful to Corinna Cortes and Yishay Mansour, who have both made a number of key suggestions for the design and organization of the material presented, with detailed comments that we have fully taken into account and that have greatly improved the presentation. We are also grateful to Yishay Mansour for using a preliminary version of the book for teaching and for reporting his feedback to us.

We also thank for discussions, suggested improvement, and contributions of many kinds the following colleagues and friends from academic and corporate research laboratories: Cyril Allauzen, Stephen Boyd, Spencer Greenberg, Lisa Hellerstein, Sanjiv Kumar, Ryan McDonald, Andres Muñoz Medina, Tyler Neylon, Peter Norvig, Fernando Pereira, Maria Pershina, Ashish Rastogi, Michael Riley, Umar Syed, Csaba Szepesvári, Eugene Weinstein, and Jason Weston.

Finally, we thank the MIT Press publication team for their help and support in the development of this text.

1 Introduction

Machine learning can be broadly defined as computational methods using experience to improve performance or to make accurate predictions. Here, experience refers to the past information available to the learner, which typically takes the form of electronic data collected and made available for analysis. This data could be in the form of digitized human-labeled training sets, or other types of information obtained via interaction with the environment. In all cases, its quality and size are crucial to the success of the predictions made by the learner.

Machine learning consists of designing efficient and accurate prediction algorithms. As in other areas of computer science, some critical measures of the quality of these algorithms are their time and space complexity. But, in machine learning, we will additionally need a notion of sample complexity to evaluate the sample size required for the algorithm to learn a family of concepts. More generally, theoretical learning guarantees for an algorithm depend on the complexity of the concept classes considered and the size of the training sample.

Since the success of a learning algorithm depends on the data used, machine learning is inherently related to data analysis and statistics. More generally, learning techniques are data-driven methods combining fundamental concepts in computer science with ideas from statistics, probability and optimization.

1.1 Applications and problems

Learning algorithms have been successfully deployed in a variety of applications, including:

- Text or document classification, e.g., spam detection;
- Natural language processing, e.g., morphological analysis, part-of-speech tagging, statistical parsing, named-entity recognition;
- Speech recognition, speech synthesis, speaker verification;
- Optical character recognition (OCR);
- Computational biology applications, e.g., protein function or structured prediction;
- Computer vision tasks, e.g., image recognition, face detection;
- Fraud detection (credit card, telephone) and network intrusion;
- Games, e.g., chess, backgammon;
- Unassisted vehicle control (robots, navigation);
- Medical diagnosis;
- Recommendation systems, search engines, information extraction systems.

This list is by no means comprehensive, and learning algorithms are applied to new applications every day. Moreover, such applications correspond to a wide variety of learning problems. Some major classes of learning problems are listed below (a short schematic sketch follows the list):

Classification: Assign a category to each item. For example, document classification may assign items with categories such as politics, business, sports, or weather, while image classification may assign items with categories such as landscape, portrait, or animal. The number of categories in such tasks is often relatively small, but can be large in some difficult tasks and even unbounded as in OCR, text classification, or speech recognition.

Regression: Predict a real value for each item. Examples of regression include prediction of stock values or variations of economic variables. In this problem, the penalty for an incorrect prediction depends on the magnitude of the difference between the true and predicted values, in contrast with the classification problem, where there is typically no notion of closeness between various categories.

Ranking: Order items according to some criterion. Web search, e.g., returning web pages relevant to a search query, is the canonical ranking example. Many other similar ranking problems arise in the context of the design of information extraction or natural language processing systems.

Clustering: Partition items into homogeneous regions. Clustering is often performed to analyze very large data sets. For example, in the context of social network analysis, clustering algorithms attempt to identify "communities" within large groups of people.

Dimensionality reduction or manifold learning: Transform an initial representation of items into a lower-dimensional representation of these items while preserving some properties of the initial representation. A common example involves preprocessing digital images in computer vision tasks.
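To make the distinction between these problem classes concrete, the following minimal Python sketch (ours, not the book's) writes each one as a function signature; the point is simply that the classes differ chiefly in their output spaces. The type names are our own illustrative choices.

```python
# Schematic signatures for the problem classes listed above (illustrative only).
from typing import Callable, List, Sequence

Features = Sequence[float]   # an item represented as a feature vector

# Classification: map each item to a category from a (usually small) finite set.
Classifier = Callable[[Features], str]             # e.g., "politics", "sports", "weather"

# Regression: map each item to a real value; errors are penalized by their magnitude.
Regressor = Callable[[Features], float]            # e.g., a predicted stock value

# Ranking: map a set of items to an ordering of those items.
Ranker = Callable[[List[Features]], List[int]]     # indices of the items in ranked order

# Clustering: map a set of items to a partition into homogeneous groups.
Clusterer = Callable[[List[Features]], List[int]]  # a group index for each item

# Dimensionality reduction: map items to a lower-dimensional representation.
Reducer = Callable[[Features], Sequence[float]]    # fewer coordinates, key properties preserved
```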

The main practical objectives of machine learning consist of generating accurate predictions for unseen items and of designing efficient and robust algorithms to produce these predictions, even for large-scale problems. To do so, a number of algorithmic and theoretical questions arise. Some fundamental questions include: Which concept families can actually be learned, and under what conditions? How well can these concepts be learned computationally?

Figure 1.1: The zig-zag line on the left panel is consistent over the blue and red training sample, but it is a complex separation surface that is not likely to generalize well to unseen data. In contrast, the decision surface on the right panel is simpler and might generalize better in spite of its misclassification of a few points of the training sample.

1.2 Definitions and terminology

We will use the canonical problem of spam detection as a running example to illustrate some basic definitions and to describe the use and evaluation of machine learning algorithms in practice. Spam detection is the problem of learning to automatically classify email messages as either spam or non-spam.

Examples: Items or instances of data used for learning or evaluation. In our spam problem, these examples correspond to the collection of email messages we will use for learning and testing.

Features: The set of attributes, often represented as a vector, associated to an example. In the case of email messages, some relevant features may include the length of the message, the name of the sender, various characteristics of the header, the presence of certain keywords in the body of the message, and so on.

Labels: Values or categories assigned to examples. In classification problems, examples are assigned specific categories, for instance, the spam and non-spam categories in our binary classification problem. In regression, items are assigned real-valued labels.

Training sample: Examples used to train a learning algorithm. In our spam problem, the training sample consists of a set of email examples along with their associated labels. The training sample varies for different learning scenarios, as described in section 1.4.

Validation sample: Examples used to tune the parameters of a learning algorithm when working with labeled data. Learning algorithms typically have one or more free parameters, and the validation sample is used to select appropriate values for these model parameters.

Test sample: Examples used to evaluate the performance of a learning algorithm. The test sample is separate from the training and validation data and is not made available in the learning stage. In the spam problem, the test sample consists of a collection of email examples for which the learning algorithm must predict labels based on features. These predictions are then compared with the labels of the test sample to measure the performance of the algorithm.

Loss function: A function that measures the difference, or loss, between a predicted label and a true label. Denoting the set of all labels as $\mathcal{Y}$ and the set of possible predictions as $\mathcal{Y}'$, a loss function $L$ is a mapping $L\colon \mathcal{Y} \times \mathcal{Y}' \to \mathbb{R}_+$. In most cases, $\mathcal{Y}' = \mathcal{Y}$ and the loss function is bounded, but these conditions do not always hold. Common examples of loss functions include the zero-one (or misclassification) loss defined over $\{-1, +1\} \times \{-1, +1\}$ by $L(y, y') = 1_{y' \neq y}$ and the squared loss defined over $I \times I$ by $L(y, y') = (y' - y)^2$, where $I \subseteq \mathbb{R}$ is typically a bounded interval (both losses are illustrated in the short sketch following these definitions).

Hypothesis set: A set of functions mapping features (feature vectors) to the set of labels $\mathcal{Y}$. In our example, these may be a set of functions mapping email features to $\mathcal{Y} = \{\text{spam}, \text{non-spam}\}$. More generally, hypotheses may be functions mapping features to a different set $\mathcal{Y}'$. They could be linear functions mapping email feature vectors to real numbers interpreted as scores ($\mathcal{Y}' = \mathbb{R}$), with higher score values more indicative of spam than lower ones.
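To make these definitions concrete, here is a minimal Python sketch, not taken from the book: one email is encoded as a small feature vector, assigned a label, classified by a naive hypothesis, and scored with the zero-one and squared losses defined above. The particular features, keyword list, and hypothesis are assumptions made only for this illustration.

```python
# Illustrative sketch of the definitions above; the features, keywords, and
# hypothesis below are assumptions for the example, not part of the book.

def email_features(message: str, sender: str) -> list:
    """Encode one email example as a small numeric feature vector."""
    keywords = ["free", "winner", "credit"]  # assumed spam-indicative keywords
    return [
        float(len(message)),                                     # length of the message
        float(sum(message.lower().count(k) for k in keywords)),  # keyword occurrences in the body
        1.0 if sender.endswith(".edu") else 0.0,                 # a crude sender-based feature
    ]

def zero_one_loss(y: int, y_pred: int) -> float:
    """Zero-one (misclassification) loss over {-1, +1} x {-1, +1}."""
    return float(y_pred != y)

def squared_loss(y: float, y_pred: float) -> float:
    """Squared loss over I x I."""
    return (y_pred - y) ** 2

def h(features: list) -> int:
    """A deliberately naive hypothesis: predict spam (+1) if any keyword occurs."""
    return +1 if features[1] > 0 else -1

# One labeled example (x, y), with y = +1 for spam and y = -1 for non-spam.
x = email_features("You are a winner! Claim your free prize.", "promo@example.com")
y = +1

print(x)                           # the feature vector of this email
print(zero_one_loss(y, h(x)))      # 0.0: the hypothesis classifies this example correctly
print(squared_loss(0.3, 0.5))      # ~0.04: the penalty grows with the magnitude of the error
```

In the terms above, h is one element of a hypothesis set mapping feature vectors to $\mathcal{Y} = \{\text{spam}, \text{non-spam}\}$, here encoded as $\{+1, -1\}$.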

We now define the learning stages of our spam problem. We start with a given collection of labeled examples. We first randomly partition the data into a training sample, a validation sample, and a test sample. The size of each of these samples depends on a number of different considerations. For example, the amount of data reserved for validation depends on the number of free parameters of the algorithm. Also, when the labeled sample is relatively small, the amount of training data is often chosen to be larger than that of test data, since the learning performance directly depends on the training sample.

Next, we associate relevant features to the examples. This is a critical step in the design of machine learning solutions. Useful features can effectively guide the learning algorithm, while poor or uninformative ones can be misleading. Although it is critical, to a large extent, the choice of the features is left to the user. This choice reflects the user's prior knowledge about the learning task, which in practice can have a dramatic effect on the performance results.

Now, we use the features selected to train our learning algorithm by fixing different values of its free parameters. For each value of these parameters, the algorithm selects a different hypothesis out of the hypothesis set. We choose among them the hypothesis resulting in the best performance on the validation sample. Finally, using that hypothesis, we predict the labels of the examples in the test sample. The performance of the algorithm is evaluated by using the loss function associated to the task, e.g., the zero-one loss in our spam detection task, to compare the predicted and true labels.

Thus, the performance of an algorithm is of course evaluated based on its test error and not its error on the training sample. A learning algorithm may be consistent, that is, it may commit no error on the examples of the training data, and yet have a poor performance on the test data. This occurs for consistent learners defined by very complex decision surfaces, as illustrated in figure 1.1, which tend to memorize a relatively small training sample instead of seeking to generalize well. This highlights the key distinction between memorization and generalization, which is the fundamental property sought for an accurate learning algorithm. Theoretical guarantees for consistent learners will be discussed in great detail in chapter 2.

1.3 Cross-validation

In practice, the amount of labeled data available is often too small to set aside a validation sample since that would leave an insufficient amount of training data. Instead, a widely adopted method known as $n$-fold cross-validation is used to exploit the labeled data both for model selection (selection of the free parameters of the algorithm) and for training.

Let $\theta$ denote the vector of free parameters of the algorithm. For a fixed value of $\theta$, the method consists of first randomly partitioning a given sample $S$ of $m$ labeled examples into $n$ subsamples, or folds. The $i$th fold is thus a labeled sample $((x_{i1}, y_{i1}), \ldots, (x_{im_i}, y_{im_i}))$ of size $m_i$. Then, for any $i \in [1, n]$, the learning algorithm is trained on all but the $i$th fold to generate a hypothesis $h_i$, and the performance of $h_i$ is tested on the $i$th fold, as illustrated in figure 1.2a.
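As a complement to this description, the following is a minimal, self-contained Python sketch of $n$-fold cross-validation used for model selection: for each candidate value of the free parameter, the sample is partitioned into folds, a hypothesis is trained on all but one fold and evaluated on the held-out fold, and the parameter value with the lowest average fold error is retained. This is our own illustration rather than the book's code; the threshold learner and synthetic data at the end are placeholder assumptions.

```python
# Sketch of n-fold cross-validation for selecting a free parameter theta (illustrative only).
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})

def n_fold_split(sample: List[Example], n: int, seed: int = 0) -> List[List[Example]]:
    """Randomly partition the labeled sample into n folds of (nearly) equal size."""
    shuffled = sample[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n] for i in range(n)]

def cross_validation_error(
    sample: List[Example],
    train: Callable[[List[Example], float], Callable[[Sequence[float]], int]],
    loss: Callable[[int, int], float],
    theta: float,
    n: int = 5,
) -> float:
    """Average held-out error of the hypotheses h_i over the n folds, for a fixed theta."""
    folds = n_fold_split(sample, n)
    fold_errors = []
    for i in range(n):
        held_out = folds[i]
        train_data = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        h_i = train(train_data, theta)  # hypothesis trained on all but the ith fold
        fold_errors.append(
            sum(loss(y, h_i(x)) for x, y in held_out) / max(len(held_out), 1)
        )
    return sum(fold_errors) / n

def train_threshold(data: List[Example], theta: float) -> Callable[[Sequence[float]], int]:
    """A placeholder 'learner': predict spam (+1) when the first feature exceeds theta."""
    return lambda x: +1 if x[0] > theta else -1

def zero_one_loss(y: int, y_pred: int) -> float:
    return float(y_pred != y)

if __name__ == "__main__":
    random.seed(1)
    # Placeholder synthetic data: one feature per example, labeled +1 iff the feature exceeds 0.6.
    data = []
    for _ in range(200):
        x0 = random.random()
        data.append(([x0], +1 if x0 > 0.6 else -1))
    # Model selection: keep the parameter value with the smallest cross-validation error.
    best_theta = min(
        [0.3, 0.5, 0.6, 0.8],
        key=lambda t: cross_validation_error(data, train_threshold, zero_one_loss, t),
    )
    print("selected theta:", best_theta)
```

One design choice worth noting: the fold partition is held fixed (via the seed) across candidate values of $\theta$, so that the comparison between parameter values is made on identical splits.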
