Predictive Learning From Data


Predictive Learning from Data
LECTURE SET 2: Problem Setting, Basic Learning Problems, and Inductive Principles
Electrical and Computer Engineering

OUTLINE
2.0 Objectives and Background
- formalization of inductive learning
- classical statistics vs. the predictive approach
2.1 Terminology and Learning Problems
2.2 Basic Learning Methods and Complexity Control
2.3 Inductive Principles
2.4 Alternative Learning Formulations
2.5 Summary

2.0 Objectives
- Quantify the notions of explanation, prediction, and model
- Introduce terminology
- Describe common learning problems

- Past observations → data points
- Explanation (model) → function
- Learning → function estimation (from data)
- Prediction → using the model to predict outputs for new inputs

Example: Classification Problem
Training samples → model.
- Goal 1: explain the training data → minimize the training error
- Goal 2: generalization (for future data)
→ Learning (model estimation) is ill-posed.

Mathematical Formalization
Learning machine = predictive system.
[Diagram: a Generator produces samples x; the System responds with outputs y; the Learning Machine observes (x, y) and returns an estimate ŷ]
- Unknown joint distribution P(x, y)
- Set of functions (possible models) f(x, ω)
- Pre-specified loss function L(y, f(x, ω)) (by convention, a non-negative loss)

Inductive Learning Setting
The learning machine observes samples (x, y) and returns an estimated response ŷ = f(x, w).
Two types of inference: identification vs. imitation.
Risk: R(w) = ∫ L(y, f(x, w)) dP(x, y) → min
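The risk functional above cannot be computed directly because P(x, y) is unknown; in practice one works with its empirical average over observed samples. A minimal sketch (not from the slides; the generating system y = 2x plus noise and both candidate models are invented for illustration):

```python
import random

def risk_empirical(model, samples, loss):
    """Empirical average of the loss over observed (x, y) samples."""
    return sum(loss(y, model(x)) for x, y in samples) / len(samples)

# Hypothetical generator/system: x uniform on [-1, 1], y = 2x + small Gaussian noise
random.seed(0)
samples = [(x, 2 * x + random.gauss(0, 0.1))
           for x in (random.uniform(-1, 1) for _ in range(1000))]

sq_loss = lambda y, y_hat: (y - y_hat) ** 2

r_good = risk_empirical(lambda x: 2 * x, samples, sq_loss)  # model close to the system
r_bad = risk_empirical(lambda x: 0.0, samples, sq_loss)     # constant model
```

With many samples, the empirical average approximates the risk integral, so the model closer to the generating system attains the lower value.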

Two Views of Empirical Inference
Two approaches to empirical (statistical) inference:
[Diagram: EMPIRICAL DATA plus KNOWLEDGE/ASSUMPTIONS feed STATISTICAL INFERENCE, via either PROBABILISTIC MODELING or the RISK-MINIMIZATION APPROACH]
These two approaches are different both technically and conceptually.

Classical Approaches to Inductive Inference
Generic problem: finite data → model.
(1) Classical science: hypothesis testing. The experimental data is generated by a given model (a single function ~ a scientific theory).
(2) Classical statistics: maximum likelihood. The data is generated by a parametric model for a density.
Note: the loss function is the likelihood (not problem-specific) → the same solution approach for all types of problems.
R. Fisher: "uncertain inferences" from finite data; see R. Fisher (1935), The Logic of Inductive Inference, J. Royal Statistical Society.

Discussion
The mathematical formulation is useful for quantifying:
- explanation → fitting error (on the training data)
- generalization → prediction error
Natural assumptions:
- the future is similar to the past: stationary P(x, y)
- a discrepancy measure, or loss function, e.g. mean squared error (MSE)
What if these assumptions do not hold?

OUTLINE
2.0 Objectives
2.1 Terminology and Learning Problems
- supervised / unsupervised
- classification
- regression, etc.
2.2 Basic Learning Methods and Complexity Control
2.3 Inductive Principles
2.4 Alternative Learning Formulations
2.5 Summary

Supervised Learning: Regression
Data in the form (x, y), where
- x is a multivariate input (i.e., a vector)
- y is a univariate output ('response')
Regression: y is real-valued, with squared loss L(y, f(x)) = (y − f(x))²
→ estimation of a real-valued function.

Regression Estimation Problem
Given: training data (x_i, y_i), i = 1, 2, …, n.
Find a function f(x, w) that minimizes the squared error over a large number (N) of future samples:
Σ_{k=1..N} (y_k − f(x_k, w))² → min,  i.e.  ∫ (y − f(x, w))² dP(x, y) → min
BUT the future data is unknown: P(x, y) is unknown.
→ All estimation problems are ill-posed.

Supervised Learning: Classification
Data in the form (x, y), where
- x is a multivariate input (i.e., a vector)
- y is a univariate output ('response')
Classification: y is categorical (a class label), with 0/1 loss
L(y, f(x)) = 0 if y = f(x), 1 if y ≠ f(x)
→ estimation of an indicator function.
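Under the 0/1 loss, the empirical risk is simply the fraction of misclassified training samples. A small sketch (the threshold classifier and the five data points are invented for illustration):

```python
def zero_one_loss(y, y_pred):
    """0/1 loss for classification: 0 if the predicted label is correct, else 1."""
    return 0 if y == y_pred else 1

def empirical_error(model, samples):
    """Fraction of misclassified samples (empirical risk under 0/1 loss)."""
    return sum(zero_one_loss(y, model(x)) for x, y in samples) / len(samples)

# Hypothetical 1-D data; the last point violates the threshold rule ("noisy" label)
samples = [(-2.0, 0), (-1.0, 0), (0.5, 1), (1.5, 1), (-0.5, 1)]
model = lambda x: 1 if x > 0 else 0   # indicator function with threshold at 0

err = empirical_error(model, samples)  # 1 of 5 samples misclassified -> 0.2
```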

Density Estimation
Data in the form (x), where x is a multivariate input (feature vector).
A parametric form of the density is given: f(x, ω).
The loss function is the likelihood or, more commonly, the negative log-likelihood:
L(f(x, ω)) = −ln f(x, ω)
The goal of learning is minimization of
R(ω) = −∫ ln f(x, ω) p(x) dx
from finite training data, yielding f(x, ω_0).

Unsupervised Learning 1
Data in the form (x), where x is a multivariate input (i.e., a feature vector).
Goal: data reduction or clustering.
Clustering = estimation of a mapping X → C, where C = {c_1, c_2, …, c_m}, with loss
L(x, f(x)) = ||x − f(x)||²

Unsupervised Learning 2
Data in the form (x), where x is a multivariate input (i.e., a vector).
Goal: dimensionality reduction.
The mapping f(x) is a projection of the data onto a low-dimensional subspace, minimizing the loss
L(x, f(x)) = ||x − f(x)||²

OUTLINE
2.0 Objectives
2.1 Terminology and Learning Problems
2.2 Basic Learning Methods and Complexity Control
- parametric modeling
- non-parametric modeling
- data reduction
- complexity control
2.3 Inductive Principles
2.4 Alternative Learning Formulations
2.5 Summary

Basic Learning Methods
General idea:
- Specify a wide set of possible models f(x, ω), where ω is an abstract set of 'parameters'.
- Estimate the model parameters by minimizing a given loss function on the training data (→ ERM).
Learning methods differ in:
- the chosen parameterization
- the loss function used
- the optimization method used for parameter estimation

Parametric Modeling (ERM)
Given training data (x_i, y_i), i = 1, 2, …, n:
(1) Specify a parametric model.
(2) Estimate its parameters (via fitting to the data).
Example: linear regression f(x) = (w · x) + b
R_emp(w, b) = (1/n) Σ_{i=1..n} (y_i − (w · x_i) − b)² → min
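For a single input variable, the empirical-risk minimization above has a closed-form solution (the normal equations). A minimal sketch, assuming 1-D inputs; the noise-free check data is invented for illustration:

```python
def fit_linear(xs, ys):
    """Least-squares fit of f(x) = w*x + b, minimizing the empirical squared
    risk via the closed-form normal equations for one input variable."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = my - w * mx
    return w, b

# Noise-free sanity check: data on the line y = 3x + 1 is recovered exactly
xs = [0.0, 1.0, 2.0, 3.0]
ys = [3 * x + 1 for x in xs]
w, b = fit_linear(xs, ys)
```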

Parametric Modeling: Classification
Given training data (x_i, y_i), i = 1, 2, …, n:
(1) Specify a parametric model.
(2) Estimate its parameters (via fitting to the data).
Example: a univariate classification data set with
(a) a linear decision boundary: f(x) = sign(w·x + b)
(b) a third-order polynomial decision boundary: f(x) = sign of a cubic polynomial in x

Parametric Methods in Classical Statistics
Learning = density estimation from i.i.d. data.
Maximum likelihood inductive principle: given n training samples X, find w* maximizing
P(data | model) = P(X | w) = Π_{i=1..n} p(x_i; w)
or, equivalently, minimizing the negative log-likelihood.
See textbook, Section 2.2, for an example: estimating the two parameters of a normal distribution from i.i.d. data samples via maximum likelihood yields the empirical mean and the empirical variance.
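The textbook example can be sketched directly: for i.i.d. Gaussian samples, setting the gradient of the log-likelihood to zero gives the empirical mean and the (biased, 1/n) empirical variance, and no other parameter values achieve a lower negative log-likelihood:

```python
import math

def gaussian_mle(xs):
    """Maximum-likelihood estimates for N(mu, sigma^2) from i.i.d. samples:
    the empirical mean and the (1/n) empirical variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def neg_log_likelihood(xs, mu, var):
    """Negative log-likelihood of i.i.d. Gaussian samples under (mu, var)."""
    n = len(xs)
    return 0.5 * n * math.log(2 * math.pi * var) \
        + sum((x - mu) ** 2 for x in xs) / (2 * var)

xs = [1.0, 2.0, 3.0, 4.0]
mu, var = gaussian_mle(xs)   # empirical mean 2.5, empirical variance 1.25
```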

Maximum Likelihood (cont'd)
- A similar approach works for regression: for a known parametric noise distribution (normal noise), maximizing the likelihood ⇔ minimizing the squared loss.
- A similar approach works for classification: for known class distributions (Gaussian), maximizing the likelihood yields a second-order decision boundary.
General approach (statistical decision theory):
- Start with a parametric form of a distribution.
- Estimate its parameters via maximum likelihood.
- Use the estimated distributions for making decisions (predictions).

Non-Parametric Modeling
Given training data (x_i, y_i), i = 1, 2, …, n, estimate the model (at a given point x_0) as a 'local average' of the data.
Note: we need to define 'local' and 'average'.
Example: k-nearest-neighbors regression
f(x_0) = (1/k) Σ_{j=1..k} y_j, the average response over the k nearest neighbors of x_0
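A minimal sketch of k-nn regression for 1-D inputs, where 'local' means the k training inputs closest to x_0 and 'average' is their mean response (the data points are invented for illustration):

```python
def knn_regress(x0, samples, k):
    """k-nearest-neighbors regression: predict the average response of the
    k training samples whose inputs are closest to x0 (1-D inputs)."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]
pred1 = knn_regress(1.1, samples, k=1)  # nearest neighbor is x=1, so y=1.0
pred2 = knn_regress(1.1, samples, k=2)  # neighbors x=1 and x=2: (1+4)/2 = 2.5
```

Choosing k trades off locality against averaging, which is exactly the complexity-control issue discussed later.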

Data Reduction Approach
Given training data, estimate the model as a 'compact encoding' of the data.
Note: 'compact' ~ the number of bits needed to encode the model, or the number of bits needed to encode the data (MDL).
Example: piecewise linear regression. How many parameters are needed for a two-linear-component model?

Data Reduction Approach (cont'd)
Data reduction approaches are commonly used for unsupervised learning tasks.
Example: clustering. The training data is encoded by 3 points (cluster centers).
Issues:
- How to find the centers?
- How to select the number of clusters?
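One standard way to find the centers is Lloyd's k-means iteration, which alternates between assigning points to the nearest center and moving each center to the mean of its points; selecting the number of clusters remains a separate complexity-control question. A 1-D sketch (the data and starting centers are invented for illustration):

```python
def kmeans_1d(xs, centers, iters=20):
    """Lloyd's algorithm sketch: alternately assign each point to its nearest
    center and move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {i: [] for i in range(len(centers))}
        for x in xs:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        # Keep a center unchanged if no points were assigned to it
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in clusters.items()]
    return centers

# Two well-separated groups, encoded by two cluster centers
xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers = sorted(kmeans_1d(xs, [0.0, 6.0]))  # converges near [1.0, 5.0]
```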

Diverse Terminology (for Learning Methods)
Many methods differ in the parameterization of the admissible models, or approximating functions ŷ = f(x, w):
- neural networks
- decision trees
- signal processing (e.g., wavelets)
They also differ in how the training samples are used:
- batch methods
- on-line (flow-through) methods

Motivation for Complexity Control
The effect of model complexity on generalization.
[Figure: (a) classification; (b) regression]

Complexity Control: Parametric Modeling
Consider regression estimation:
- Ten training samples from y = x² + N(0, σ²), where σ² = 0.25.
- Fitting linear and second-order polynomial models.
[Figure: the two fitted curves]
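This experiment can be reproduced in miniature by fitting polynomials of degree 1 and 2 to noisy samples of y = x² and comparing training errors. A sketch using the normal equations (the sample positions and noise level are chosen for illustration); since the model classes are nested, the quadratic fit always achieves a training MSE at most that of the linear fit, which is why training error alone cannot guide complexity control:

```python
import random

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations, solved with
    naive Gaussian elimination (sketch with partial pivoting only)."""
    m = degree + 1
    # Normal equations A c = b with A[j][k] = sum x^(j+k), b[j] = sum y*x^j
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    for col in range(m):                       # forward elimination
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * p for a, p in zip(A[r], A[col])]
            b[r] -= f * b[col]
    c = [0.0] * m                              # back substitution
    for r in range(m - 1, -1, -1):
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, m))) / A[r][r]
    return c  # coefficients: c[0] + c[1]*x + c[2]*x^2 + ...

def mse(xs, ys, c):
    """Training mean squared error of the fitted polynomial."""
    preds = [sum(ck * x ** k for k, ck in enumerate(c)) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

random.seed(1)
xs = [i / 9 for i in range(10)]                       # ten training inputs
ys = [x ** 2 + random.gauss(0, 0.5) for x in xs]      # y = x^2 + noise
mse1 = mse(xs, ys, polyfit(xs, ys, 1))                # linear fit
mse2 = mse(xs, ys, polyfit(xs, ys, 2))                # quadratic fit
```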

Complexity Control: Local Estimation
Consider regression estimation:
- Ten training samples from y = x² + N(0, σ²), where σ² = 0.25.
- Using k-nn regression with k = 1 and k = 4.
[Figure: the two fitted curves]

Complexity Control (Summary)
- The complexity (of the admissible models) affects generalization (for future data).
- Specific complexity indices:
  - parametric models: the number of parameters
  - local modeling: the size of the local region
  - data reduction: the number of clusters
- Complexity control: choosing the optimal complexity (→ good generalization) for a given (training) data set. This is not well understood in classical statistics.

OUTLINE
2.0 Objectives
2.1 Terminology and Learning Problems
2.2 Basic Learning Methods and Complexity Control
2.3 Inductive Principles
- motivation
- inductive principles: penalization, SRM, Bayesian inference, MDL
2.4 Alternative Learning Formulations
2.5 Summary

Conceptual Motivation
Generalization from finite data requires:
- a priori knowledge: any information outside the training data
- an inductive principle: a general strategy for combining a priori knowledge and data
- a learning method: a constructive implementation of an inductive principle
Example: Empirical Risk Minimization → the parametric modeling approach.
Question: what are the possible limitations of ERM?

Motivation (cont'd)
- The need for flexible (adaptive) methods:
  - a wide (flexible) parameterization → ill-posed estimation problems
  - need provisions for complexity control
- Inductive principles originate in statistics, applied math, information theory, and learning theory, and they adopt distinctly different terminology and concepts.

Inductive Principles
Inductive principles differ in terms of:
- the representation of a priori knowledge
- the mechanism for combining a priori knowledge with the training data
- applicability when the true model does not belong to the set of admissible models
- the availability of constructive procedures (learning methods/algorithms)
Note: usually, the prior knowledge is about the parameterization.

Penalization
Overcomes the limitations of ERM.
Penalized empirical risk functional:
R_pen(ω) = R_emp(ω) + λ φ[f(x, ω)]
- φ[f(x, ω)] is a non-negative penalty functional specified a priori (independent of the data); its larger values penalize complex functions.
- λ is the regularization parameter (a non-negative number) tuned to the training data.
Example: ridge regression.
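For ridge regression with a single (centered) input variable, the penalized risk has a closed-form minimizer, which makes the shrinkage effect of λ visible. A minimal sketch (the data is invented for illustration):

```python
def ridge_fit(xs, ys, lam):
    """Ridge regression for one centered input: minimizes
    sum (y - w*x)^2 + lam * w^2, which gives w = Sxy / (Sxx + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-4.0, -2.0, 2.0, 4.0]        # exactly y = 2x
w0 = ridge_fit(xs, ys, lam=0.0)    # lam = 0: plain least squares, w = 2
w1 = ridge_fit(xs, ys, lam=10.0)   # larger lam shrinks the slope toward 0
```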

Structural Risk Minimization
Overcomes the limitations of ERM.
A complexity ordering on the set of admissible models, as a nested structure:
S_0 ⊂ S_1 ⊂ S_2 ⊂ …
Examples: a set of polynomial models (of increasing degree), a Fourier expansion, etc.
Goal of learning: minimization of the empirical risk for an optimally selected element S_k.

Bayesian Inference
A probabilistic approach to inference. It explicitly defines a priori knowledge as a prior probability (distribution) over a set of model parameters.
Bayes formula for updating the prior probability using the evidence given by the training data:
P(model | data) = P(data | model) P(model) / P(data)
- P(model | data): the posterior probability
- P(data | model): the likelihood (the probability that the data is generated by a model)

Bayesian Density Estimation
Consider parametric density estimation, where the prior probability distribution is P(model) = p(w).
Given training data X, the posterior probability distribution is updated:
p(w | X) = P(X | w) p(w) / P(X)
[Figure: the prior p(w) and the sharper posterior p(w | X) plotted over w]

Implementation of Bayesian Inference
- Maximum likelihood: choose w* maximizing
  P(data | model) = P(X | w) = Π_{i=1..n} p(x_i; w)
  (equivalent to ERM)
- True Bayesian inference (averaging):
  p̂(x | X) = ∫ p(x; w) p(w | X) dw
  where p(x; w) is a set of admissible densities and p(w | X) = P(X | w) p(w) / P(X).
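A case where both implementations have closed forms (an added example, not from the slides): a Bernoulli 'coin' model with a Beta prior. Maximum likelihood returns the empirical frequency, while true Bayesian averaging over the Beta posterior yields a smoothed predictive probability:

```python
def bernoulli_ml(flips):
    """Maximum-likelihood estimate of P(heads): the empirical frequency."""
    return sum(flips) / len(flips)

def bernoulli_bayes_predictive(flips, a=1.0, b=1.0):
    """True Bayesian inference for a Bernoulli model with a Beta(a, b) prior.
    The posterior is Beta(a + heads, b + tails); averaging the parameter over
    the posterior gives the predictive probability (a + heads) / (a + b + n)."""
    heads = sum(flips)
    n = len(flips)
    return (a + heads) / (a + b + n)

flips = [1, 1, 1]                            # three heads in a row
p_ml = bernoulli_ml(flips)                   # 1.0: ML can be overconfident
p_bayes = bernoulli_bayes_predictive(flips)  # (1+3)/(2+3) = 0.8
```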

Minimum Description Length (MDL)
An information-theoretic approach:
- any training data set can be optimally encoded
- code length → generalization capability
Related to the data reduction approach introduced (informally) earlier.
Two possible implementations:
- lossy encoding
- lossless encoding of the data (as in MDL)

Binary Classification under MDL
Consider a training data set {x_k, y_k}, k = 1, 2, …, n, where y ∈ {0, 1}.
Given the data objects X = {x_1, …, x_n}, the outputs form a binary string y_1, …, y_n. Is this string random?
If there is a dependency, then the output string can be encoded by a shorter code consisting of:
- the model, having code length L(model)
- the error term, having code length L(data | model)
The total length of such a code for the string y is
b = L(model) + L(data | model)
and the compression coefficient is K = b / n.
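A rough numerical sketch of the compression coefficient, using one simple two-part coding scheme (an assumed scheme, not from the slides) in which the error term records how many labels the model gets wrong and which ones:

```python
from math import comb, log2

def code_length_bits(n, errors, model_bits):
    """Two-part code length for a binary output string of length n:
    model_bits for the model, plus log2(n+1) bits for the error count and
    log2(C(n, errors)) bits to identify which labels are misclassified."""
    err_bits = log2(n + 1) + log2(comb(n, errors))
    return model_bits + err_bits

def compression_coefficient(n, errors, model_bits):
    """K = b / n; K well below 1 suggests a real dependency, K near or
    above 1 suggests the output string looks random."""
    return code_length_bits(n, errors, model_bits) / n

K_good = compression_coefficient(n=100, errors=2, model_bits=20)   # strong model
K_poor = compression_coefficient(n=100, errors=45, model_bits=20)  # near-random
```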

Comparison of Inductive Principles
- Representation of a priori knowledge/complexity: penalty term, structure, prior distribution, codebook.
- Formal procedure for complexity control: penalized risk, optimal element of a structure, posterior distribution.
- Constructive implementation of complexity control: resampling, analytic bounds, marginalization, minimum code length.
See Table 2.1 in [Cherkassky & Mulier, 2007].

OUTLINE
2.0 Objectives
2.1 Terminology and Learning Problems
2.2 Basic Learning Methods and Complexity Control
2.3 Inductive Principles
2.4 Alternative Learning Formulations
- motivation
- examples of non-standard formulations
- formalization of the application domain
2.5 Summary

Motivation
Estimation of a predictive model:
- Step 1: problem specification/formalization
- Step 2: model estimation, learning, inference
The standard inductive formulation:
- is usually assumed in all ML algorithms
- may well not be the best formalization for a given application problem

Standard Supervised Learning
[Diagram: a Generator produces samples x; the System responds with y; the Learning Machine outputs f(x, w), incurring loss L(f(x, w), y)]
- Available (training) data format: (x, y)
- Test samples (x-values) are unknown
- Stationary distribution, i.i.d. samples
- A single model needs to be estimated
- Specific loss functions are adopted for common tasks (classification, regression, etc.)

Non-standard Learning Settings
- Available data format: the x-values of the test samples are known during training → transduction, semi-supervised learning.
- A different (non-standard) loss function: see the later example, 'learning the sign of a function'.
- Univariate output (→ a single model): multiple outputs may need to be estimated from the available data.

Transduction
Predicting function values at given points:
Given a labeled training set and the x-values of the test data, estimate (predict) the y-values for the given test inputs.
[Diagram: a priori knowledge and the training data lead, via transduction, directly to the predicted outputs]

Learning the Sign of a Function
Given training data (x_i, y_i), i = 1, 2, …, n, with y-values in a bounded range, y ∈ [−2, 2].
Estimate a function f(x) predicting the sign of y, with loss function
L(y, f(x)) = −y f(x), where f(x) ∈ {−1, +1}
- If the prediction is wrong → a real-valued loss |y|
- If the prediction is correct → a real-valued gain |y|
This is neither standard regression nor classification.
Practical application: frequent trading.
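A minimal sketch of this loss (the trading-style data is invented for illustration): the total loss is negative exactly when the correctly predicted signs capture more absolute y-mass than the wrong ones lose:

```python
def sign_loss(y, f_x):
    """L(y, f(x)) = -y * f(x) with f(x) in {-1, +1}: a wrong sign costs |y|,
    a correct sign earns |y| (a negative loss, i.e. a gain)."""
    return -y * f_x

# Hypothetical trading reading: y = realized price move, f(x) = long/short call
trades = [(0.5, 1),    # correct sign: gain 0.5
          (-1.2, -1),  # correct sign: gain 1.2
          (2.0, -1)]   # wrong sign: loss 2.0
total = sum(sign_loss(y, f) for y, f in trades)  # -0.5 - 1.2 + 2.0 = 0.3
```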

Multiple Model Estimation
Training data in the form (x, y), where
- x is a multivariate input
- y is a univariate real-valued output ('response')
Similar to standard regression, but subsets of the data may be described by different models.

Formalization of Application Problems
- The problem specification step cannot be formalized.
- But several guidelines can be helpful during the formalization process.
Mapping process: application requirements → learning formulation.
Specific components of this mapping process are shown next.

[Diagram: APPLICATION NEEDS (input, output, and other variables; training/test data; loss function; admissible models) are mapped to a FORMAL PROBLEM STATEMENT, which connects to LEARNING THEORY]

Summary
- Standard inductive learning = function estimation.
- Goal of learning (empirical inference): to act/perform well, not system identification.
- Important concepts:
  - training data, test data
  - loss function, prediction error (prediction risk)
  - basic learning problems
- Complexity control.
- Inductive principles: which one is the 'best'?

Summary (cont'd)
- Assumptions for inductive learning.
- Non-standard learning formulations.
- Aside: predictive modeling of physical systems vs. social systems. Note: the main assumption (stationarity) does not hold in social systems (business data, financial data, etc.).
- For discussion: think of an example application that requires a non-standard learning formulation.
  Note: (a) do not use examples similar to the ones presented in my lectures and/or the textbook; (b) you can email your example to the instructor (maximum half a page).

