Introduction To Statistical Learning


Introduction to Statistical Learning
Bin Li
IIT Lecture Series

What is statistical learning?

- Statistical learning is the science of learning from data using statistical methods.
  - Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.
  - Predict whether a patient, hospitalized due to a heart attack, will have a second attack, based on the patient's demographics, diet and clinical measurements.
  - Identify the risk factors for prostate cancer.
  - Given a collection of text documents, organize them according to their content similarities.
- Statistical learning plays a key role in data mining, artificial intelligence and machine learning.
- We can divide statistical learning problems into supervised and unsupervised settings.
  - Supervised learning: both the predictors, the Xi's, and the response, Yi, are observed (e.g. regression/classification).
  - Unsupervised learning: only the Xi's are observed (e.g. clustering/market basket analysis).

Handwritten Digit Recognition

- Data come from handwritten ZIP codes on envelopes from U.S. postal mail.
- Each image is a segment from a five-digit ZIP code, isolating a single digit.
- The images are 16 x 16 eight-bit grayscale maps, with each pixel ranging in intensity from 0 to 255.
- Images are normalized to have approximately the same size and orientation.
- Task: predict, from the 16 x 16 matrix of pixel intensities, the identity of each image (0, 1, ..., 9).
- Results:
  - Single-layer neural network: 80.0%
  - Two-layer network: 87%
  - Constrained neural network: 98.4%
  - Tangent distance with 1-NN: 98.9%
  - Support vector machine: 99.2%

Handwritten Digit Recognition (cont.)

[Figure 11.9 from EOSL 2009: examples of training cases from ZIP code data; each image is a 16 x 16 8-bit grayscale representation.]

A Recent Project with Dr. Chakraborty

[Figure: spectral curves plotted against wavelength (nm), over the range 500-2500 nm.]

Statistical Science, 2001, Vol. 16, No. 3, 199-231
Statistical Modeling: The Two Cultures
Leo Breiman

Abstract. There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

1. INTRODUCTION

Statistics starts with data. Think of the data as being generated by a black box in which a vector of input variables x (independent variables) go in one side, and on the other side the response variables y come out. Inside the black box, nature functions to associate the predictor variables with the response variables. The values of the parameters are estimated from the data and the model is then used for information and/or prediction. Thus the black box is filled in like this:

    y <-- [linear regression / logistic regression / Cox model] <-- x

Model validation: yes-no using goodness-of-fit tests.

Data and the Black Box

Inside the black box, nature functions to associate the predictor variables with the response variables, so the picture is like this:

    y <-- nature <-- x

There are two goals in analyzing the data:

- Prediction: to be able to predict what the responses are going to be to future input variables.
- Information: to extract some information about how nature is associating the response variables to the input variables.

There are two different approaches toward these goals.

The Data Modeling Culture

- Start by assuming a stochastic data model for the inside of the black box:

    y <-- [linear regression / logistic regression / Cox model] <-- x

- Estimate the parameters from the data; use the fitted model to do prediction and to do inference.
- Model validation: yes-no using goodness-of-fit tests and residual examination.
- Estimated culture population: 98% of all statisticians.

The Algorithmic Modeling Culture

- The analysis in this culture considers the inside of the box complex and unknown. The approach is to find a function f(x), an algorithm that operates on x to predict the responses y:

    y <-- [decision trees / neural nets] <-- x

- Approximate the black box by some complicated function; estimate the function from the data by some algorithm. Both prediction and information are based on the fitted functions.
- Model validation: measured by predictive accuracy.
- Estimated culture population: 2% of statisticians, many in other fields.

Ozone Project

- Predictors: daily and hourly readings of over 450 meteorological variables for a period of seven years.
- Response: hourly values of ozone concentration in the Basin.
- Objective: predict ozone concentration 12 hours in advance.
- Training set: the first five years of data. Test set: the last two years of data.
- Model: multiple linear regressions (including quadratic terms and interactions) with variable selection.
- Results: a failure. The false alarm rate of the final predictor was too high.
- Q: What are the possible reasons that made MLR unsuccessful in the Ozone project?

Chlorine Project

- Predictors: mass spectrum, with molecular weight ranging from 30 to over 10,000.
- Response: contains chlorine or not.
- Training set: 25,000 compounds with known chemical structure and mass spectra. Test set: 5,000 known compounds.
- Models: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and decision trees.
- Results: LDA and QDA were difficult to adapt to the variable dimensionality. A decision tree with 1,500 yes-no questions succeeded, with 95% prediction accuracy.
- Q: What are the possible reasons that made the tree successful in the Chlorine project?

Perceptions on Statistical Analysis

- Focus on finding a good solution; that's what consultants get paid for.
- Live with the data before you plunge into modeling.
- Search for a model that gives a good solution, either algorithmic or a data model.
- Predictive accuracy on test sets is the criterion for how good the model is.
- Computers are an indispensable partner. Programming is a necessary skill for statisticians.

What was research in the university like?

- A friend of Leo Breiman, a prominent statistician from the Berkeley Statistics Department, visited him in Los Angeles in the late 1970s. After Breiman described the decision tree method, his friend's first question was, "What's the model for the data?"
- In the Annals of Statistics and JASA, almost every article contains a statement of the form: "Assume that the data are generated by the following model ..."
- Data modeling is treated as the template for statistical analysis.
- The conclusions are about the model's mechanism, not nature's mechanism.
- If the model is a poor emulation of nature, the conclusions may be wrong.

A Study for Gender Discrimination

A study was done several decades ago by a well-known member of a university statistics department to assess whether there was gender discrimination in the salaries of the faculty.

All personnel files were examined and a database was set up, with salary as the response variable and 25 other variables which characterized academic performance, such as papers published, quality of journals published in, teaching record, evaluations, etc. Gender appears as a binary predictor variable.

A linear regression was carried out on the data, and the gender coefficient was significant at the 5% level. This was taken as strong evidence of sex discrimination.

A Study for Gender Discrimination (cont.)

- Can the data gathered answer the question posed?
- Is inference justified when your sample is the entire population?
- Should a data model be used?
- The deficiencies in the analysis occurred because the focus was on the model and not on the problem.

Problems in Current Data Modeling

- The linear regression model led to many erroneous conclusions that appeared in journal articles, waving the 5% significance level without knowing whether the model fit the data.
- The author set up a simulated regression problem in seven dimensions with a controlled amount of nonlinearity. Standard tests of goodness-of-fit (i.e. the lack-of-fit test) did not reject linearity until the nonlinearity was extreme.
- An acceptable residual plot does not imply that the model is a good fit to the data.
- Published applications to data often show little care in checking model fit ... The question of how well the model fits the data is of secondary importance compared to the construction of an ingenious stochastic model.

Limitations of Data Modeling

- Data modeling enforces the form of the model.
- Relatively low prediction accuracy on data generated from complex systems.
- Old saying: "If all a man has is a hammer, then every problem looks like a nail."
- Approaching problems by looking for a data model imposes an a priori straitjacket that restricts the ability of statisticians to deal with a wide range of statistical problems.
- Takeaway message: to solve a wider range of data problems, we need a larger set of tools!

Estimating unknown function f

- Suppose we observe Yi and Xi = (Xi1, Xi2, ..., Xip) for i = 1, ..., n.
- We believe that there is a relationship between Y and at least one of the X's, so we model the relationship as

    Yi = f(Xi) + εi,  with E{εi} = 0,

  where f is an unknown function and ε is a random error.

[Figure from ISLR 2013: Income versus Years of Education, raw data (left) and with a fitted curve (right).]

Income vs. education and seniority

[Figure from ISLR 2013: 3-D surface of Income as a function of Years of Education and Seniority.]

Estimating unknown function f (cont.)

The accuracy of estimating f depends on

- the size of the variation of the εi's (figure panels with SD = 1.06 vs. SD = 0.10);
- the complexity of the fitted function f̂ (smoothing spans 1/4 vs. 2/3).

[Figure: scatterplots of Y against X with smoothing fits of span 1/4 and span 2/3, at noise SD 1.06 and 0.10.]

Why do we estimate f?

- Two main reasons: prediction and inference.
  - Prediction: make accurate predictions of Y based on a new value of X.
  - Inference: Which particular predictors actually affect the response? Is the relationship positive or negative? Is the relationship a simple linear one, or is it more complicated?
- Two examples:
  - Interested in predicting how much money an individual will donate, based on observations from 90,000 people on which we have recorded over 400 different characteristics. For a given individual, should I send out a mailing?
  - Wish to predict median house price based on 14 variables. Understand which factors have the biggest effect on the response and how big the effect is. For example, how much impact does a river view have on the house value?

How Do We Estimate f?

- Use the training data {(X1, Y1), (X2, Y2), ..., (Xn, Yn)} and a statistical method to estimate f.
- Two groups of statistical learning methods:
  - Parametric methods:
    - Make some assumption about the functional form of f (e.g. MLR).
    - Pros: estimating f reduces to estimating a set of parameters (a relatively easy task). Easy to interpret the model.
    - Cons: the form of the model is too rigid. Low prediction accuracy when f is complicated.
  - Non-parametric methods:
    - Do not make an explicit assumption about the functional form of f (e.g. neural network, tree).
    - Pros: can accurately fit a wider range of possible shapes of f.
    - Cons: a large number of observations is required to obtain an accurate estimate of f.
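The parametric vs. non-parametric contrast can be sketched in a few lines of R. The toy data, sample sizes and loess settings below are invented for illustration (they are not from the slides): a straight-line fit (parametric) and a loess smoother (non-parametric) are trained on the same nonlinear data and compared on held-out observations.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + 0.1 * x^2 + rnorm(n, sd = 0.5)   # a nonlinear "truth" plus noise
train <- data.frame(x = x[1:100],  y = y[1:100])
test  <- data.frame(x = x[101:200], y = y[101:200])

# Parametric: assume f is linear, so only two parameters are estimated.
fit.lm <- lm(y ~ x, data = train)
# Non-parametric: loess makes no global assumption about the form of f.
fit.lo <- loess(y ~ x, data = train,
                control = loess.control(surface = "direct"))

mse <- function(fit) mean((test$y - predict(fit, newdata = test))^2)
c(linear = mse(fit.lm), loess = mse(fit.lo))
```

Because the underlying f here is far from linear, the rigid linear model carries large bias; with 100 training points the flexible smoother pays only a small variance price, matching the pros and cons listed above.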

A linear regression estimate

[Figure from ISLR 2013: a linear regression plane for Income against Years of Education and Seniority.]

Even if the standard deviation is low, we will still get a bad answer if we use the wrong model.

A thin-plate spline estimate

[Figure from ISLR 2013: a thin-plate spline surface for Income against Years of Education and Seniority.]

Non-linear regression methods are more flexible and can potentially provide more accurate estimates.

A poor estimate

[Figure from ISLR 2013: an overly wiggly surface for Income against Years of Education and Seniority.]

Non-linear regression methods can also be too flexible and produce poor estimates of f.

Trade-off between model flexibility and interpretability

[Figure from ISLR 2013: interpretability (vertical axis, low to high) against flexibility (horizontal axis, low to high). Subset Selection and the Lasso are highly interpretable but inflexible; Least Squares, Generalized Additive Models and Trees sit in the middle; Bagging, Boosting and Support Vector Machines are flexible but hard to interpret.]

Training vs. test error: Example 1

[Figure from ISLR 2013] Left: linear regression fit (orange) and two smoothing spline fits (blue and green). Right: training MSE (grey), test MSE (red), and the minimum possible test MSE (dashed), against flexibility.

Example 2 (f is close to linear)

[Figure from ISLR 2013] Left: linear regression fit (orange) and two smoothing spline fits (blue and green). Right: training MSE (grey), test MSE (red), and the minimum possible test MSE (dashed), against flexibility.

Example 3 (f is far from linear)

[Figure from ISLR 2013] Left: linear regression fit (orange) and two smoothing spline fits (blue and green). Right: training MSE (grey), test MSE (red), and the minimum possible test MSE (dashed), against flexibility.

Bias variance tradeoff

- Two competing forces govern the choice of learning method: bias and variance.
- Bias refers to the error introduced by modeling a real-life problem (which is usually extremely complicated) by a much simpler problem.
  - For example, linear regression assumes that there is a linear relationship between Y and X, which is unlikely in real life.
  - In general, the more flexible/complex a method is, the less bias it will have.
- Variance refers to how much your estimate of f would change if you had a different training data set.
  - In general, the more flexible a method is, the more variance it has.
- It can be shown that the expected MSE for a new Y at x_new is:

    E[MSE(x_new)] = Irreducible Error + Bias^2 + Variance
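The decomposition can be checked numerically. The following sketch is a toy setup of my own (not from the slides): at a fixed test point x.new, a deliberately rigid estimator (the training mean of Y, which ignores x) is refit over many training sets, and the simulated expected MSE is compared with irreducible error + bias^2 + variance.

```r
set.seed(1)
sigma <- 0.5                 # SD of the irreducible error
f <- function(x) x^2         # the assumed true regression function (made up)
x.new <- 0.9
reps <- 5000; n <- 30
fhat <- y.new <- numeric(reps)
for (r in 1:reps) {
  x <- runif(n)
  y <- f(x) + rnorm(n, sd = sigma)
  fhat[r]  <- mean(y)                         # rigid estimate of f(x.new)
  y.new[r] <- f(x.new) + rnorm(1, sd = sigma) # a fresh response at x.new
}
mse   <- mean((y.new - fhat)^2)   # simulated E[MSE(x.new)]
bias2 <- (mean(fhat) - f(x.new))^2
vari  <- var(fhat)
c(MSE = mse, decomposition = sigma^2 + bias2 + vari)  # agree up to MC error
```

The rigid estimator has a large bias term but a small variance term; a more flexible estimator would trade the one for the other, which is exactly the tension described above.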

The Bias-Variance decomposition

- To minimize the expected loss, there is a tradeoff between the bias and the variance of a learning algorithm.
- Flexible models (e.g., many parameters or low regularization) have low bias and high variance.
- Rigid models (e.g., few parameters or large regularization) have high bias and low variance.

[Figure: bias-variance tradeoff in splines; a model with 24 Gaussian basis functions fit by regularized least squares to 100 datasets of N = 25 points each, at varying regularization (ln λ = 2.6, -0.31, -2.4).]

Bias, variance and MSE curves in examples 1-3

[Figure from ISLR 2013] Squared bias (blue), variance (orange) and test MSE (red) for examples 1-3. The vertical dotted line marks the flexibility level with the minimum test MSE.

The classification setting

- For a classification problem we can use the error rate, i.e.

    Error rate = (1/n) Σ_{i=1}^{n} I(yi ≠ ŷi)

- The error rate represents the misclassification rate.
- The Bayes error rate is the lowest possible error rate that could be achieved if we somehow knew exactly what the "true" probability distribution of the data looked like.
- By the Bayes rule:

    f̂(x) = arg max_k Pr(Y = k | X = x).

- The decision boundary between class k and class l is determined by the equation:

    Pr(Y = k | X = x) = Pr(Y = l | X = x).

- In real-life problems the Bayes error rate can't be calculated exactly.
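For a concrete case where the Bayes rule is known in closed form, take a made-up example (not from the slides): two equally likely classes with X | Y = 1 ~ N(1, 1) and X | Y = 0 ~ N(-1, 1). The posterior comparison reduces to the sign of x, and the Bayes error rate is pnorm(-1) ≈ 0.159. A quick R check of the error-rate formula:

```r
set.seed(1)
n <- 10000
y <- rbinom(n, 1, 0.5)                        # equal priors on classes 0 and 1
x <- rnorm(n, mean = ifelse(y == 1, 1, -1))   # X | Y = k ~ N(+/-1, 1)
yhat <- as.integer(x > 0)   # Bayes rule here: Pr(Y=1|X=x) > Pr(Y=0|X=x) iff x > 0
err <- mean(yhat != y)      # error rate = (1/n) * sum of I(y_i != yhat_i)
c(empirical = err, bayes = pnorm(-1))
```

Even the Bayes classifier misclassifies about 16% of cases in this setup; no classifier trained on data can do better on average, which is why the Bayes error rate serves as the benchmark.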

K-Nearest Neighbors (KNN)

- K-nearest neighbors is a flexible approach to estimating the Bayes classifier.
- For any given X we find the k closest neighbors to X in the training data, and examine their corresponding Y's.
- If the majority of the Y's are orange we predict orange; otherwise we predict blue.
- The smaller k is, the more flexible the method will be.
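The rule above is short enough to code directly. This from-scratch sketch uses invented two-class data (it does not reproduce the slides' orange/blue example) and classifies new points by a majority vote among the k nearest training points:

```r
# Majority-vote k-NN classifier: for each new point, find the k closest
# training points (Euclidean distance) and return the most common label.
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  apply(new_x, 1, function(p) {
    d <- sqrt(rowSums((train_x - matrix(p, nrow(train_x), ncol(train_x),
                                        byrow = TRUE))^2))
    votes <- train_y[order(d)[1:k]]   # labels of the k nearest neighbors
    names(which.max(table(votes)))    # majority vote
  })
}

set.seed(1)
train_x <- matrix(rnorm(200), ncol = 2)
train_y <- ifelse(train_x[, 1] + train_x[, 2] > 0, "orange", "blue")
new_x <- matrix(c(2, 2, -2, -2), ncol = 2, byrow = TRUE)
knn_predict(train_x, train_y, new_x, k = 3)
```

The knn(train, test, cl, k) function in the class package implements the same rule; smaller k gives a wigglier, more flexible boundary, as the slide notes.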

KNN example with k = 3

[Figure from ISLR 2013: a test point and its three nearest training neighbors, used for a majority vote.]

KNN with k = 1 and k = 100

[Figure from ISLR 2013: class regions for KNN with K = 1 (left) and K = 100 (right).]

The dashed line is the class boundary from the Bayes classifier. k = 1 overfits (too complex) and k = 100 underfits (too simple).

A good choice of k

[Figure from ISLR 2013: class regions for KNN with K = 10.]

The class boundary for KNN with k = 10 is very similar to the one from the Bayes classifier.

Training vs. test error rates in the KNN example

[Figure from ISLR 2013: training error and test error rates against 1/K.]

Training error rates keep going down as k decreases. The test error rate at first decreases but then starts to increase.

A fundamental picture

[Figure 7.1 from EOSL 2001: behavior of test-sample and training-sample prediction error as model complexity increases; high bias/low variance at low complexity, low bias/high variance at high complexity.]

A cautionary note

- George Box, a famous statistician and son-in-law of R. A. Fisher, once said: "All models are wrong, but some are useful."
- In practice, there is really NO true model, only good models. A good model should achieve at least one of the following:
  - an interpretable model that can be explained by some known facts or knowledge;
  - reveal some unknown truth or relationship among the variables or observations;
  - accurate prediction on new samples.
- The optimal model depends on:
  - the purpose of the study;
  - the complexity of the underlying mechanism;
  - the quality of the data and the signal-to-noise ratio;
  - the sample size.

Simulation study I

- Data: 500 samples with 25 input variables and 1 numeric response Y.
- Data-generating mechanism: yi = Σ_{j=1}^{15} xij + εi, where εi ~ N(0, 3^2).
- Input variables: X = (X1, ..., X25) ~ MVN(0, Σ), where ρ(Xi, Xj) = 0.5 for i ≠ j and 1 otherwise.

    library(MASS)  # mvrnorm is in the MASS package
    mu <- rep(0, 25)
    Sigma <- matrix(0.5, 25, 25) + diag(.5, 25)
    n <- 500
    set.seed(1)
    x <- mvrnorm(n, mu, Sigma)
    y <- as.vector(x %*% c(rep(1, 15), rep(0, 10))) + rnorm(n, sd = 3)
    data1 <- data.frame(x, y)[1:50, ]; data2 <- data.frame(x, y)

- Best subset selection is applied here, using the regsubsets function in the leaps package in R.
- Two groups of models are generated, using the first 50 observations (data1) and the full data (n = 500, data2).

Simulation study I (cont.)

- nvmax: the maximum size of subsets to examine.
- nbest: the number of subsets of each size to record.
- There are some other useful options. For details, type ?regsubsets in R.

    library(leaps)
    sout1 <- summary(regsubsets(y ~ ., data = data1, nvmax = 15, nbest = 5))
    res1 <- cbind(apply(sout1$which[, -1], 1, sum), Cp = sout1$cp, bic = sout1$bic)
    sout2 <- summary(regsubsets(y ~ ., data = data2, nvmax = 25, nbest = 5))
    res2 <- cbind(apply(sout2$which[, -1], 1, sum), Cp = sout2$cp, bic = sout2$bic)
    par(mfrow = c(2, 2))
    plot(res1[, 1], res1[, 2], xlim = c(1, 15), ylim = c(0, 50),
         xlab = "Model size", ylab = "Mallow Cp")
    plot(res1[, 1], res1[, 3], xlim = c(1, 15), ylim = range(res1[, 3]),
         xlab = "Model size", ylab = "BIC")
    plot(res2[, 1], res2[, 2], xlim = c(1, 25), ylim = c(0, 200),
         xlab = "Model size", ylab = "Mallow Cp")
    plot(res2[, 1], res2[, 3], xlim = c(1, 25), ylim = range(res2[, 3]),
         xlab = "Model size", ylab = "BIC")

Sample size effect

[Figure: Mallow's Cp and BIC against model size, for n = 50 (top row) and n = 500 (bottom row).]

Noise effect

- We set two levels of standard deviation on εi: 1 and 6, with SNR = 122 and 3.4, respectively.
- We use the BIC (a common criterion for selecting models) to select the optimal model size (highlighted by a red vertical line).
- Everything else is kept the same as before (n = 500).

[Figure: BIC against model size for SD = 1 (left) and SD = 6 (right), with the BIC-optimal model size marked.]

Simulation study II: bias-variance tradeoff

    yi = 2 sin(1.5 xi) + xi + εi,  where εi ~ N(0, 1)

- Data: the training set dat has 20 observations.
- dat2 has X values on a fine grid, plus the true function values without noise.
- Fit the data using polynomial regressions.

    n <- 20
    set.seed(1)
    dat <- data.frame(x = runif(n, 0, 9.5))
    dat$y <- with(dat, 2 * sin(1.5 * x) + x + rnorm(n, sd = 1))
    dat2 <- data.frame(x = seq(from = 1, to = 9, le = 81))
    dat2$y <- with(dat2, 2 * sin(1.5 * x) + x)
    plot(dat$x, dat$y, xlab = "X", ylab = "Y")
    lines(dat2$x, dat2$y, col = "red", lwd = 2)

Fitting on various orders of polynomial regressions

- Fit the data using polynomial regressions from order 1 to 10.
- Predict on the fine grid of X in dat2.

    pred <- matrix(0, length(dat2$x), 10)
    for (i in 1:10) {
      poly.fit <- lm(y ~ poly(x, i, raw = T), dat)
      pred[, i] <- predict(poly.fit, dat2)
    }
    matplot(dat2$x, pred, xlab = "X", ylab = "Y",
            xlim = c(0, 9.5), ylim = range(c(dat$y, pred)),
            lty = 1:10, lwd = 2, type = "l",
            col = rainbow(10, start = 3/6, end = 4/6))
    points(dat$x, dat$y)
    lines(dat2$x, dat2$y, col = "red", lwd = 2)

Repeat 50 times on randomly generated Y

[Figure: fitted curves over 50 replicated datasets for polynomial orders 1, 5 and 10 (top row), and the sampling distribution of the estimate at X = 3 (bottom row; blue line = true value, red dashed line = mean of estimates). At X = 3: order 1 has bias 2.006, SD 0.284; order 5 has bias 1.319, SD 0.322; order 10 has bias 0.064, SD 1.013.]

Remarks on the previous figure

- Variance: how much ŷ varies from one training set D to another.
- Bias: the difference between the true value at X = x and the expected value of ŷ | X = x (averaged over datasets).
- A model that is too "simple" does not fit the data well (a biased solution).
- A model that is too "complex": a small change in the data makes a big change in ŷ (a high-variance solution).

    iter <- 50
    pred <- list()
    for (it in 1:iter) {
      set.seed(it)
      dat$y <- 2 * sin(1.5 * dat$x) + dat$x + rnorm(n, sd = 1)
      pred[[it]] <- matrix(0, length(dat2$x), 10)
      for (i in 1:10) {
        pred[[it]][, i] <- predict(lm(y ~ poly(x, i, raw = T), dat), dat2)
      }
    }
    par(mfcol = c(2, 3))
    plot(dat2$x, pred[[1]][, 1], xlab = "X", ylab = "Y", type = "n")
    for (i in 1:iter) { lines(dat2$x, pred[[i]][, 1]) }
    lines(dat2$x, dat2$y, col = "red", lwd = 2)
    segments(3, -2, 3, 6, lwd = 2, col = rgb(0, 0, 1, alpha = 0.5))
    title("Order 1")
    # pred.2 and ind below come from code not shown on this slide
    plot(density(pred.2[, 1], bw = 0.1), main = "", xlab = "X")
    lines(rep(dat2$y[ind], 2), c(0, 0.2), col = "blue")
    lines(rep(mean(pred.2[, 1]), 2), c(0, 0.2), col = "red", lty = 2)

MSE curves among 50 repetitions

- The curves in the background are the MSE for each sample against polynomial order.
- The solid red line is the average MSE among the 50 samples.
- Left: low variance but high bias. Right: high variance, low bias.
- The optimal order is around 6 (the true function has 4 reflection points).

    mse <- matrix(0, iter, 10)
    FUN1 <- function(x) mean((x - dat2$y)^2)
    for (it in 1:iter) { mse[it, ] <- apply(pred[[it]], 2, FUN1) }
    plot(1:10, mse[1, ], log = "y", ylab = "MSE", xlab = "Polynomial order",
         xlim = c(1, 10), ylim = range(mse), type = "n")
    for (it in 1:iter) { lines(1:10, mse[it, ], col = "blue", lwd = 0.3) }
    lines(1:10, apply(mse, 2, mean), col = "red")

Bias-variance tradeoff in MSE

- Since we know the true function, here MSE = bias^2 + variance.
- Bias is estimated by using the average over the 50 replications as E(f̂).
- Variance is estimated by using the variance of f̂ over the 50 replications.

    bias2 <- vari <- rep(0, 10)
    for (i in 1:10) {
      tmp1 <- matrix(0, length(dat2$x), iter)
      for (it in 1:iter) { tmp1[, it] <- pred[[it]][, i] }
      tmp2 <- apply(tmp1, 1, mean)
      # bias2[i]: mean bias^2 for the ith order
      bias2[i] <- mean((dat2$y - tmp2)^2)
      # tmp3: variance of the estimates on the grid for the ith order
      tmp3 <- apply(tmp1, 1, var)
      vari[i] <- mean(tmp3)
    }
    plot(1:10, apply(mse, 2, mean), xlab = "Polynomial order", ylab = "",
         col = "blue", ylim = range(c(bias2, vari)), type = "l", lwd = 2)
    lines(1:10, bias2, col = "red", lwd = 2, lty = 2)
    lines(1:10, vari, col = "orange", lwd = 2, lty = 4)

