
Modern Trends in Data Mining
Trevor Hastie, Stanford University
November 2006

Data Mining for Prediction
We have a collection of data pertaining to our business, industry, production process, monitoring device, etc. Often the goals of data mining are vague, such as “look for patterns in the data” — not too helpful. In many cases a “response” or “outcome” can be identified as a good and useful target for prediction. Accurate prediction of this target can help the company make better decisions, and save a lot of money. Data mining is particularly good at building such prediction models — an area known as “supervised learning”.

Example: Credit Risk Assessment
Customers apply to a bank for a loan or credit card. They supply the bank with information such as age, income, employment history, education, bank accounts, existing debts, etc. The bank does further background checks to establish the customer's credit history. Based on this information, the bank must decide whether to make the loan or issue the credit card.

Example continued: Credit Risk Assessment
The bank has a large database of existing and past customers. Some of these defaulted on loans, others frequently made late payments, etc. An outcome variable “Status” is defined, taking value “good” or “default”, and each of the past customers is scored with a value for status. Background information is available for all the past customers. Using supervised learning techniques, we can build a risk prediction model that takes as input the background information, and outputs a risk estimate (probability of default) for a prospective customer. The California-based company Fair Isaac uses generalized additive models and boosting methods in the construction of its credit risk scores.

Example: Churn Prediction
When a customer switches to another provider, we call this “churn”. Examples are cell-phone service and credit card providers. Based on customer information and usage patterns, we can predict the probability of churn, and the retention probability (as a function of time). This information can be used to evaluate prospective customers (to decide on acceptance) and present customers (to decide on an intervention strategy). Risk assessment and survival models are used by US cell-phone companies such as AT&T to manage churn.

Grand Prize: one million dollars, if you beat Netflix's RMSE by 10%. After 2 months, the leaders are at 4.9%.

Netflix Challenge
Netflix users rate movies from 1 to 5. Based on a history of ratings, predict the rating a viewer will give to a new movie.
- Training data: sparse 400K (users) by 18K (movies) rating matrix, with 98.7% missing — about 100M movie/rater pairs.
- Quiz set of about 1.4M movie/viewer pairs, for which predictions of ratings are required (Netflix has held them back).
- Probe set of about 1.4 million movie/rater pairs similar in composition to the quiz set, for which the ratings are known.

The Supervised Learning Problem
Starting point:
- An outcome measurement Y (also called dependent variable, response, target, output).
- A vector of p predictor measurements X (also called inputs, regressors, covariates, features, independent variables).
- In the regression problem, Y is quantitative (e.g. price, blood pressure, rating).
- In classification, Y takes values in a finite, unordered set (default yes/no, churn/retain, spam/email).
- We have training data (x1, y1), ..., (xN, yN): observations (examples, instances) of these measurements.

Objectives
On the basis of the training data we would like to:
- Accurately predict unseen test cases for which we know X but do not know Y. In the case of classification, predict the probability of an outcome.
- Understand which inputs affect the outcome, and how.
- Assess the quality of our predictions and inferences.

More Examples
- Predict whether someone will have a heart attack on the basis of demographic, diet and clinical measurements.
- Determine whether an incoming email is “spam”, based on frequencies of key words in the message.
- Identify the numbers in a handwritten zip code, from a digitized image.
- Estimate the probability that an insurance claim is fraudulent, based on client demographics, client history, and the amount and nature of the claim.
- Predict the type of cancer in a tissue sample using DNA expression values.

Email or Spam?
Data from 4601 emails sent to an individual (named George, at HP Labs, before 2000). Each is labeled as “spam” or “email”. Goal: build a customized spam filter. Input features: relative frequencies of 57 of the most commonly occurring words and punctuation marks in these email messages.

        george   you    hp  free     !   edu  remove
spam      0.00  2.26  0.02  0.52  0.51  0.01    0.28
email     1.27  1.27  0.90  0.07  0.11  0.29    0.01

Average percentage of words or characters in an email message equal to the indicated word or character. We have chosen the words and characters showing the largest difference between spam and email.

Handwritten Digit Identification
A sample of segmented and normalized handwritten digits, scanned from zip codes on envelopes. Each image has 16 × 16 pixels of greyscale values ranging from 0 to 255.

Microarray Cancer Data
[Figure: heat map of the expression matrix, with tumor-type column labels (BREAST, RENAL, PROSTATE, NSCLC, OVARIAN, CNS, COLON, LEUKEMIA, MELANOMA, ...) and gene-identifier row labels.] Expression matrix of 6830 genes (rows) and 64 samples (columns), for the human tumor data (100 randomly chosen rows shown). The display is a heat map, ranging from bright green (under-expressed) to bright red (over-expressed). Goal: predict cancer class based on expression values.

Shameless Self-Promotion
Most of the topics in this lecture are covered in our 2001 book, and all will be covered in the 2nd edition (if it ever gets finished). The book blends traditional linear methods with contemporary nonparametric methods, and many methods in between.

Ideal Predictions
For a quantitative output Y, the best prediction we can make when the input vector X = x is f(x) = Ave(Y | X = x).
- This is the conditional expectation — deliver the Y-average of all those examples having X = x.
- This is best if we measure errors by average squared error, Ave[(Y - f(X))²].
For a qualitative output Y taking values 1, 2, ..., M, compute:
- Pr(Y = m | X = x) for each value of m. This is the conditional probability of class m at X = x.
- Classify C(x) = j if Pr(Y = j | X = x) is the largest — the majority vote classifier.

Implementation with Training Data
The ideal prediction formulas suggest a data implementation. To predict at X = x, gather all the training pairs (xi, yi) having xi = x, then:
- For regression, use the mean of their yi to estimate f(x) = Ave(Y | X = x).
- For classification, compute the relative proportions of each class among these yi, to estimate Pr(Y = m | X = x); classify the new observation by majority vote.
Problem: in the training data, there may be NO observations having xi = x.
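To make the recipe concrete, here is a minimal sketch of the exact-match implementation on hypothetical data (the data and function name are illustrative, not from the talk):

```python
import numpy as np

# Hypothetical training data: one discrete input and a quantitative response.
x = np.array([1, 1, 2, 2, 2, 3])
y = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 2.0])

def f_hat(x0):
    """Estimate Ave(Y | X = x0) by averaging the y_i whose x_i equal x0."""
    match = (x == x0)
    if not match.any():
        raise ValueError("no training observations with x_i == x0")
    return y[match].mean()

print(f_hat(2))  # 5.0, the mean of {4.0, 6.0, 5.0}
```

The ValueError branch is exactly the problem the slide points out: at most query points x, no training xi matches exactly, which motivates the neighborhood averaging on the next slide.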

Nearest Neighbor Averaging
Estimate Ave(Y | X = x) by averaging those yi whose xi are in a neighborhood of x. E.g. define the neighborhood to be the set of k observations having values xi closest to x in Euclidean distance ||xi - x||. For classification, compute the class proportions among these k closest points. Nearest neighbor methods often outperform all other methods — about one in three times — especially for classification.
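A minimal nearest-neighbor sketch with scikit-learn on synthetic data (the choice k = 15 is arbitrary here):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.3, size=200)

# Regression: estimate Ave(Y | X = x) by averaging the k nearest y_i.
knn_reg = KNeighborsRegressor(n_neighbors=15).fit(X, y)
print(knn_reg.predict([[0.5]]))

# Classification: class proportions among the k nearest points.
y_class = (y > 0).astype(int)
knn_clf = KNeighborsClassifier(n_neighbors=15).fit(X, y_class)
print(knn_clf.predict_proba([[0.5]]))  # estimated Pr(Y = m | X = 0.5)
```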

Kernel Smoothing
[Figure: scatterplot with a smooth fitted curve — a smooth version of nearest-neighbor averaging.] At each point x, the function f(x) = Ave(Y | X = x) is estimated by the weighted average of the y's. The weights die down smoothly with distance from the target point x (indicated by the shaded orange region).

Structured Models
When we have a lot of predictor variables, NN methods often fail because of the curse of dimensionality: it is hard to find nearby points in high dimensions! Near-neighbor models also offer little interpretation. We can overcome these problems by assuming some structure for the regression function Ave(Y | X = x) or the probability function Pr(Y = k | X = x). Typical structural assumptions:
- Linear models
- Additive models
- Low-order interaction models
- Restrict attention to a subset of predictors
- ... and many more

Linear Models
Linear models assume Ave(Y | X = x) = β0 + β1X1 + β2X2 + ... + βpXp. For two-class classification problems, linear logistic regression has the form

log [ Pr(Y = 1 | X = x) / Pr(Y = -1 | X = x) ] = β0 + β1X1 + β2X2 + ... + βpXp.

This translates to

Pr(Y = 1 | X = x) = e^(β0 + β1X1 + β2X2 + ... + βpXp) / (1 + e^(β0 + β1X1 + β2X2 + ... + βpXp)).

Chapters 3 and 4 of the book deal with linear models.
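A minimal sketch of fitting linear logistic regression with scikit-learn; the synthetic data and coefficient values are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # p = 3 inputs
eta = 0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1]     # true beta_0 + beta^T x
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-eta))).astype(int)

model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)          # estimates of beta_0 and beta
print(model.predict_proba(X[:2]))             # Pr(Y | X = x) via e^eta / (1 + e^eta)
```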

Linear Model Complexity Control
With many inputs, linear regression can overfit the training data, leading to poor predictions on future data. Two general remedies are available:
- Variable selection: reduce the number of inputs in the model. For example, stepwise selection or best subset selection.
- Regularization: leave all the variables in the model, but when fitting the model, restrict their coefficients.
  - Ridge: Σ(j=1..p) βj² ≤ s. All the coefficients are non-zero, but are shrunk toward zero (and each other).
  - Lasso: Σ(j=1..p) |βj| ≤ s. Some coefficients drop out of the model; others are shrunk toward zero.
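A minimal ridge/lasso sketch in scikit-learn. Note one assumption to flag: sklearn parametrizes the penalty by a strength alpha rather than the bound s above, with larger alpha corresponding to smaller s; the alpha values below are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only 3 truly nonzero coefficients
y = X @ beta + rng.normal(size=100)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print(np.round(ridge.coef_, 2))  # all nonzero, shrunk toward zero
print(np.round(lasso.coef_, 2))  # several coefficients exactly zero
```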

Best Subset Selection
[Figure: residual sum-of-squares (0 to 100) versus subset size s (0 to 8).] Each point corresponds to a linear model involving a subset of the variables, and shows the residual sum-of-squares on the training data. The red models are the candidates, and we need to choose s.

Ridge and Lasso
[Figure: coefficient profiles β̂(s) versus shrinkage factor s, for ridge (left) and lasso (right), with paths labeled lcavol, lweight, svi, lbph, pgg45, gleason, age, lcp.] Both ridge and lasso coefficient paths can be computed very efficiently for all values of s.

Overfitting and Model Assessment
In all cases above, the larger s is, the better we will fit the training data. Often we overfit the training data. Overfit models can perform poorly on test data (high variance); underfit models can perform poorly on test data (high bias). Model assessment aims to:
1. Choose a value for a tuning parameter s for a technique.
2. Estimate the future prediction ability of the chosen model.
For both of these purposes, the best approach is to evaluate the procedure on an independent test set, if one is available. If possible one should use different test data for (1) and (2) above: a validation set for (1) and a test set for (2).

K-Fold Cross-Validation
Primarily a method for estimating a tuning parameter s when data is scarce; we illustrate for the regularized linear regression models.
- Divide the data into K roughly equal parts (K = 5 or 10): e.g. [1 Train | 2 Train | 3 Test | 4 Train | 5 Train].
- For each k = 1, 2, ..., K, fit the model with parameter s to the other K - 1 parts, giving β̂(-k)(s), and compute its error in predicting the kth part: Ek(s) = Σ(i in kth part) (yi - xiᵀβ̂(-k)(s))².
- This gives the overall cross-validation error CV(s) = (1/K) Σ(k=1..K) Ek(s).
- Do this for many values of s and choose the value of s that makes CV(s) smallest.
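A minimal K-fold cross-validation sketch for choosing the lasso penalty, using scikit-learn's KFold splitter (the synthetic data and the grid of penalty values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
alphas = np.logspace(-3, 1, 30)
cv_error = []
for alpha in alphas:
    fold_errors = []
    for train_idx, test_idx in kf.split(X):
        # Fit with the k-th part held out, then measure error on that part.
        model = Lasso(alpha=alpha).fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx])
        fold_errors.append(np.mean(resid ** 2))
    cv_error.append(np.mean(fold_errors))  # CV(alpha), averaged over folds

best = alphas[int(np.argmin(cv_error))]
print(f"alpha minimizing CV error: {best:.4f}")
```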

Cross-Validation Error Curve
[Figure: 10-fold CV error curve using the lasso on some diabetes data (64 inputs, 442 samples); CV error versus tuning parameter s from 0 to 1.] The thick curve is the CV error curve. The shaded region indicates the standard error of the CV estimate. The curve shows the effect of overfitting — errors start to increase above s = 0.2. This shows a trade-off between bias and variance.

Modern Structured Models in Data Mining
The following is a list of some of the more important and currently popular prediction models in data mining:
- Linear models (often heavily regularized)
- Generalized additive models
- Neural networks
- Trees, random forests and boosted tree models — hot!
- Support vector and kernel machines — hot!

Generalized Additive Models
Allow a compromise between linear models and more flexible local models (kernel estimates) when there are many inputs X = (X1, X2, ..., Xp). Additive models for regression:

Ave(Y | X = x) = α0 + f1(x1) + f2(x2) + ... + fp(xp).

Additive models for classification:

log [ Pr(Y = 1 | X = x) / Pr(Y = -1 | X = x) ] = α0 + f1(x1) + f2(x2) + ... + fp(xp).

Each of the functions fj(xj) (one for each input variable) can be a smooth function (à la kernel estimates), linear, or omitted.
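One simple way to fit an additive model, in the spirit of the reparametrization note on the next slide, is to expand each input in its own spline basis and fit a single linear model to all the basis columns. A minimal sketch on synthetic data (SplineTransformer requires scikit-learn >= 1.0; all settings are illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=300)

# SplineTransformer expands each column independently into a B-spline basis,
# so the fitted linear model is additive in the inputs: f1(x1) + f2(x2).
gam = make_pipeline(SplineTransformer(n_knots=8, degree=3), LinearRegression())
gam.fit(X, y)
print(gam.predict([[0.5, -1.0]]))
```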

GAM Fit to SPAM Data
[Figure: fitted functions f̂ for the most important predictors, including our, over, remove, internet, free, business, hp, hpl, george, 1999, re, edu, ch!, CAPMAX and CAPTOT.] Shown are the most important predictors. Many show nonlinear behavior. Functions can be reparametrized (e.g. log terms, quadratics, step functions), and then fit by a linear model. Produces a prediction per email: Pr(SPAM | X = x). Overall error rate 5.3%.

Neural Networks
[Figure: single (hidden) layer perceptron — input layer X1, X2, X3; hidden layer Z1, Z2, Z3, Z4; output layer Y1, Y2.] Like a complex regression or logistic regression model — more flexible, but less interpretable — a “black box”. Hidden units Z1, Z2, ..., Zm (4 here): Zj = σ(α0j + αjᵀX), where σ(z) = e^z / (1 + e^z) is the logistic sigmoid activation function. Output is a linear regression or logistic regression model in the Zj. Complexity is controlled by m, ridge regularization, and early stopping of the backpropagation algorithm for fitting the neural network.
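A minimal single-hidden-layer sketch with scikit-learn mirroring the slide: m = 4 hidden units, logistic activation, ridge (L2) penalty via alpha, and early stopping. The dataset and all settings are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(4,),   # m = 4 hidden units Z_j
                    activation="logistic",     # sigma(z) = e^z / (1 + e^z)
                    alpha=1e-3,                # ridge regularization
                    early_stopping=True,       # stop when validation score stalls
                    max_iter=2000, random_state=0)
net.fit(X, y)
print(net.score(X, y))
```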

Support Vector Machines
[Figure: two-class data separated by a linear decision boundary with a margin on either side.] Maximize the gap (margin) between the two classes on the training data. If not separable:
- enlarge the feature space via basis expansions (e.g. polynomials);
- use a “soft” margin (allow limited overlap).
The solution depends on a small number of points (“support vectors”) — 3 in the figure.

Support Vector Machines
[Figure: decision boundary xᵀβ + β0 = 0 with soft margins and overlaps ξ1, ..., ξ5.] Maximize the soft margin subject to a bound on the total overlap: Σi ξi ≤ B. With yi ∈ {-1, 1}, this becomes a convex optimization problem:

minimize ||β|| over (β0, β)
subject to yi(xiᵀβ + β0) ≥ 1 - ξi and ξi ≥ 0 for i = 1, ..., N, with Σ(i=1..N) ξi ≤ B.

Properties of SVMs
- Primarily used for classification problems.
- Builds a linear classifier f(X) = β0 + β1X1 + β2X2 + ... + βpXp. If f(X) > 0, classify as +1; else if f(X) < 0, classify as -1.
- Generalizations use kernels (“radial basis functions”): f(X) = α0 + Σ(i=1..N) αi K(X, xi), where K is a symmetric function, e.g. K(X, xi) = e^(-γ||X - xi||²), and each xi is one of the samples (vectors). Many of the αi = 0; the rest are “support points”.
- Extensions to regression, logistic regression, PCA, ...
- Well developed mathematics — function estimation in Reproducing Kernel Hilbert Spaces.
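A minimal radial-kernel SVM sketch with scikit-learn on synthetic two-class data. One assumption to flag: sklearn's C parameter controls the overlap budget, playing the role of B above, and gamma is the kernel's γ:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Radial basis kernel K(x, x') = exp(-gamma * ||x - x'||^2)
svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)

print(svm.n_support_)      # number of support points per class
print(svm.predict(X[:5]))  # sign of f(X) determines the class
```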

SVM via Loss + Penalty
[Figure: loss versus margin yf(x), comparing the binomial log-likelihood and the support vector (hinge) loss.] With f(x) = xᵀβ + β0 and yi ∈ {-1, 1}, consider

min over (β0, β) of Σ(i=1..N) [1 - yi f(xi)]+ + (λ/2)||β||²,

where [·]+ denotes the positive part. This hinge loss criterion is equivalent to the SVM, with λ monotone in B. Compare with

min over (β0, β) of Σ(i=1..N) log(1 + e^(-yi f(xi))) + (λ/2)||β||².

This is binomial deviance loss, and the solution is “ridged” linear logistic regression.
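The loss-plus-penalty view means the two fits differ only in the loss; scikit-learn's SGDClassifier makes the swap explicit. A sketch (synthetic data; the loss name "log_loss" is used by recent sklearn versions, older ones spell it "log"):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Hinge loss + L2 penalty: a linear SVM.
svm_like = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-3,
                         random_state=0).fit(X, y)

# Binomial deviance (log) loss + L2 penalty: ridged logistic regression.
logit_like = SGDClassifier(loss="log_loss", penalty="l2", alpha=1e-3,
                           random_state=0).fit(X, y)

print(svm_like.score(X, y), logit_like.score(X, y))
```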

Path Algorithms for the SVM
The two-class SVM classifier f(X) = α0 + Σ(i=1..N) αi K(X, xi) yi can be seen to have a quadratic penalty and piecewise-linear loss. As the cost parameter C is varied, the Lagrange multipliers αi change piecewise-linearly. This allows the entire regularization path to be traced exactly. The active set is determined by the points exactly on the margin. [Figures: a separated example with 12 points (6 per class), and the mixture data with a radial kernel at γ = 1 and γ = 5, shown at several steps of the path with their training errors, elbow sizes, and losses.]

Classification and Regression Trees
- Can handle huge datasets
- Can handle mixed predictors — quantitative and qualitative
- Easily ignore redundant variables
- Handle missing data elegantly
- Small trees are easy to interpret; large trees are hard to interpret
- Often prediction performance is poor

Tree Fit to SPAM Data
[Figure: classification tree for the SPAM data. The tree splits on relative frequencies of words and characters such as remove, hp, george, free, business, edu, 1999, receive, our and ch!, and on capital-letter statistics CAPAVE and CAPMAX; each node is labeled email or spam with its misclassification count.]

Ensemble Methods and Boosting
Classification trees can be simple, but often produce noisy (bushy) or weak (stunted) classifiers.
- Bagging (Breiman, 1996): Fit many large trees to bootstrap-resampled versions of the training data, and classify by majority vote.
- Random Forests (Breiman, 1999): Improvements over bagging.
- Boosting (Freund & Schapire, 1996): Fit many smallish trees to reweighted versions of the training data. Classify by weighted majority vote.
In general: Boosting > Random Forests > Bagging > Single Tree.
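A minimal sketch comparing a single tree, bagging, and a random forest in scikit-learn (synthetic data; the number of trees and other settings are illustrative, and BaggingClassifier bags decision trees by default):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(n_estimators=200, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{name}: {score:.3f}")
```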

Spam Data
[Figure: test error versus number of trees (0 to 2500) for Bagging, Random Forest, and Gradient Boosting (5-node trees).]

Boosting
[Diagram: training sample → classifier C1(x); weighted samples → C2(x), C3(x), ..., CM(x).] Average many trees, each grown to re-weighted versions of the training data. Weighting decorrelates the trees, by focussing on regions missed by past trees. The final classifier is a weighted average of classifiers:

C(x) = sign[ Σ(m=1..M) αm Cm(x) ].

Modern Gradient Boosting (Friedman, 2001)
Fits an additive model

Fm(X) = T1(X) + T2(X) + T3(X) + ... + Tm(X),

where each of the Tj(X) is a tree in X. Can be used for regression, logistic regression and more. For example, gradient boosting for regression works by repeatedly fitting trees to the residuals:
1. Fit a small tree T1(X) to Y.
2. Fit a small tree T2(X) to the residual Y - T1(X).
3. Fit a small tree T3(X) to the residual Y - T1(X) - T2(X).
And so on. m is the tuning parameter, which must be chosen using a validation set (m too big will overfit).
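A minimal sketch of this residual-fitting loop for squared-error regression, built from small scikit-learn trees; the shrinkage factor gamma anticipates the next slide, and all data and settings are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

m, gamma = 100, 0.1           # number of trees, shrinkage
trees = []
F = np.zeros(len(y))          # current fit F_m(X), starting from 0
for _ in range(m):
    tree = DecisionTreeRegressor(max_depth=2)  # a small tree
    tree.fit(X, y - F)                         # fit to the current residual
    F += gamma * tree.predict(X)               # shrink and add into the model
    trees.append(tree)

def predict(X_new):
    """Sum the (shrunken) tree contributions: F_m(X) = gamma * sum_j T_j(X)."""
    return gamma * sum(t.predict(X_new) for t in trees)

print(np.mean((y - F) ** 2))  # training error falls with m; watch for overfitting
print(predict(X[:2]))
```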

Gradient Boosting - Details
- For a general loss function L[Y, Fm(X) + Tm+1(X)], fit a tree to the negative gradient -∂L/∂Fm rather than to the residual.
- Shrink the new contribution before adding it into the model: Fm+1(X) = Fm(X) + γ Tm+1(X). This slows the forward stagewise algorithm, leading to improved performance.
- Tree depth determines the interaction order of the model.
- Boosting will eventually overfit; the number of terms m is a tuning parameter.
- As γ → 0, the boosting path behaves like an ℓ1-regularization path in the space of trees.

Effect of Shrinking
[Figure: misclassification error versus boosting iterations (0 to 1000).]

Boosting on SPAM
[Figure: ROC curves (sensitivity versus specificity) for TREE, SVM, and Boosting on the SPAM data. Boosting error: 4.5%; SVM error: 6.7%; TREE error: 8.7%.] Boosting dominates all other methods on the SPAM data — 4.5% test error. Used 1000 trees (depth 6) with default settings for the gbm package in R. The ROC curve is obtained by varying the threshold of the classifier. Sensitivity: proportion of true spam identified. Specificity: proportion of true email identified.

Software
- R is free software for statistical modeling, graphics and a general programming environment. Works on PCs, Macs and Linux/Unix platforms. All the models here can be fit in R.
- S-PLUS is like R; it implements the same language, S. S-PLUS is not free, but is supported.
- SAS and their Enterprise Miner can fit most of the models mentioned in this talk, with good data-handling capabilities and high-end user interfaces.
- Salford Systems has commercial versions of trees, random forests and gradient boosting.
- SVM software is all over, but beware of patent infringements if put to commercial use.

- Many free versions of good neural network software exist; Google will find them.
