Introduction To Machine Learning & Data Mining

Introduction to Machine Learning & Data Mining
Jennifer Neville
Purdue University
May 24, edu/homes/neville/iris.dat

Data mining
The process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro & Smyth 1996)
Related fields: Artificial Intelligence, Databases, Visualization, Statistics

Example
During WWII, statistician Abraham Wald was asked to help the British decide where to add armor to their planes.

The data revolution
The last 35 years of research in ML/DM have resulted in widespread adoption of predictive analytics to automate and improve decision making.
As “big data” efforts increase the collection of data, so will the need for new data science methodology.
Data today have more volume, velocity, variety, etc.
Machine learning research develops statistical tools, models & algorithms that address these complexities.
Data mining research focuses on how to scale to massive data and how to incorporate feedback to improve accuracy while minimizing effort.

The data mining [...]
[Figure: diagram labels not recoverable from the transcription]

Overview
- Task specification
- Data representation
- Knowledge representation
- Learning technique
  - Search + scoring
- Prediction and/or interpretation

Task specification
- Objective of the person who is analyzing the data
- Description of the characteristics of the analysis and desired result
- Examples:
  - From a set of labeled examples, devise an understandable model that will accurately predict whether a stockbroker will commit fraud in the near future.
  - From a set of unlabeled examples, cluster stockbrokers into a set of homogeneous groups based on their demographic information.

Exploratory data analysis
- Goal: Interact with data without a clear objective
- Techniques: Visualization, ad hoc modeling

Descriptive modeling
- Goal: Summarize the data or the underlying generative process
- Techniques: Density estimation, cluster analysis and segmentation
- Also known as: unsupervised learning
[Figure: model relating Firm, Broker (Bk), Disclosure, and Branch (Bn) entities, with attributes including Size, Area, Layoffs, and OnWatchlist]

Predictive modeling
- Goal: Learn model to predict unknown class label values given observed attribute values
- Techniques: Classification, regression
- Also known as: supervised learning
[Figure: classification tree over broker attributes such as Broker Age, Current CoWorker Count, Current Branch Mode(Location), and Disclosure Count]

Pattern discovery
- Goal: Detect patterns and rules that describe sets of examples
- Techniques: Association rules, graph mining, anomaly detection
Model: global summary of a data set
Pattern: local to a subset of the data

Data representation
- Choice of data structure for representing individual and collections of measurements
- Individual measurements: single observations (e.g., person’s date of birth, product price)
- Collections of measurements: sets of observations that describe an instance (e.g., person, product)
- Choice of representation [...]

[...] e of interest given known values of other variables
- Focus on modeling the conditional distribution P(Y | X) or on modeling the decision boundary for Y

Learning predictive models
- Choose a data representation
- Select a knowledge representation (a “model”)
  - Defines a space of possible models M = {M1, M2, ..., Mk}
- Use search to identify “best” model(s)
  - Search the space of models (i.e., with alternative structures and/or parameters)
  - Evaluate possible models with a scoring function to determine the model which best fits the data
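
The three steps above can be illustrated with a very small model space. The sketch below is only an illustration, not the method used in the lecture: the toy data, the model family (single-threshold classifiers), and the names `make_model_space`, `accuracy`, and `search` are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical toy data: (x, y) pairs with one numeric attribute and a binary label.
DATA: List[Tuple[float, int]] = [
    (1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1),
]

Model = Callable[[float], int]


def make_model_space(thresholds: List[float]) -> List[Tuple[float, Model]]:
    """Knowledge representation: M = {M1, ..., Mk}, one threshold rule per candidate."""
    return [(t, (lambda x, t=t: 1 if x > t else 0)) for t in thresholds]


def accuracy(model: Model, data: List[Tuple[float, int]]) -> float:
    """Scoring function: fraction of examples the model labels correctly."""
    return sum(1 for x, y in data if model(x) == y) / len(data)


def search(model_space: List[Tuple[float, Model]],
           data: List[Tuple[float, int]]) -> Tuple[float, float]:
    """Exhaustive search: score every model and keep the best-scoring one."""
    best_threshold, best_score = model_space[0][0], -1.0
    for threshold, model in model_space:
        s = accuracy(model, data)
        if s > best_score:
            best_threshold, best_score = threshold, s
    return best_threshold, best_score


space = make_model_space([0.5, 1.5, 2.5, 3.5, 4.5, 5.5])
best_threshold, best_score = search(space, DATA)
print(best_threshold, best_score)
```

Exhaustive enumeration is only feasible because this model space has six members; richer model families need heuristic or gradient-based search.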

Knowledge representation
- Underlying structure of the model or patterns that we seek from the data
  - Defines space of possible models for algorithm to search over
- Model: high-level global description of dataset
  - “All models are wrong, some models are useful” G. Box and N. Draper (1987)
  - Choice of model family determines space of parameters and structure
  - Estimate model parameters and possibly model structure from training data

Classification tree
Model space: all possible decision trees
[Figure: classification tree whose node tests include Broker Age 27, Current CoWorker Count 8, Current Branch Mode(Location) NY, Current Firm Avg(Size) 12, Disclosure Count(Yr 1995) 0, Past CoWorker Count(Gender M) 1, Disclosure Count 5, Current Branch Mode(Location) AZ, Disclosure Count(Type CC) 0, Past Firm Avg(Size) 90, Past Firm Max(Size) 1000, Past CoWorker Count 35, Current Regulator Mode(Status) Reg, and Broker Years In Industry 16]

Scoring functions
- Given a model M and dataset D, we would like to “score” model M with respect to D
  - Goal is to rank the models in terms of their utility (for capturing D) and choose the “best” model
  - Score function can be used to search over parameters and/or model structure
- Score functions can be different for:
  - Models vs. patterns
  - Predictive vs. descriptive functions
  - Models with varying complexity (i.e., number of parameters)

Predictive scoring functions
- Assess the quality of predictions for a set of instances
- Measures difference between the prediction M makes for an instance i and the true class label value of i

S(M) = \sum_{i=1}^{N_{test}} d\left[ f(x(i); M),\; y(i) \right]

where the sum runs over the test examples, d is the distance between predicted and true labels, f(x(i); M) is the predicted class label for item i, and y(i) is the true class label for item i.
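
As a rough sketch of how such a score might be computed, the code below sums a distance between predicted and true labels over a test set. The two distance functions (0/1 loss and squared error), the example models, and the data are assumptions chosen for illustration; the slide does not fix a particular d.

```python
from typing import Callable, List, Tuple


def zero_one_distance(predicted, actual) -> int:
    """d = 0 when the predicted label matches the true label, else 1."""
    return 0 if predicted == actual else 1


def squared_distance(predicted: float, actual: float) -> float:
    """d = squared difference, a common choice for numeric targets."""
    return (predicted - actual) ** 2


def score(model: Callable, test_set: List[Tuple], distance: Callable) -> float:
    """S(M): sum over test examples of d[f(x(i); M), y(i)]."""
    return sum(distance(model(x), y) for x, y in test_set)


# Hypothetical classifier and test set: two of the four items are misclassified.
classifier = lambda x: 1 if x > 3 else 0
class_test = [(1, 0), (2, 1), (4, 1), (5, 0)]
classification_score = score(classifier, class_test, zero_one_distance)

# Hypothetical regression model and test set: only the middle item is off, by 1.
regressor = lambda x: 2.0 * x
reg_test = [(1.0, 2.0), (2.0, 5.0), (3.0, 6.0)]
regression_score = score(regressor, reg_test, squared_distance)

print(classification_score, regression_score)
```

With these distances a lower score is better, so ranking models means sorting by S(M) in ascending order.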

What space are we searching?
[Figure: score plotted over the model space; panel labels X, Model, Score, Model space; learned model (θ0 = 0.8, θ1 = 0.4)]
Source: Alex Holehouse, Notes from Andrew Ng’s Machine Learning Class, http://www.holehouse.org/mlclass/01_02_Introduction_regression_analysis_and_gr.html

Searching over models/patterns
- Consider a space of possible models M = {M1, M2, ..., Mk} with parameters θ
- Search could be over model structures or parameters, e.g.:
  - Parameters: In a linear regression model, find the regression coefficients (β) that minimize squared loss on the training data
  - Model structure: In a decision tree, find the tree structure that maximizes accuracy on the training data
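
For the parameter-search case, a minimal sketch for a single-input linear regression is shown below. It uses the standard closed-form least-squares solution in place of an iterative search; the data and the names `fit_least_squares` and `squared_loss` are assumptions for the example.

```python
from typing import List, Tuple


def squared_loss(b0: float, b1: float, xs: List[float], ys: List[float]) -> float:
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_least_squares(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Coefficients (b0, b1) that minimize squared loss on the training data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical training data generated from y = 1 + 2x with no noise.
XS = [0.0, 1.0, 2.0, 3.0]
YS = [1.0, 3.0, 5.0, 7.0]

b0, b1 = fit_least_squares(XS, YS)
best_loss = squared_loss(b0, b1, XS, YS)

# Any other point in parameter space scores worse, e.g. a shifted intercept.
nearby_loss = squared_loss(b0 + 0.5, b1, XS, YS)
print(b0, b1, best_loss, nearby_loss)
```

Comparing `best_loss` with `nearby_loss` shows the scoring function ranking two points in the same parameter space, which is the comparison a search procedure makes repeatedly.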

Decision trees

Tree models
- Easy to understand knowledge representation
- Can handle mixed variables
- Recursive, divide and conquer learning method
- Efficient inference

Tree learning
- Top-down recursive divide and conquer algorithm
  - Start with all examples at root
  - Select best attribute/feature
  - Partition examples by selected attribute
  - Recurse and repeat
- Other issues:
  - How to construct features
  - When to stop growing
  - Pruning irrelevant parts of the tree

Fraud  Age  Degree  StartYr  Series7
+      22   Y       2005     N
-      25   N       2003     Y
-      31   Y       1995     Y
-      27   Y       1999     Y
+      24   N       2006     N
-      29   N       2003     N

Score each attribute split for these instances: Age, Degree, StartYr, Series7
- choose split on Series7

Score each attribute split for these instances: Age, Degree, StartYr
- choose split on Age (threshold 28)
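
The split-selection steps on this slide can be sketched as a small recursive learner. Several things here are assumptions rather than the lecture's method: the rows without a minus sign are read as positive (fraud) cases, information gain is used as the split score because the slide does not name one, and the attributes are turned into three hypothetical yes/no tests, with Age tested only at the value 28 named on the slide and StartYr left out because no threshold is given for it.

```python
import math
from collections import Counter

# Rows transcribed from the slide's table; treating unmarked rows as "+" is an assumption.
ROWS = [
    {"Fraud": "+", "Age": 22, "Degree": "Y", "StartYr": 2005, "Series7": "N"},
    {"Fraud": "-", "Age": 25, "Degree": "N", "StartYr": 2003, "Series7": "Y"},
    {"Fraud": "-", "Age": 31, "Degree": "Y", "StartYr": 1995, "Series7": "Y"},
    {"Fraud": "-", "Age": 27, "Degree": "Y", "StartYr": 1999, "Series7": "Y"},
    {"Fraud": "+", "Age": 24, "Degree": "N", "StartYr": 2006, "Series7": "N"},
    {"Fraud": "-", "Age": 29, "Degree": "N", "StartYr": 2003, "Series7": "N"},
]

# Hypothetical pre-constructed binary features.
TESTS = {
    "Series7 = Y": lambda r: r["Series7"] == "Y",
    "Degree = Y": lambda r: r["Degree"] == "Y",
    "Age > 28": lambda r: r["Age"] > 28,
}


def entropy(rows):
    """Entropy (in bits) of the Fraud label over a non-empty set of rows."""
    counts = Counter(r["Fraud"] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def gain(rows, test):
    """Information gain from partitioning rows by a yes/no test."""
    yes = [r for r in rows if test(r)]
    no = [r for r in rows if not test(r)]
    if not yes or not no:
        return 0.0  # the test does not split these rows
    remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
    return entropy(rows) - remainder


def learn(rows, tests):
    """Top-down recursion: pick the best test, partition, recurse on each part."""
    labels = Counter(r["Fraud"] for r in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not tests:
        return majority  # leaf: pure node or nothing left to test
    scores = {name: gain(rows, t) for name, t in tests.items()}
    best = max(scores, key=scores.get)
    if scores[best] <= 0.0:
        return majority  # stop growing: no test improves the score
    rest = {name: t for name, t in tests.items() if name != best}
    yes = [r for r in rows if tests[best](r)]
    no = [r for r in rows if not tests[best](r)]
    return {"test": best, "yes": learn(yes, rest), "no": learn(no, rest)}


tree = learn(ROWS, TESTS)
print(tree)
```

Under these assumptions the learner picks the Series7 test at the root and the Age test within the Series7 = N branch, matching the order of choices on the slide. A learner that searched over all numeric thresholds could choose differently on a table this small.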

Data mining process
CS590D
Data Mining: Classification Schemes
General functionality
– Descriptive data mining
– Predictive data mining
Different views, different classifications
– Kinds of data to be mined
– Kinds of knowledge to be discovered
– Kinds of techniqu