
Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
January 12, 2015

Today: What is machine learning? Decision tree learning. Course logistics.

Readings: "The Discipline of ML"; Mitchell, Chapter 3; Bishop, Chapter 14.4

Machine Learning: the study of algorithms that improve their performance P at some task T with experience E. A well-defined learning task is given by <P, T, E>.

Learning to Predict Emergency C-Sections [Sims et al., 2000]: 9,714 patient records, each with 215 features.

Learning to classify text documents: spam vs. not spam.

Learning to detect objects in images (Prof. H. Schneiderman), with example training images for each orientation.

Learning to classify the word a person is thinking about, based on fMRI brain activity.

Learning prosthetic control from a neural implant [R. Kass, L. Castellanos, A. Schwartz].

Machine Learning in Practice: speech recognition, object recognition, mining databases, text analysis, control learning, ... using methods such as support vector machines, Bayesian networks, hidden Markov models, deep neural networks, reinforcement learning, ...

Machine Learning - Theory
PAC learning theory (supervised concept learning) relates:
- the number of training examples (m)
- the error rate (ε)
- the representational complexity (H)
- the failure probability (δ)
Other theories cover reinforcement skill learning, semi-supervised learning, and active student querying. Such theories also relate: the number of mistakes made during learning, the learner's query strategy, convergence rate, asymptotic performance, and bias and variance.

Machine Learning in Computer Science
Machine learning is already the preferred approach to:
- speech recognition and natural language processing
- computer vision
- medical outcomes analysis
- robot control
This ML niche within all software applications is growing, because of:
- improved machine learning algorithms
- increased volume of online data
- increased demand for self-customizing software
Tom's prediction: ML will be the fastest-growing part of CS this century.

[Diagram: machine learning at the intersection of computer science, statistics, animal learning (cognitive science, psychology, neuroscience), adaptive control theory, evolution, and economics and organizational behavior.]

What You'll Learn in This Course
- The primary machine learning algorithms: logistic regression, Bayesian methods, HMMs, SVMs, reinforcement learning, decision tree learning, boosting, unsupervised clustering, ...
- How to use them on real data: text, images, structured data, and your own project
- The underlying statistical and computational theory
- Enough to read and understand ML research papers

Course Logistics

Machine Learning 10-601
Website: www.cs.cmu.edu/~ninamf/courses/601sp15
Faculty: Maria Balcan, Tom Mitchell
TAs: Travis Dick, Kirstin Early, Ahmed Hefny, Micol Marchetti-Bowick, Willie Neiswanger, Abu Saparov
Course assistant: Sharon Cavlovich
See the webpage for office hours, syllabus details, recitation sessions, grading policy, honesty policy, late-homework policy, Piazza pointers, ...

Highlights of Course Logistics
- On the wait list? Hang in there for the first few weeks.
- Homework 1: available now, due Friday.
- Grading: 30% homeworks (5-6), 20% course project, 25% first midterm (March 2), 25% final midterm (April 29). Two in-class exams, no other final.
- Late homework: full credit when due, half credit for the next 48 hours, zero credit after that. We'll delete your lowest homework score, but you must turn in at least n-1 of the n homeworks, even if late.
- Being present at exams: you must be there, so plan now.
- Cheating: fail the class and face expulsion from CMU.

Maria-Florina Balcan (Nina)
- Foundations for modern machine learning, e.g., interactive, distributed, and life-long learning
- Theoretical computer science, especially connections between learning theory and other fields: approximation algorithms, control theory, game theory, matroid theory, mechanism design

Travis Dick
- When can we learn many concepts from mostly unlabeled data by exploiting relationships between concepts?
- Currently: geometric relationships

Kirstin Early
- Analyzing and predicting energy consumption, to reduce costs/usage and help people make informed decisions
- Predicting energy costs from features of the home and occupant behavior
- Energy disaggregation: decomposing the total electric signal into individual appliances

Ahmed Hefny
- How can we learn to track and predict the state of a dynamical system only from noisy observations?
- Can we exploit supervised learning methods to devise a flexible approach that is free of local minima?
[Figure: observations of an oscillating pendulum and the extracted 2D state trajectory.]

Micol Marchetti-Bowick
How can we use machine learning for biological and medical research?
- Using genotype data to build personalized models that can predict clinical outcomes
- Integrating data from multiple sources to perform cancer subtype analysis
- Structured sparse regression models for genome-wide association studies
[Figure: gene expression data with dendrogram; sample weight as a function of genetic relatedness.]

Willie Neiswanger
- If we want to apply machine learning algorithms to BIG datasets, how can we develop parallel, low-communication machine learning algorithms?
- Such as embarrassingly parallel algorithms, where machines work independently, without communication.

Abu Saparov
- How can knowledge about the world help computers understand natural language?
- What kinds of machine learning tools are needed to understand sentences?
Examples: "Carolyn ate the cake with a fork." vs. "Carolyn ate the cake with vanilla."
[Figure: semantic parses of the two sentences, e.g., person eats food with instrument fork, versus food cake with topping vanilla.]

Tom Mitchell
How can we build never-ending learners? Case study: the never-ending language learner (NELL) runs 24x7 to learn to read the web. See http://rtw.ml.cmu.edu.
[Figure: number of beliefs vs. time (5 years); reading accuracy (mean avg. precision over the top 1000 beliefs) vs. time (5 years).]

Function Approximation and Decision Tree Learning

Function approximation
Problem setting:
- Set of possible instances X
- Unknown target function f : X → Y
- Set of function hypotheses H = { h | h : X → Y }
Input: training examples { <x(i), y(i)> } of the unknown target function f (the superscript (i) indexes the ith training example)
Output: a hypothesis h ∈ H that best approximates the target function f

Simple Training Data Set (the PlayTennis data from Mitchell, Chapter 3)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis?
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

A Decision Tree for f: <Outlook, Temperature, Humidity, Wind> → PlayTennis?
- Each internal node: tests one discrete-valued attribute Xi
- Each branch from a node: selects one value for Xi
- Each leaf node: predicts Y (or P(Y | x reaching that leaf))

Decision Tree Learning
Problem setting:
- Set of possible instances X; each instance x in X is a feature vector, e.g., <Humidity=low, Wind=weak, Outlook=rain, Temp=hot>
- Unknown target function f : X → Y; here Y = 1 if we play tennis on this day, else 0
- Set of function hypotheses H = { h | h : X → Y }; each hypothesis h is a decision tree, which sorts x to a leaf, and the leaf assigns y

More generally:
- each instance x in X is a feature vector x = <x1, x2, ..., xn>
- Y is discrete-valued
Input: training examples { <x(i), y(i)> } of the unknown target function f
Output: a hypothesis h ∈ H that best approximates the target function f (a minimal code sketch of such a hypothesis follows)
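To make the hypothesis space concrete, here is a minimal Python sketch of a decision tree hypothesis h : X → Y that sorts an instance to a leaf. It is illustrative only: the Node class and predict function are not from the slides, instances are assumed to be dicts mapping attribute names to values, and the example tree is the classic PlayTennis tree from Mitchell, Chapter 3.

class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute       # attribute tested at this internal node
        self.branches = branches or {}   # maps attribute value -> child Node
        self.label = label               # prediction Y, set only at leaf nodes

def predict(node, x):
    """Sort instance x (a dict: attribute -> value) to a leaf, return its label."""
    while node.label is None:
        node = node.branches[x[node.attribute]]
    return node.label

# Example: the PlayTennis tree (Outlook at the root).
leaf_yes, leaf_no = Node(label="Yes"), Node(label="No")
tree = Node("Outlook", {
    "Sunny": Node("Humidity", {"High": leaf_no, "Normal": leaf_yes}),
    "Overcast": leaf_yes,
    "Rain": Node("Wind", {"Strong": leaf_no, "Weak": leaf_yes}),
})
print(predict(tree, {"Outlook": "Sunny", "Humidity": "High"}))  # -> No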

Decision Trees
Suppose X = <X1, ..., Xn>, where the Xi are boolean-valued variables.

How would you represent Y = X2 ∧ X5? Y = X2 ∨ X5?
How would you represent Y = X2X5 ∨ X3X4(¬X1)?
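A quick sketch of the first two answers (described in text; the slide drew them as trees): for Y = X2 ∧ X5, test X2 at the root; the false branch is a leaf predicting 0, and the true branch tests X5, predicting 1 only on its own true branch. For Y = X2 ∨ X5, again test X2 at the root; now the true branch is a leaf predicting 1, and the false branch tests X5, predicting 0 only on its false branch. In the worst case a boolean function over n variables needs a complete tree of depth n, so decision trees can represent any such function, though not always compactly.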

Top-Down Induction of Decision Trees [ID3, C4.5, Quinlan]
node = Root
Main loop:
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort the training examples to the leaf nodes
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes

Sample Entropy
S is a sample of training examples; p+ is the proportion of positive examples in S and p- is the proportion of negative examples. Entropy measures the impurity of S:
H(S) = - p+ log2 p+ - p- log2 p-
[Figure: H(S) as a function of p+, peaking at 1 bit when p+ = 0.5.]
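The main loop above translates almost line for line into a recursive function. The following Python sketch is illustrative, not the course's code: it reuses the Node class from the earlier sketch, and it picks the "best" attribute with the information_gain helper sketched after the information-gain slide below.

from collections import Counter

def id3(examples, labels, attributes):
    """Grow a decision tree top-down, greedily choosing the best attribute."""
    if len(set(labels)) == 1:                # examples perfectly classified: STOP
        return Node(label=labels[0])
    if not attributes:                       # no attributes left to test: majority vote
        return Node(label=Counter(labels).most_common(1)[0][0])
    # Steps 1-2: choose the "best" attribute and assign it to this node.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    node = Node(attribute=best)
    # Steps 3-5: one descendant per value of best; sort examples down and recurse.
    for value in {x[best] for x in examples}:
        xs = [x for x in examples if x[best] == value]
        ys = [y for x, y in zip(examples, labels) if x[best] == value]
        node.branches[value] = id3(xs, ys, [a for a in attributes if a != best])
    return node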

Entropy
Entropy H(X) of a random variable X, where the sums below range over the n possible values of X:

H(X) = - Σ_i P(X = i) log2 P(X = i)

H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code).
Why? Information theory: the most efficient possible code assigns -log2 P(X = i) bits to encode the message X = i. So the expected number of bits to encode one random X is Σ_i P(X = i) (-log2 P(X = i)).

Specific conditional entropy H(X | Y = v) of X given Y = v:
H(X | Y = v) = - Σ_i P(X = i | Y = v) log2 P(X = i | Y = v)

Conditional entropy H(X | Y) of X given Y:
H(X | Y) = Σ_v P(Y = v) H(X | Y = v)

Mutual information (aka information gain) of X and Y:
I(X, Y) = H(X) - H(X | Y) = H(Y) - H(Y | X)
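These definitions are easy to sanity-check numerically. A small sketch (the helper H is ours, not from the slides; it takes a list of probabilities):

import math

def H(dist):
    """Entropy in bits: -sum_i p_i log2 p_i over a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(H([0.5, 0.5]))   # fair coin: 1.0 bit per draw
print(H([0.9, 0.1]))   # biased coin: ~0.469 bits, cheaper to encode on average
# Conditional entropy: if Y = v1 (probability 0.5) leaves X uniform (H = 1.0)
# and Y = v2 (probability 0.5) fully determines X (H = 0.0), then
# H(X | Y) = 0.5 * 1.0 + 0.5 * 0.0 = 0.5, and I(X, Y) = H(X) - H(X | Y) = 0.5.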

Information Gain is the mutual information between the input attribute A and the target variable Y. Equivalently, Information Gain is the expected reduction in entropy of the target variable Y for the data sample S, due to sorting S on attribute A:

Gain(S, A) = H_S(Y) - H_S(Y | A)

Simple Training Data Set: the PlayTennis table shown earlier (Day, Outlook, Temperature, Humidity, Wind, PlayTennis?); the gains computed on it are sketched below.
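A sketch of computing the gain, reusing the H helper from the previous sketch (the function name and the dict-based examples are illustrative):

from collections import Counter

def information_gain(examples, labels, attr):
    """Gain(S, A) = H(Y) - sum_v P(A = v) * H(Y | A = v)."""
    def H_of(ys):
        return H([c / len(ys) for c in Counter(ys).values()])
    groups = {}
    for x, y in zip(examples, labels):
        groups.setdefault(x[attr], []).append(y)   # partition labels by A's value
    n = len(labels)
    return H_of(labels) - sum(len(ys) / n * H_of(ys) for ys in groups.values())

On the 14 PlayTennis examples (9 Yes, 5 No, so H(Y) ≈ 0.940 bits), this yields the classic values Gain(S, Outlook) ≈ 0.246, Gain(S, Humidity) ≈ 0.151, Gain(S, Wind) ≈ 0.048, and Gain(S, Temperature) ≈ 0.029, which is why ID3 puts Outlook at the root.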


Final Decision Tree for f: <Outlook, Temperature, Humidity, Wind> → PlayTennis?
- Each internal node: tests one discrete-valued attribute Xi
- Each branch from a node: selects one value for Xi
- Each leaf node: predicts Y

Which Tree Should We Output?
- ID3 performs a heuristic search through the space of decision trees.
- It stops at the smallest acceptable tree. Why?
Occam's razor: prefer the simplest hypothesis that fits the data.

Why Prefer Short Hypotheses? (Occam's Razor)

Argument in favor:
- There are fewer short hypotheses than long ones, so a short hypothesis that fits the data is less likely to be a statistical coincidence; by contrast, it is highly probable that some sufficiently complex hypothesis will fit the data by chance.

Arguments opposed:
- There are also fewer hypotheses with a prime number of nodes whose attributes all begin with "Z", so fitting the data with one of those would be just as "surprising".
- What's so special about "short" hypotheses?

Overfitting
Consider a hypothesis h, its error rate over the training data, error_train(h), and its true error rate over all data, error_true(h).

We say h overfits the training data if
error_true(h) > error_train(h)

The amount of overfitting is
error_true(h) - error_train(h)
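In practice the true error is estimated on held-out data. A small sketch of that estimate, reusing predict from the earlier sketch (error_rate and the train/test split are illustrative):

def error_rate(tree, examples, labels):
    """Fraction of examples the tree misclassifies."""
    wrong = sum(predict(tree, x) != y for x, y in zip(examples, labels))
    return wrong / len(labels)

# With a random split into train and test sets:
#   overfitting ≈ error_rate(tree, test_x, test_y) - error_rate(tree, train_x, train_y)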


Reduced-Error Pruning
- Split the data into a training set and a validation set.
- Create a tree that classifies the training set correctly.
- Then, do until further pruning is harmful:
  1. evaluate the impact on the validation set of pruning each possible node;
  2. greedily remove the one that most improves validation-set accuracy.
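A sketch of that greedy loop in Python, reusing Node, predict, and error_rate from the earlier sketches. All names are illustrative, and it is simplified: each pruned node is replaced by a single majority-class leaf, rather than by the majority label of the training examples sorted to that node.

def internal_nodes(node, out=None):
    """Collect every non-leaf node in the tree."""
    out = [] if out is None else out
    if node.label is None:
        out.append(node)
        for child in node.branches.values():
            internal_nodes(child, out)
    return out

def reduced_error_prune(tree, val_x, val_y, majority_label):
    """Greedily replace subtrees with leaves while validation error improves."""
    while True:
        best_err, best_node = error_rate(tree, val_x, val_y), None
        for node in internal_nodes(tree):
            node.label = majority_label        # tentatively prune to a leaf
            err = error_rate(tree, val_x, val_y)
            if err < best_err:
                best_err, best_node = err, node
            node.label = None                  # undo the tentative prune
        if best_node is None:
            return tree                        # further pruning is harmful: stop
        best_node.label = majority_label       # commit the best single prune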


You should know:
- Well-posed function approximation problems:
  - instance space X
  - sample of labeled training data { <x(i), y(i)> }
  - hypothesis space H = { f : X → Y }
- Learning is a search/optimization problem over H
  - various objective functions: minimize training error (0-1 loss); among hypotheses that minimize training error, select the smallest (?)
- Decision tree learning
  - greedy top-down learning of decision trees (ID3, C4.5, ...)
  - overfitting and tree/rule post-pruning
  - extensions

Questions to think about (1)
ID3 and C4.5 are heuristic algorithms that search through the space of decision trees. Why not just do an exhaustive search?

Questions to think about (2)
Consider a target function f : <x1, x2> → y, where x1 and x2 are real-valued and y is boolean. What is the set of decision surfaces describable with decision trees that use each attribute at most once?

Questions to think about (3)
Why use information gain to select attributes in decision trees? What other criteria seem reasonable, and what are the tradeoffs in making this choice?

Questions to think about (4)
What is the relationship between learning decision trees and learning IF-THEN rules?

