Introduction to Machine Learning: Improve Performance by Observation
CS271P, Fall Quarter, 2018
Introduction to Artificial Intelligence
Prof. Richard Lathrop
Read Beforehand: R&N Ch. 18.1-18.4
You will be expected to know:
- Attributes, error function, classification & regression, hypothesis (predictor function)
- What is supervised learning?
- Decision tree algorithm
- Entropy & information gain
- Tradeoff between training and test error with model complexity
- Cross-validation
Deep Learning in Physics: Searching for Exotic Particles
Thanks to Pierre Baldi
Thanks to Pierre Baldi, Daniel Whiteson, Peter Sadowski
Higgs Boson Detection
Deep network improves AUC by 8% over BDT (Boosted Decision Trees in the TMVA package).
Nature Communications, July 2014
Thanks to Pierre Baldi
Application to Extra-Tropical Cyclones
Gaffney et al., Climate Dynamics, 2007
Thanks to Padhraic Smyth
[Figure: original data vs. the Iceland, Greenland, and Horizontal clusters]
Thanks to Padhraic Smyth
Cluster Shapes for Pacific Typhoon Tracks
Camargo et al., J. Climate, 2007
Thanks to Padhraic Smyth
Tropical Cyclones, Western North Pacific
Camargo et al., J. Climate, 2007
Thanks to Padhraic Smyth
An ICS Undergraduate Success Story
“The key student involved in this work started out as an ICS undergrad. Scott Gaffney took ICS 171 and 175, got interested in AI, started to work in my group, decided to stay in ICS for his PhD, did a terrific job in writing a thesis on curve-clustering and working with collaborators in climate science to apply it to important scientific problems, and is now one of the leaders of Yahoo! Labs reporting directly to the CEO there, http://labs.yahoo.com/author/gaffney/. Scott grew up locally in Orange County and is someone I like to point to as a great success story for ICS.”
--- Padhraic Smyth
Thanks to Xiaohui Xie
p53 and Human Cancers
- p53 is a central tumor suppressor protein, “the guardian of the genome.”
- Cancer mutants: about 50% of all human cancers have p53 mutations.
- Rescue mutants: several second-site mutations restore functionality to some p53 cancer mutants in vivo.
[Figure: p53 core domain bound to DNA; image generated with UCSF Chimera]
Cho, Y., Gorina, S., Jeffrey, P.D., Pavletich, N.P. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science 265:346-355, 1994.
Thanks to Richard Lathrop
Active Learning for Biological Discovery
Find cancer rescue mutants: a cycle of knowledge, theory, and experiment.
Thanks to Richard Lathrop
Computational Active Learning
Pick the best (most informative) unknown examples to label.
[Diagram: known examples 1..N form the training set and train the classifier; the classifier chooses which unknown example(s) among N+1..M to label next; the newly labeled example(s) are added to the training set, and the cycle repeats.]
Visualization of Selected Regions
- Positive region: predicted active, 96-105 (green)
- Negative region: predicted inactive, 223-232 (red)
- Expert region: predicted active, 114-123 (blue)
Thanks to Richard Lathrop
Novel Single-a.a. Cancer Rescue Mutants

                  MIP Positive (96-105)   MIP Negative (223-232)   Expert (114-123)
# Strong Rescue   8                       0 (p < 0.008)            6 (not significant)
# Weak Rescue     3                       2 (not significant)      7 (not significant)
Total # Rescue    11                      2 (p < 0.022)            13 (not significant)

- No significant differences between the MIP Positive and Expert regions.
- Both were statistically significantly better than the MIP Negative region.
- The Positive region rescued the cancer mutant P152L for the first time; there were no previous single-a.a. rescue mutants in any region.
Thanks to Richard Lathrop
Complete architectures for intelligence?
- Search? Solve the problem of what to do.
- Learning? Learn what to do.
- Logic and inference? Reason about what to do.
- Encoded knowledge / “expert” systems? Know what to do.
- Modern view: it's complex & multi-faceted.
Importance of representation
- Definition of “state” can be very important
- A good representation:
  – Reveals important features
  – Hides irrelevant detail
  – Exposes useful constraints
    (the three above are the most important)
  – Makes frequent operations easy to do
  – Rapidly or efficiently computable (it's nice to be fast)
Reveals important features / Hides irrelevant detail
“You can't learn what you can't represent.” --- G. Sussman
In search: A man is traveling to market with a fox, a goose, and a bag of oats. He comes to a river. The only way across the river is a boat that can hold the man and exactly one of the fox, goose, or bag of oats. The fox will eat the goose if left alone with it, and the goose will eat the oats if left alone with it. How can the man get all his possessions safely across the river?
A good representation makes this problem easy: encode each state as four bits (M F G O), where M = man, F = fox, G = goose, O = oats, 0 = starting side, 1 = ending side. For example, 1010 is the safe state after the man first ferries the goose across.
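With that bit representation, the puzzle reduces to a short breadth-first search. A minimal sketch (the encoding follows the slide's M/F/G/O legend; all function names are my own):

```python
from collections import deque

def safe(state):
    m, f, g, o = state
    # Fox eats goose, or goose eats oats, if left without the man.
    if f == g != m:
        return False
    if g == o != m:
        return False
    return True

def successors(state):
    m = state[0]
    # The man crosses alone (carry=0) or with one item from his bank
    # (carry indexes the fox, goose, or oats slot of the state tuple).
    for carry in range(4):
        new = list(state)
        new[0] = 1 - m
        if carry and state[carry] == m:
            new[carry] = 1 - m
        elif carry:
            continue  # that item is on the other bank
        new = tuple(new)
        if safe(new):
            yield new

def solve(start=(0, 0, 0, 0), goal=(1, 1, 1, 1)):
    # BFS over 4-bit states returns a shortest sequence of safe states.
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None
```

BFS finds the classic 7-crossing solution: the goose must go first, since every other opening move leaves an unsafe pair alone.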
Reveals important features / Hides irrelevant detail
“You can't learn what you can't represent.” --- G. Sussman
In logic: If the unicorn is mythical, then it is immortal, but if it is not mythical, then it is a mortal mammal. If the unicorn is either immortal or a mammal, then it is horned. The unicorn is magical if it is horned. Prove that the unicorn is both magical and horned.
A good representation makes this problem easy. With Y = unicorn is mYthical, R = unicorn is moRtal, M = unicorn is a maMmal, H = unicorn is Horned, G = unicorn is maGical, the knowledge base in clause form is:
(¬Y ∨ ¬R) (Y ∨ R) (Y ∨ M) (R ∨ H) (¬M ∨ H) (¬H ∨ G)
Adding the negated goal (¬G ∨ ¬H), resolution derives (¬H), then (R) and (¬M), then (¬Y) and (Y), and finally the empty clause ( ), proving that the unicorn is both magical and horned.
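The same entailment can be checked mechanically by enumerating all truth assignments, since there are only 2^5 of them. A minimal sketch (function names are mine; the five symbols follow the slide's legend):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def kb(y, r, m, h, g):
    # Mythical -> immortal; not mythical -> mortal mammal;
    # immortal or mammal -> horned; horned -> magical.
    return (implies(y, not r) and
            implies(not y, r and m) and
            implies((not r) or m, h) and
            implies(h, g))

def entails(goal):
    # KB entails goal iff goal holds in every model of the KB.
    return all(goal(*v) for v in product([False, True], repeat=5) if kb(*v))

print(entails(lambda y, r, m, h, g: h and g))  # True: horned and magical
print(entails(lambda y, r, m, h, g: y))        # False: mythical is undetermined
```

Note the second query: the KB does not settle whether the unicorn is mythical, which is why the exercise asks only about horned and magical.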
A Learning Problem
Predict whether someone is going to play tennis on a given day, given some weather conditions, using records for the past two weeks.
Today is Sunny, Hot, Normal humidity, and Strong wind. Playing tennis?
A Learning Problem
Training data:
- Attributes (input variables; features; covariates)
- Target variable (class label; goal; output variable; dependent variable)
New data: target value = ?
Terminology
- Attributes: also known as features, variables, independent variables, covariates
- Target variable: also known as goal predicate, dependent variable
- Classification: also known as discrimination, supervised classification
- Error function: also known as objective function, loss function
Types of learning
- Supervised learning: learn a mapping, attributes → target
  – Classification: target variable is discrete (e.g., spam email)
  – Regression: target variable is real-valued (e.g., stock market)
- Unsupervised learning: understand hidden data structure
  – Clustering: group data into “similar” groups
  – Latent space embedding: learn a simple data representation
- Other types of learning
  – Reinforcement learning: e.g., game-playing agent
  – Learning to rank: e.g., document ranking in Web search
  – And many others ...
Types of learning
- Data is labeled (learn mapping: attributes → target)
  – Discrete target variable → Classification (e.g., Perceptron classifier)
  – Continuous target variable → Regression
- Data is unlabeled (discover hidden structure)
  – e.g., K-means Clustering
Simple illustrative learning problem
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
Alternate: is there an alternative restaurant nearby?
Bar: is there a comfortable bar area to wait in?
Fri/Sat: is today Friday or Saturday?
Hungry: are we hungry?
Patrons: number of people in the restaurant (None, Some, Full)
Price: price range ($, $$, $$$)
Raining: is it raining outside?
Reservation: have we made a reservation?
Type: kind of restaurant (French, Italian, Thai, Burger)
WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60 minutes)
Training Data for Supervised Learning
Training example X1:
1. Alternate: is there an alternative restaurant nearby? Yes
2. Bar: is there a comfortable bar area to wait in? No
3. Fri/Sat: is today Friday or Saturday? No
4. Hungry: are we hungry? Yes
5. Patrons: number of people in the restaurant (None, Some, Full) Some
6. Price: price range ($, $$, $$$) $$$
7. Raining: is it raining outside? No
8. Reservation: have we made a reservation? Yes
9. Type: kind of restaurant (French, Italian, Thai, Burger) French
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60) 0-10
Target: in training example X1, we waited for a table at the restaurant.
Training Data for Supervised Learning
Supervised or Inductive learning
An unknown target function f maps each input x to an output f(x).
Supervised or Inductive learning
We learn a hypothesis h(x, θ), parameterized by θ, to approximate the unknown target function f(x).
Empirical Error Functions
E(h) = Σ distance( h(x; θ), f(x) ), where the sum is over all training pairs in the training data D.
Examples:
- distance = squared error, (h(x) − f(x))², if h and f are real-valued (regression)
- distance = delta-function, 1 if h(x) ≠ f(x) else 0, if h and f are categorical (classification)
Choosing the error function E(·) is as important as choosing the hypothesis function h(·):
– E(·) should reflect the real “loss” in the problem
– But it is often chosen for mathematical/algorithmic convenience
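The two distance choices above can be written directly as empirical error functions. A minimal sketch (function names are mine):

```python
def squared_error(h_vals, f_vals):
    # Regression: sum of squared differences over all training pairs.
    return sum((h - f) ** 2 for h, f in zip(h_vals, f_vals))

def zero_one_error(h_vals, f_vals):
    # Classification: delta-function distance, i.e. a count of the
    # misclassified training examples.
    return sum(1 for h, f in zip(h_vals, f_vals) if h != f)
```

Here `h_vals` are the hypothesis's predictions and `f_vals` the true target values over the training data D.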
Supervised Learning as Optimization or Search
- Empirical learning = finding the h(x), or h(x; θ), that minimizes the error function E(h)
  – If E(h) is differentiable → continuous optimization problem using gradient descent, etc. (e.g., multi-layer neural networks)
  – If E(h) is non-differentiable (e.g., classification) → systematic search problem through the space of functions h (e.g., decision tree classifiers)
- Once we decide on the functional form of h and the error function E, machine learning typically reduces to a large search or optimization problem
- Additional aspect: we really want to learn a function h that will generalize well to new data, not just memorize the training data; we will return to this later
Decision Tree Representations
Key requirements:
- Attribute-value description: attributes must be expressible in a fixed collection of properties or attributes (e.g., True/False; hot, mild, cold; $, $$, $$$).
- Predefined classes (target values): the target function has discrete output values (boolean or multiclass).
Decision Tree Representations
- Each internal node is labeled with an attribute, and each edge is labeled with a value of that attribute; leaf nodes are labeled with the target variable.
- Every path in the tree corresponds to one row of the truth table.
- Decision trees can represent any Boolean function, e.g., in DNF (disjunction of conjunctions):
  A xor B ≡ (A ∧ ¬B) ∨ (¬A ∧ B)
Decision Tree Representations
- Constrain h(·) to be a decision tree
  – This is the R&N tree for the Restaurant Wait problem: [figure]
Decision Tree Learning
How many distinct decision trees are there with n Boolean attributes? At least as many as there are Boolean functions of n attributes: each of the 2^n truth-table rows can be labeled True or False independently, giving 2^(2^n) functions. With 6 Boolean attributes, there are 2^64 = 18,446,744,073,709,551,616 possible decision trees!
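A quick check of that count (the helper name is mine):

```python
def num_boolean_functions(n):
    # Each of the 2**n truth-table rows can be labeled 0 or 1
    # independently, so there are 2**(2**n) distinct Boolean functions.
    return 2 ** (2 ** n)

print(num_boolean_functions(6))  # 18446744073709551616
```

Already at n = 6 the hypothesis space is astronomically large, which is why exhaustive search over trees is hopeless.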
Decision Tree Learning
- Find the smallest decision tree consistent with the n examples
  – Unfortunately this is provably intractable to do optimally
- Termination criteria:
  – For noiseless data, if all examples at a node have the same label, declare it a leaf and back up
  – For noisy data it might not be possible to find a “pure” leaf using the given attributes (we'll return to this later); a simple approach is to put a depth bound on the tree (or grow to max depth) and use majority vote
- Greedy heuristic search used in practice:
  – Select the root node attribute that is “best” in some sense
  – Partition the data into multiple subsets, depending on the root attribute's value
  – Recursively grow subtrees until the termination criteria are met
Pseudocode for Decision tree learning
[Figure: R&N decision-tree-learning pseudocode, annotated to highlight the termination conditions, the loop through all values of the best attribute, and the recursive call.]
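The annotated pseudocode can be fleshed out as follows. This is a minimal sketch, not the R&N code itself: the data format (a list of dicts plus a target key), the dict-based tree encoding, and all function names are my own assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr, target):
    # Entropy before the split minus the weighted entropy after it.
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for val in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == val]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

def learn_tree(examples, attrs, target):
    labels = [e[target] for e in examples]
    # Termination: pure node, or no attributes left (majority vote).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy choice: split on the attribute with the largest gain.
    best = max(attrs, key=lambda a: info_gain(examples, a, target))
    branches = {}
    # Loop through all values of the best attribute; recursive call.
    for val in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == val]
        branches[val] = learn_tree(subset, [a for a in attrs if a != best], target)
    return (best, branches)
```

Internal nodes come out as `(attribute, {value: subtree})` pairs and leaves as bare class labels; on the XOR truth table, for example, it recovers a depth-2 tree.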
Choosing an Attribute
- Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"
Choosing an Attribute
- Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"
- Patrons? is a better choice
  – How can we quantify this?
  – Information gain (next slides)
  – Other metrics are also used, e.g., Gini impurity, variance reduction; often very similar results to information gain in practice
Entropy and Information
“Entropy” is a measure of randomness (amount of uncertainty, amount of disorder).
See e.g. https://www.youtube.com/watch?v=ZsY4WcQOrfk
Entropy, H(p), with only 2 outcomes
Consider a 2-class problem: p = probability of class #1, 1 − p = probability of class #2.
[Figure: H(p) vs. p for 0 ≤ p ≤ 1; H(p) peaks at p = 0.5 (high entropy, high disorder, high uncertainty) and falls to 0 as p approaches 0 or 1 (low entropy, low disorder, low uncertainty).]
Entropy and Information
Entropy: H(X) = −Σᵢ pᵢ log₂ pᵢ
– Log base two, so the units of entropy are “bits”
– If only two outcomes: H(p) = −p log₂ p − (1 − p) log₂(1 − p)
Examples:
- H(X) = 0.25 log₂ 4 + 0.25 log₂ 4 + 0.25 log₂ 4 + 0.25 log₂ 4 = log₂ 4 = 2 bits (max entropy for 4 outcomes)
- H(X) = 0.75 log₂(4/3) + 0.25 log₂ 4 ≈ 0.8113 bits
- H(X) = 1 log₂ 1 = 0 bits (min entropy)
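A quick numerical check of the three examples above (the helper name `H` is mine):

```python
import math

def H(ps):
    # Entropy in bits of a probability distribution given as a list.
    # (Written as 0.0 - sum to avoid returning -0.0 for certainty.)
    return 0.0 - sum(p * math.log2(p) for p in ps if p > 0)

print(round(H([0.25] * 4), 4))    # 2.0    (max entropy for 4 outcomes)
print(round(H([0.75, 0.25]), 4))  # 0.8113
print(H([1.0]))                   # 0.0    (min entropy)
```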
Information Gain
- H(P) = current entropy of the class distribution P at a particular node, before further partitioning the data
- H(P | A) = conditional entropy given attribute A = weighted average entropy of the conditional class distribution, after partitioning the data according to the values of A
- Gain(A) = H(P) − H(P | A)
  – Sometimes written IG(A) or InformationGain(A)
- Note that by definition, the conditional entropy can't be greater than the entropy, so information gain must be non-negative
- Simple rule in decision tree learning:
  – At each internal node, split on the attribute with the largest information gain [or equivalently, the smallest H(P | A)]
Root Node Example
For the training set, p = 6/12 positive and 1 − p = 6/12 negative:
H(6/12, 6/12) = −(6/12) log₂(6/12) − (6/12) log₂(6/12) = 1 bit
Consider the attributes Patrons and Type: Patrons has the highest IG of all attributes and so is chosen by the learning algorithm as the root.
Information gain is then repeatedly applied at internal nodes until all leaves contain only examples from one class or the other.
Choosing an attribute
IG(Patrons) ≈ 0.541 bits
IG(Type) = 0 bits
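Both numbers can be reproduced from the per-value class counts in the R&N restaurant data (Patrons: None 0/2 wait, Some 4/4, Full 2/6; Type: every value splits evenly), as I read them from the R&N table. A sketch, with function names mine:

```python
import math

def H2(p):
    # Binary entropy in bits.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# (waiting, total) counts per attribute value.
patrons = [(0, 2), (4, 4), (2, 6)]        # None, Some, Full
rtype = [(1, 2), (1, 2), (2, 4), (2, 4)]  # French, Italian, Thai, Burger

def info_gain(splits, prior=0.5):
    # H(P) minus the weighted average entropy after the split.
    n = sum(t for _, t in splits)
    remainder = sum(t / n * H2(w / t) for w, t in splits)
    return H2(prior) - remainder

print(round(info_gain(patrons), 3))  # 0.541
print(round(info_gain(rtype), 3))    # 0.0
```

Type carries no information because every restaurant type has the same 50/50 class split as the root; Patrons produces two pure children (None, Some), which is where its gain comes from.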
Decision Tree Learned
Decision tree learned from the 12 examples: [figure, with Patrons? at the root and Hungry? below it]
Decision Tree Learned
[Figure: the R&N tree and the learned tree, side by side]
Assessing Performance
Training data performance is typically optimistic, e.g., the error rate on training data. Reasons?
- The classifier may not have enough data to fully learn the concept (but on training data we don't know this)
- For noisy data, the classifier may overfit the training data
In practice we want to assess performance “out of sample”: how well will the classifier do on new unseen data? This is the true test of what we have learned (just like a classroom).
With large data sets we can partition our data into 2 subsets, train and test:
- build a model on the training data
- assess performance on the test data
Example of Test Performance
Restaurant problem:
- simulate 100 data sets of different sizes
- train on this data, and assess performance on an independent test set
- learning curve = plot of accuracy as a function of training set size
- typical “diminishing returns” effect (some nice theory to explain this)
Overfitting and Underfitting
[Figure: data points, Y vs. X]
A Complex Model
Y = high-order polynomial in X
[Figure: Y vs. X]
A Much Simpler Model
Y = aX + b + noise
[Figure: Y vs. X]
Overfitting and Underfitting
[Figure: the two fits side by side, Y vs. X]
Example 2
My biologist colleagues say, “Oh, that's the sample that we dropped on the floor!”
How Overfitting affects Prediction
[Figure: predictive error vs. model complexity; error on training data keeps decreasing as complexity grows]
How Overfitting affects Prediction
[Figure: predictive error vs. model complexity, now with both error on training data and error on test data]
How Overfitting affects Prediction
[Figure: predictive error vs. model complexity, showing error on training data, error on test data, and the ideal range for model complexity between too-simple models (left) and too-complex models (right)]
Training and Validation Data
- We can use the class labels (target variable) of the training data to compute the training error.
- We do not know the labels of new data. How can we estimate the error on test data?
- We can split the full data set into a training set and a validation set.
Idea: train each model on the “training data” and then test each model's accuracy on the validation data.
The k-fold Cross-Validation Method
- Why just choose one particular 90/10 “split” of the data? In principle we could do this multiple times: “k-fold cross-validation” (e.g., k = 10)
- Randomly partition the full data set into k disjoint subsets (each roughly of size n/k, where n = total number of training data points)
  for i = 1:10 (here k = 10)
    – train on 90% of the data
    – Acc(i) = accuracy on the other 10%
  end
  Cross-Validation-Accuracy = (1/k) Σᵢ Acc(i)
- Choose the method with the highest cross-validation accuracy
- Common values for k are 5 and 10
- Can also do “leave-one-out”, where k = n
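The loop above can be sketched as follows; `train_fn` and `accuracy_fn` are placeholder names I introduced for any learner and any scoring function:

```python
import random

def k_fold_cv(data, k, train_fn, accuracy_fn, seed=0):
    # Shuffle a copy, then slice into k roughly equal disjoint folds.
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        validate = folds[i]  # held-out fold
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train)
        accs.append(accuracy_fn(model, validate))
    # Cross-Validation-Accuracy = (1/k) * sum of the k fold accuracies.
    return sum(accs) / k
```

Each example appears in exactly one validation fold and in k − 1 training sets, which is what makes the averaged accuracy an estimate of out-of-sample performance.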
Disjoint Validation Data Sets
[Figure: the full data set is split into training data and validation data (aka test data); 1st partition]
Disjoint Validation Data Sets
[Figure: 1st and 2nd partitions, each holding out a different validation subset]
Disjoint Validation Data Sets
[Figure: partitions 1-5, each holding out a different disjoint validation subset of the full data set]
More on Cross-Validation
Notes:
– Cross-validation generates an approximate estimate of how well the learned model will do on “unseen” data
– By averaging over different partitions it is more robust than just a single train/validate partition of the data
– “k-fold” cross-validation is a generalization: partition the data into k disjoint validation subsets of size n/k; train, validate, and average over the k partitions (e.g., k = 10 is commonly used)
– k-fold cross-validation is approximately k times more computationally expensive than just fitting a model to all of the data
You will be expected to know:
- Attributes, error function, classification & regression, hypothesis (predictor function)
- What is supervised learning?
- Decision tree algorithm
- Entropy & information gain
- Tradeoff between training and test error with model complexity
- Cross-validation
Summary
- Supervised (inductive) learning
  – Error function, class of hypotheses/models {h}
  – Want to minimize E on our training data
  – Example: decision tree learning
- Generalization
  – Training data error is over-optimistic
  – We want to see performance on test data
  – Cross-validation is a useful practical approach