Lectures on Machine Learning - Lecture 1: From Artificial Intelligence to Machine Learning


Lectures on Machine Learning
Lecture 1: from artificial intelligence to machine learning

Stefano Carrazza
TAE2018, 2-15 September 2018
European Organization for Nuclear Research (CERN)

Acknowledgement: this project has received funding from the HICCUP ERC Consolidator grant (614577) and from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 740006 (N3PDF: Machine Learning, PDFs, QCD).

Why lectures on machine learning?

- Because it is an essential set of algorithms for building models in science.
- Because of the rapid development of new tools and algorithms in recent years.
- Because nowadays it is a requirement in experimental and theoretical physics.
- Because of the large interest from the HEP community: IML, conferences, grants.

What to expect from these lectures?

- Learn the basics of machine learning techniques.
- Learn when and how to apply machine learning algorithms.

The talk is divided into three lectures:

Lecture 1 (today): artificial intelligence, machine learning, model representation, metrics.

Lecture 2 (tomorrow): parameter learning, non-linear models, beyond neural networks, clustering.

Lecture 3 (tomorrow): hyperparameter tuning, cross-validation, ML in practice, the PDF case study.

Some references

Books:
- The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman.
- An Introduction to Statistical Learning, G. James, D. Witten, T. Hastie, R. Tibshirani.
- Deep Learning, I. Goodfellow, Y. Bengio, A. Courville.

Online resources:
- HEP-ML: https://github.com/iml-wg/HEP-ML-Resources
- TensorFlow: http://tensorflow.org
- Keras: http://keras.io
- Scikit-learn: http://scikit-learn.org

Artificial Intelligence

Artificial intelligence timeline [figure]

Defining A.I.

Artificial intelligence (A.I.) is "the science and engineering of making intelligent machines" (John McCarthy, 1956).

A.I. consists of the development of computer systems that perform tasks commonly associated with intelligence, such as learning. Its subfields include machine learning, natural language processing, knowledge reasoning, computer vision, speech, planning, and robotics.

A.I. and humans

There are two categories of A.I. tasks:

- Abstract and formal: easy for computers but difficult for humans, e.g. playing chess (IBM's Deep Blue, 1997). These motivated the knowledge-based approach to artificial intelligence.
- Intuitive for humans but hard to describe formally: e.g. recognizing faces in images or spoken words. These require concept capture and generalization.

A.I. technologies

Historically, the knowledge-based approach has not led to major success with tasks that are intuitive for humans, because it:

- requires human supervision and hard-coded logical inference rules,
- lacks representation learning ability.

Solution: the A.I. system needs to acquire its own knowledge, e.g. by writing a program which learns the task. This capability is known as machine learning (ML).

Venn diagram for A.I.

Artificial intelligence (e.g. knowledge bases) contains machine learning (e.g. logistic regression), which contains representation learning (e.g. autoencoders), which contains deep learning (e.g. MLPs).

When representation learning is difficult, ML provides deep learning techniques which allow the computer to build complex concepts out of simpler concepts, e.g. artificial neural networks (MLPs).

Machine Learning

Machine learning definition

Definition from A. Samuel in 1959: "Field of study that gives computers the ability to learn without being explicitly programmed."

Definition from T. Mitchell in 1998: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance on T, as measured by P, improves with experience E."

Machine learning examples

Thanks to work in A.I. and new computational capabilities:

- Database mining: search engines, spam filters, medical and biological records.
- Intuitive tasks for humans: autonomous driving, natural language processing, robotics (reinforcement learning), game playing (DQN algorithms).
- Human learning: concept/human recognition, computer vision, product recommendation.

ML applications in HEP

ML in experimental HEP

There are many applications in experimental HEP involving the LHC measurements, including the Higgs discovery, such as: tracking, particle identification, fast simulation, and event filtering.

ML in experimental HEP

Some remarkable examples are:

- Signal-background detection: decision trees, artificial neural networks, support vector machines.
- Jet discrimination: deep learning imaging techniques via convolutional neural networks.
- HEP detector simulation: generative adversarial networks, e.g. LAGAN and CaloGAN.

ML in theoretical HEP

Supervised learning:
- the structure of the proton at the LHC: parton distribution functions,
- theoretical prediction and combination,
- Monte Carlo reweighting techniques,
- BSM searches and exclusion limits.

Unsupervised learning:
- clustering and compression,
- density estimation and anomaly detection,
- Monte Carlo sampling.

[Figures: NNPDF3.1 NNLO parton distributions $xf(x,\mu^2)$ at $\mu^2 = 10$ GeV$^2$ and $\mu^2 = 10^4$ GeV$^2$; PDF4LHC15 recommendation; top quark rapidity $y(t)$, cross section per bin from POWHEG BOX + PYTHIA8 with neural network Sudakov.]

Machine learning algorithms

Machine learning algorithms fall into three classes:

- Supervised learning: regression, classification, etc. Labels are known: the algorithm processes a training data set whose desired output is provided by a supervisor.
- Unsupervised learning: clustering, dimensionality reduction, etc. Labels are unknown and there is no training data set: the algorithm must discover an interpretation from the features of the input data alone.
- Reinforcement learning: real-time decisions, etc. An agent chooses the best action for the input data and learns from the reward returned by the environment.

Machine learning algorithms

More than 60 algorithms exist [figure].

Workflow in machine learning

The operative workflow in ML is summarized by the following steps:

Data → Model → Cost function → Training (driven by an optimizer) → Cross-validation → Best model.

The best model is then used to:
- supervised learning: make predictions for new observed data,
- unsupervised learning: extract features from the input data.

Models and metrics


Model representation in supervised learning

We define parametric and structured models for statistical inference; examples: linear models, neural networks, decision trees. A machine learning algorithm uses the training data set to build a model, which then maps an input $x$ to an estimated prediction.

Given a training set of input-output pairs $A = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, find a model $M$ such that
$$M(x) \approx y,$$
where $x$ is the input vector and $y$ is a discrete label in classification or a real value in regression.

Model representation in supervised learning

Examples of models:

- Linear regression: we define a vector $x \in \mathbb{R}^n$ as input and predict the value of a scalar $y \in \mathbb{R}$ as its output:
$$\hat{y}(x) = w^T x + b,$$
where $w \in \mathbb{R}^n$ is a vector of parameters and $b$ a constant.
- Generalized linear models are also available, increasing the power of linear models.
- Non-linear models: neural networks (discussed later).
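As a concrete illustration, here is a minimal sketch of fitting $\hat{y}(x) = w^T x + b$ by least squares with NumPy; the synthetic data and all parameter values are invented for this example:

```python
import numpy as np

# Synthetic data: 200 samples, 3 input features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w, true_b = np.array([1.5, -2.0, 0.5]), 0.3
y = X @ true_w + true_b + 0.1 * rng.normal(size=200)

# Append a column of ones so the bias b is learned as an extra weight.
X1 = np.hstack([X, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, b = coef[:-1], coef[-1]
print(w, b)  # should be close to true_w and true_b
```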

Model representation trade-offs

However, the selection of the appropriate model comes with trade-offs:

- Prediction accuracy vs. interpretability: e.g. a linear model vs. splines or neural networks. Roughly, moving from linear regression through decision trees, k-nearest neighbors, random forests, and support vector machines to neural nets, interpretability decreases while accuracy increases.
- Optimal capacity/flexibility: the number of parameters and the architecture must be chosen to deal with overfitting and underfitting situations.

Assessing the model performance

How do we check model performance? We define metrics and statistical estimators for it. Examples:

- Regression: cost / loss / error function.
- Classification: cost function, precision, accuracy, recall, ROC, AUC.

Assessing the model performance - cost function

To assess the model performance we define a cost function $J(w)$ which often measures the difference between the target and the model output. In an optimization procedure, given a model $\hat{y}_w$, we search for:
$$\arg\min_w J(w).$$

The mean squared error (MSE) is the most commonly used cost function for regression:
$$J(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_w(x_i))^2,$$
a quadratic and convex function in linear regression.
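A minimal sketch of the MSE cost in NumPy (the toy targets below are invented for illustration):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: J(w) = (1/n) * sum_i (y_i - y_hat_i)^2."""
    return np.mean((y - y_hat) ** 2)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.1, 1.9, 3.2])
print(mse(y, y_hat))  # 0.02
```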

Assessing the model performance - cost function

Other cost functions depend on the nature of the problem. Some examples:

Regression with uncertainties, chi-square:
$$J(w) = \sum_{i,j=1}^{n} (y_i - \hat{y}_w(x_i)) \, (\sigma^{-1})_{ij} \, (y_j - \hat{y}_w(x_j)),$$
where $\sigma_{ij}$ is the data covariance matrix, e.g. for LHC data the experimental statistical and systematic correlations.

[Figure: ATLAS1JET11, R = 0.4, k-factor models: NNLO/NLO ratio vs. $p_T$ (GeV) in rapidity bins from $y = 0.2$ to $y = 2.8$, comparing NN model, k-factor, and CGP.]
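A sketch of the chi-square cost, assuming NumPy; the diagonal covariance in the check is invented for illustration:

```python
import numpy as np

def chi2(y, y_hat, cov):
    """Chi-square: residuals weighted by the inverse covariance matrix.
    Solving the linear system avoids forming the explicit inverse."""
    r = y - y_hat
    return r @ np.linalg.solve(cov, r)

# With a diagonal covariance this reduces to sum_i (r_i / sigma_i)^2.
y = np.array([1.0, 2.0])
y_hat = np.array([1.2, 1.9])
cov = np.diag([0.1**2, 0.2**2])
print(chi2(y, y_hat, cov))  # 4.0 + 0.25 = 4.25
```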

Assessing the model performance - cost function

Logistic regression (binary classification): cross-entropy
$$J(w) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_w(x_i) + (1 - y_i) \log(1 - \hat{y}_w(x_i)) \right],$$
where $\hat{y}_w(x_i) = 1/(1 + e^{-w^T x_i})$.
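A minimal sketch of the cross-entropy for a logistic model, assuming NumPy; the epsilon guard is an implementation detail added here to avoid log(0):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, X, w):
    """Binary cross-entropy for the logistic model y_hat = sigmoid(X @ w)."""
    y_hat = sigmoid(X @ w)
    eps = 1e-12  # numerical guard against log(0)
    return -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
```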

Assessing the model performance - cost function

Density estimation / regression: negative log-likelihood
$$J(w) = -\sum_{i=1}^{n} \log \hat{y}_w(x_i).$$

Other choices include the Kullback-Leibler divergence, RMSE, MAE, etc.

[Figure: Gaussian mixture pdf vs. RTBM model, with sampling $N_s = 10^5$; marginal distributions $P(v_1)$ and $P(v_2)$.]
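A small sketch of the negative log-likelihood, assuming the model density has already been evaluated at the observed points; the Gaussian toy data is invented for illustration:

```python
import numpy as np

def negative_log_likelihood(p):
    """Negative log-likelihood given model densities p(x_i) for each sample."""
    return -np.sum(np.log(p))

# Example: standard Gaussian density evaluated at sampled points.
x = np.random.default_rng(1).normal(size=1000)
p = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
print(negative_log_likelihood(x := p))
```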

Training and test sets

Another common issue is related to model capacity in supervised learning:

- The model should not learn noise from the data.
- The model should be able to generalize its output to new samples.

To observe this issue we split the total number of examples into a training set and a test set, and monitor:
- the training set error, $J_{\mathrm{Tr}}(w)$,
- the test set / generalization error, $J_{\mathrm{Test}}(w)$.

Training and test sets

The test set is independent of the training set but follows the same probability distribution: the model is built on the training set, then used to predict on the test set in order to estimate its performance.
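A minimal sketch of such a split using scikit-learn (one of the online resources listed above); the synthetic data and the 80/20 proportion are illustrative choices:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(1000, 5))
y = X[:, 0] + 0.1 * np.random.default_rng(1).normal(size=1000)

# Hold out 20% of the examples as an independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```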

Bias-variance trade-off

From a practical point of view, dividing the input data into training and test sets exposes a conflict between the training error and the test/generalization error, known as the bias-variance trade-off.

Bias-variance trade-off

Suppose we have a model $\hat{y}(x)$ determined from a training data set, and consider as the true model $Y = y(X) + \epsilon$, with $y(x) = E(Y \mid X = x)$, where the noise $\epsilon$ has zero mean and constant variance.

If we take a point $(x_0, y_0)$ from the test set, then:
$$E[(y_0 - \hat{y}(x_0))^2] = (\mathrm{Bias}[\hat{y}(x_0)])^2 + \mathrm{Var}[\hat{y}(x_0)] + \mathrm{Var}(\epsilon),$$
where
$$\mathrm{Bias}[\hat{y}(x_0)] = E[\hat{y}(x_0)] - y(x_0), \qquad \mathrm{Var}[\hat{y}(x_0)] = E[\hat{y}(x_0)^2] - (E[\hat{y}(x_0)])^2.$$

So the expectation averages over the variability of $y_0$ (the noise) and the variability in the training data.
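The decomposition can be checked numerically. Below is a Monte Carlo sketch, assuming a toy true function $y(x) = \sin(x)$ with Gaussian noise; the polynomial model and all constants are illustrative choices:

```python
import numpy as np

# Estimate bias^2 and variance of a polynomial fit at a test point x0
# by refitting on many independently drawn training sets.
rng = np.random.default_rng(0)
x0, n_train, n_rep, sigma = 1.0, 30, 2000, 0.3
degree = 3  # model flexibility: polynomial degree

preds = np.empty(n_rep)
for r in range(n_rep):
    x = rng.uniform(-np.pi, np.pi, n_train)
    y = np.sin(x) + sigma * rng.normal(size=n_train)
    coeffs = np.polyfit(x, y, degree)   # fit on a fresh training set
    preds[r] = np.polyval(coeffs, x0)   # prediction at the test point

bias2 = (preds.mean() - np.sin(x0)) ** 2
var = preds.var()
print(f"bias^2 = {bias2:.4f}, variance = {var:.4f}, noise = {sigma**2:.4f}")
# Expected test error at x0 is approximately bias^2 + variance + noise.
```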

Bias-variance trade-off

As the flexibility of $\hat{y}$ increases, its variance increases and its bias decreases. Choosing the flexibility based on the average test error amounts to a bias-variance trade-off:

- High bias → underfitting: erroneous assumptions in the learning algorithm.
- High variance → overfitting: excessive sensitivity to small fluctuations (noise) in the training set.

Bias-variance trade-off

More examples of the bias-variance trade-off [figures].

Bias-variance trade-off

Regularization techniques can be applied to modify the learning algorithm and reduce its generalization error but not its training error. For example, adding a weight decay term to the MSE cost function:
$$J(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_w(x_i))^2 + \lambda w^T w,$$
where $\lambda$ is a real number which expresses the preference for weights with a smaller squared $L^2$ norm.
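A sketch of weight decay in practice via ridge regression in scikit-learn (synthetic data). Note that Ridge penalizes the sum of squared residuals rather than their mean, so its alpha corresponds to $n\lambda$ in the convention above:

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.random.default_rng(0).normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * np.random.default_rng(1).normal(size=200)

# alpha plays the role of the L2 penalty strength: larger values
# shrink the weights harder, trading variance for bias.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)
```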

Solution for the bias-variance trade-off

By tuning the hyperparameter $\lambda$ we can regularize a model without explicitly modifying its capacity.

Solution for the bias-variance trade-off

A common way to reduce the bias-variance trade-off and choose the proper learning hyperparameters is to create a validation set that is:
- not used by the training algorithm,
- not used as the test set.

The total number of examples is thus divided into:
- Training set: examples used for learning.
- Validation set: examples used to tune the hyperparameters.
- Test set: examples used only to assess the performance.

Techniques are available to deal with data samples with large and small numbers of examples (discussed later); a minimal sketch of the three-way split follows.
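The sketch below uses scikit-learn on synthetic data; the 60/20/20 proportions are an illustrative choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)

# First carve out the test set, then split the rest into train/validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0)
# Result: 60% train, 20% validation, 20% test.
```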

Assessing model performance for classification

In binary classification tasks we usually complement the cost function with the accuracy metric, defined as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$

Example: with true positives (TP) = 8, false positives (FP) = 2, false negatives (FN) = 4, and true negatives (TN) = 20, the accuracy is (8 + 20)/(8 + 20 + 2 + 4) ≈ 82%.

However, accuracy does not represent the overall situation for skewed classes, i.e. imbalanced data sets with a large disparity between classes, e.g. signal and background. In these cases we define precision and recall.

Assessing model performance for classification

- Precision: the proportion of positive identifications that are correct.
- Recall: the proportion of actual positives that are correctly identified.

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}.$$

For the example above (TP = 8, FP = 2, FN = 4, TN = 20): accuracy ≈ 82%, precision = 80%, recall ≈ 67%.

Various metrics have been developed that rely on both precision and recall, e.g. the F1 score:
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \approx 73\%.$$
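These numbers can be reproduced directly from the confusion-matrix counts used in the example:

```python
# Confusion-matrix counts from the slides: TP=8, FP=2, FN=4, TN=20.
tp, fp, fn, tn = 8, 2, 4, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.0%} precision={precision:.0%} "
      f"recall={recall:.0%} f1={f1:.0%}")
# accuracy=82% precision=80% recall=67% f1=73%
```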

Assessing model performance for classification

In binary classification we can vary the probability threshold and define the receiver operating characteristic curve (ROC curve): a metric which shows the relationship between correctly classified positive cases, the true positive rate (TPR, i.e. recall), and incorrectly classified negative cases, the false positive rate (FPR, i.e. 1 − specificity):
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}.$$

Assessing model performance for classification

The area under the ROC curve (AUC) represents the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. AUC provides an aggregate measure of performance across all possible classification thresholds:

- AUC is 0 if all predictions are 100% wrong.
- AUC is 1 if all predictions are correct.
- AUC is scale-invariant and classification-threshold-invariant.
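A sketch of computing the ROC curve and AUC with scikit-learn; the toy scores below are invented so that positives tend to score higher than negatives:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(500), np.zeros(500)])
scores = np.concatenate([rng.normal(1.0, 1.0, 500),
                         rng.normal(0.0, 1.0, 500)])

# Sweep the probability threshold to trace out the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC =", roc_auc_score(y_true, scores))
```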

Summary

Summary

We have covered the following topics:

- Motivation and overview of A.I.
- Definition and overview of ML.
- Model representation: definition and trade-offs.
- Learning metrics for assessing the model performance.
- Metrics for classification.

