ECE595 / STAT598: Machine Learning I Lecture 15 Logistic Regression 2


ECE595 / STAT598: Machine Learning I
Lecture 15: Logistic Regression 2
Spring 2020
Stanley Chan
School of Electrical and Computer Engineering, Purdue University
© Stanley Chan 2020. All Rights Reserved.

Overview
In linear discriminant analysis (LDA), there are generally two types of approaches:
Generative approach: estimate the model, then define the classifier.
Discriminative approach: directly define the classifier.

Outline
Discriminative Approaches
Lecture 14: Logistic Regression 1
Lecture 15: Logistic Regression 2
This lecture: Logistic Regression 2
Gradient Descent: convexity, gradient, regularization.
Connection with Bayes: derivation, interpretation.
Comparison with Linear Regression: is logistic regression better than linear? Case studies.

From Linear to Logistic Regression
Can we replace g(x) by sign(g(x))? How about a soft version of sign(g(x))? This gives logistic regression.

Logistic Regression and Deep Learning
Logistic regression can be considered as the last layer of a deep network:
The inputs are x_n, and the weights are w.
The sigmoid function is the nonlinear activation.
To train the model, you compute the prediction error and minimize the loss by updating the weights.

Training Loss Function
J(θ) = \sum_{n=1}^{N} L(h_θ(x_n), y_n) = -\sum_{n=1}^{N} \left\{ y_n \log h_θ(x_n) + (1 - y_n) \log(1 - h_θ(x_n)) \right\}.
This is called the cross-entropy loss.
Consider the two cases:
-y_n \log h_θ(x_n) = \begin{cases} 0, & \text{if } y_n = 1 \text{ and } h_θ(x_n) = 1, \\ +\infty, & \text{if } y_n = 1 \text{ and } h_θ(x_n) = 0, \end{cases}
-(1 - y_n) \log(1 - h_θ(x_n)) = \begin{cases} 0, & \text{if } y_n = 0 \text{ and } h_θ(x_n) = 0, \\ +\infty, & \text{if } y_n = 0 \text{ and } h_θ(x_n) = 1. \end{cases}
No finite solution if the prediction and the label mismatch.
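The loss is easy to compute directly. Below is a minimal NumPy sketch (not part of the slides); the function names and the clipping constant eps are my own choices, with the clip guarding against the log 0 blow-up described above.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cross_entropy_loss(theta, X, y, eps=1e-12):
        # h_theta(x_n) = sigmoid(theta^T x_n), computed for all n at once
        h = sigmoid(X @ theta)
        # clip away from 0 and 1 so that log() stays finite
        h = np.clip(h, eps, 1.0 - eps)
        return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))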

Convexity of Logistic Training Loss
Recall that
J(θ) = -\sum_{n=1}^{N} \left\{ y_n \log \frac{h_θ(x_n)}{1 - h_θ(x_n)} + \log(1 - h_θ(x_n)) \right\}.
The first term is linear in θ (since \log \frac{h_θ(x)}{1 - h_θ(x)} = θ^T x), so it is convex.
The second term: Gradient:
\nabla_θ [-\log(1 - h_θ(x))] = -\nabla_θ \log \frac{e^{-θ^T x}}{1 + e^{-θ^T x}}
  = -\nabla_θ \left[ \log e^{-θ^T x} - \log(1 + e^{-θ^T x}) \right]
  = \nabla_θ \left[ θ^T x + \log(1 + e^{-θ^T x}) \right]
  = x - \frac{e^{-θ^T x}}{1 + e^{-θ^T x}} x = h_θ(x) x.
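The closed-form gradient h_θ(x) x can be sanity-checked against a finite-difference approximation. This is a hedged sketch with random data and a tolerance of my own, not part of the lecture:

    import numpy as np

    rng = np.random.default_rng(1)
    d = 4
    x, theta = rng.normal(size=d), rng.normal(size=d)

    def f(t):
        # -log(1 - h_theta(x)) = log(1 + exp(t^T x))
        return np.log1p(np.exp(t @ x))

    h = 1.0 / (1.0 + np.exp(-theta @ x))
    analytic = h * x                      # the formula h_theta(x) * x derived above

    eps = 1e-6
    numeric = np.array([(f(theta + eps * np.eye(d)[i]) - f(theta - eps * np.eye(d)[i])) / (2 * eps)
                        for i in range(d)])
    print(np.allclose(analytic, numeric, atol=1e-5))   # expected: True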

Convexity of Logistic Training Loss
The gradient of the second term is \nabla_θ [-\log(1 - h_θ(x))] = h_θ(x) x. The Hessian is:
\nabla_θ^2 [-\log(1 - h_θ(x))] = \nabla_θ [h_θ(x) x] = x \nabla_θ \left[ \frac{1}{1 + e^{-θ^T x}} \right]^T
  = \frac{e^{-θ^T x}}{(1 + e^{-θ^T x})^2} x x^T
  = \frac{1}{1 + e^{-θ^T x}} \left( 1 - \frac{1}{1 + e^{-θ^T x}} \right) x x^T
  = h_θ(x) [1 - h_θ(x)] x x^T.

Convexity of Logistic Training Loss
For any v ∈ R^d, we have
v^T \nabla_θ^2 [-\log(1 - h_θ(x))] v = v^T \left( h_θ(x)[1 - h_θ(x)] x x^T \right) v = h_θ(x)[1 - h_θ(x)] (v^T x)^2 ≥ 0.
Therefore the Hessian is positive semi-definite, and so -\log(1 - h_θ(x)) is convex in θ.
Conclusion: The training loss function
J(θ) = -\sum_{n=1}^{N} \left\{ y_n \log \frac{h_θ(x_n)}{1 - h_θ(x_n)} + \log(1 - h_θ(x_n)) \right\}
is convex in θ, so we can use convex optimization algorithms to find θ.
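The positive semi-definiteness is also easy to confirm numerically: build the Hessian h_θ(x)[1 - h_θ(x)] x x^T at a random point and inspect its eigenvalues. A minimal sketch, with arbitrary random data of my own:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    x = rng.normal(size=d)
    theta = rng.normal(size=d)

    h = 1.0 / (1.0 + np.exp(-theta @ x))
    H = h * (1.0 - h) * np.outer(x, x)      # Hessian of -log(1 - h_theta(x))

    eigvals = np.linalg.eigvalsh(H)
    print(eigvals.min() >= -1e-12)          # expected: True (all eigenvalues are nonnegative)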

Convex Optimization for Logistic Regression
We can use CVX to solve the logistic regression problem, but it requires some re-organization of the equations:
J(θ) = -\sum_{n=1}^{N} \left\{ y_n θ^T x_n + \log(1 - h_θ(x_n)) \right\}
  = -\sum_{n=1}^{N} \left\{ y_n θ^T x_n + \log \frac{1}{1 + e^{θ^T x_n}} \right\}
  = -\sum_{n=1}^{N} \left\{ y_n θ^T x_n - \log(1 + e^{θ^T x_n}) \right\}
  = -\left( \sum_{n=1}^{N} y_n x_n \right)^T θ + \sum_{n=1}^{N} \log(1 + e^{θ^T x_n}).
The last term is a sum of log-sum-exp terms: \log(e^0 + e^{θ^T x}).
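The slides solve this with CVX in MATLAB. A rough Python equivalent, assuming the cvxpy package and toy Gaussian data of my own, is sketched below; cvxpy's logistic atom computes \log(1 + e^z), which is exactly the log-sum-exp term above.

    import numpy as np
    import cvxpy as cp

    # toy 2-D data: class 0 around the origin, class 1 shifted away (illustrative only)
    rng = np.random.default_rng(0)
    N, d = 100, 2
    X = np.vstack([rng.normal(0, 1, (N // 2, d)), rng.normal(3, 1, (N // 2, d))])
    y = np.concatenate([np.zeros(N // 2), np.ones(N // 2)])

    theta = cp.Variable(d)
    # J(theta) = -(sum_n y_n x_n)^T theta + sum_n log(1 + exp(theta^T x_n))
    J = -(y @ X) @ theta + cp.sum(cp.logistic(X @ theta))
    cp.Problem(cp.Minimize(J)).solve()
    print(theta.value)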

Convex Optimization for Logistic Regression
[Figure: fitted sigmoid versus the true model.]
Black: the true model. You create it.
Blue circles: samples drawn from the true distribution.
Red: trained model from the samples.

Gradient Descent for Logistic Regression
The training loss function is
J(θ) = -\sum_{n=1}^{N} \left\{ y_n θ^T x_n + \log(1 - h_θ(x_n)) \right\}.
Recall that \nabla_θ [-\log(1 - h_θ(x))] = h_θ(x) x.
You can run gradient descent:
θ^{(k+1)} = θ^{(k)} - α_k \nabla_θ J(θ^{(k)}) = θ^{(k)} - α_k \sum_{n=1}^{N} \left( h_{θ^{(k)}}(x_n) - y_n \right) x_n.
Since the loss function is convex, gradient descent is guaranteed to find the global minimum.
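A minimal gradient-descent sketch, assuming a constant step size α in place of the schedule α_k; the function name, step size, and iteration count are my own choices:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_gd(X, y, alpha=0.01, iters=5000):
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            grad = X.T @ (sigmoid(X @ theta) - y)   # sum_n (h_theta(x_n) - y_n) x_n
            theta = theta - alpha * grad
        return theta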

Regularization in Logistic Regression
The loss function is
J(θ) = -\sum_{n=1}^{N} \left\{ y_n θ^T x_n + \log(1 - h_θ(x_n)) \right\} = -\sum_{n=1}^{N} \left\{ y_n θ^T x_n + \log \frac{1}{1 + e^{θ^T x_n}} \right\}.
What if h_θ(x_n) = 1? (We need θ^T x_n → ∞.) Then we have \log(1 - 1) = \log 0, which is -∞.
The same thing happens in the equivalent form
J(θ) = -\left( \sum_{n=1}^{N} y_n x_n \right)^T θ + \sum_{n=1}^{N} \log(1 + e^{θ^T x_n}):
when θ^T x_n → ∞, we have \log(1 + e^{θ^T x_n}) → ∞.
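One way to see the problem numerically: when the data are perfectly separable, the objective keeps decreasing as ‖θ‖ grows, so no finite minimizer exists and a solver can return NaN, as in the next example. A toy sketch with 1-D data and scaling values of my own:

    import numpy as np

    # perfectly separable 1-D data; the second column is a constant bias feature
    X = np.array([[-2.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
    y = np.array([0.0, 0.0, 1.0, 1.0])

    def J(theta):
        z = X @ theta
        # -(sum_n y_n theta^T x_n) + sum_n log(1 + exp(theta^T x_n))
        return -(y * z).sum() + np.log1p(np.exp(z)).sum()

    for c in [1, 10, 100]:
        print(c, J(c * np.array([1.0, 0.0])))   # the loss keeps shrinking as the scale c grows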

Regularization in Logistic Regression
Example: two classes, N(0, 1) and N(10, 1). Run CVX.
[Figure: the fitted result; the estimate is NaN for y_n = 1.]

Regularization in Logistic Regression
Add a small regularization term:
J(θ) = -\left( \sum_{n=1}^{N} y_n x_n \right)^T θ + \sum_{n=1}^{N} \log(1 + e^{θ^T x_n}) + λ\|θ\|^2.
Re-run the same CVX program.
[Figure: the fitted result with regularization.]
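In code, the penalty is a one-line change to the objective. Reusing X, y, and theta from the cvxpy sketch earlier (the value of λ below is purely illustrative):

    lam = 1e-3   # regularization weight, a tuning parameter you choose
    J_reg = -(y @ X) @ theta + cp.sum(cp.logistic(X @ theta)) + lam * cp.sum_squares(theta)
    cp.Problem(cp.Minimize(J_reg)).solve()
    print(theta.value)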

Regularization in Logistic Regression
If you make λ really, really small...
J(θ) = -\left( \sum_{n=1}^{N} y_n x_n \right)^T θ + \sum_{n=1}^{N} \log(1 + e^{θ^T x_n}) + λ\|θ\|^2.
Re-run the same CVX program.
[Figure: the fitted result with a very small λ.]

Try This Online Exercise
Classify two digits in the MNIST dataset:
http://ufldl.stanford.edu/tutorial/supervised/LogisticRegression/
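The linked tutorial uses MATLAB. If you prefer Python, a rough sketch with scikit-learn is below; the fetch_openml call, the choice of digits 0 versus 1, and the train/test split are my own assumptions, not part of the tutorial.

    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # load MNIST and keep only two digit classes, e.g. 0 versus 1
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    mask = (y == "0") | (y == "1")
    X, y = X[mask] / 255.0, (y[mask] == "1").astype(int)

    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    print("test accuracy:", clf.score(Xte, yte))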

Outline
Discriminative Approaches
Lecture 14: Logistic Regression 1
Lecture 15: Logistic Regression 2
This lecture: Logistic Regression 2
Gradient Descent: convexity, gradient, regularization.
Connection with Bayes: derivation, interpretation.
Comparison with Linear Regression: is logistic regression better than linear? Case studies.

Connection with Bayes
The likelihood is
p(x | i) = \frac{1}{\sqrt{(2π)^d |Σ|}} \exp\left\{ -\frac{1}{2} (x - µ_i)^T Σ^{-1} (x - µ_i) \right\}.
The prior is p_Y(i) = π_i.
The posterior is
p(1 | x) = \frac{p(x | 1) p_Y(1)}{p(x | 1) p_Y(1) + p(x | 0) p_Y(0)}
  = \frac{1}{1 + \frac{p(x | 0) p_Y(0)}{p(x | 1) p_Y(1)}}
  = \frac{1}{1 + \exp\left\{ -\log \frac{p(x | 1) p_Y(1)}{p(x | 0) p_Y(0)} \right\}}
  = \frac{1}{1 + \exp\left\{ -\left( \log \frac{π_1}{π_0} + \log \frac{p(x | 1)}{p(x | 0)} \right) \right\}}.

Connection with Bayes
We can show that the last term is
\log \frac{p(x | 1)}{p(x | 0)} = \log \frac{ \frac{1}{\sqrt{(2π)^d |Σ|}} \exp\left\{ -\frac{1}{2}(x - µ_1)^T Σ^{-1}(x - µ_1) \right\} }{ \frac{1}{\sqrt{(2π)^d |Σ|}} \exp\left\{ -\frac{1}{2}(x - µ_0)^T Σ^{-1}(x - µ_0) \right\} }
  = -\frac{1}{2} \left[ (x - µ_1)^T Σ^{-1}(x - µ_1) - (x - µ_0)^T Σ^{-1}(x - µ_0) \right]
  = (µ_1 - µ_0)^T Σ^{-1} x - \frac{1}{2} \left( µ_1^T Σ^{-1} µ_1 - µ_0^T Σ^{-1} µ_0 \right).
Let us define
w = Σ^{-1}(µ_1 - µ_0),
w_0 = -\frac{1}{2} \left( µ_1^T Σ^{-1} µ_1 - µ_0^T Σ^{-1} µ_0 \right) + \log \frac{π_1}{π_0}.

Connection with Bayes
Then,
\log \frac{p(x | 1)}{p(x | 0)} = (µ_1 - µ_0)^T Σ^{-1} x - \frac{1}{2} \left( µ_1^T Σ^{-1} µ_1 - µ_0^T Σ^{-1} µ_0 \right) = w^T x + w_0 - \log(π_1 / π_0).
Therefore,
p(1 | x) = \frac{1}{1 + \exp\left\{ -\left( \log \frac{π_1}{π_0} + \log \frac{p(x | 1)}{p(x | 0)} \right) \right\}} = \frac{1}{1 + \exp\{-(w^T x + w_0)\}} = h_θ(x).

Connection with Bayes
The hypothesis function is the posterior distribution:
p_{Y|X}(1 | x) = \frac{1}{1 + \exp\{-(w^T x + w_0)\}} = h_θ(x),
p_{Y|X}(0 | x) = \frac{\exp\{-(w^T x + w_0)\}}{1 + \exp\{-(w^T x + w_0)\}} = 1 - h_θ(x).   (1)
So logistic regression offers probabilistic reasoning, which linear regression does not.
This is not true when the covariances are different.
Remark: if the covariances are different, the Bayes decision rule returns a quadratic classifier.
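The identity can be checked numerically: construct two Gaussians with a shared covariance, form w and w_0 as defined above, and compare the exact posterior with the sigmoid. A minimal sketch with means, covariance, and priors of my own choosing:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
    Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
    pi0 = pi1 = 0.5

    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu0)
    w0 = -0.5 * (mu1 @ Sinv @ mu1 - mu0 @ Sinv @ mu0) + np.log(pi1 / pi0)

    x = np.array([1.0, -0.5])
    p1 = multivariate_normal.pdf(x, mu1, Sigma) * pi1
    p0 = multivariate_normal.pdf(x, mu0, Sigma) * pi0
    posterior = p1 / (p0 + p1)                          # exact Bayes posterior p(1 | x)
    sigmoid = 1.0 / (1.0 + np.exp(-(w @ x + w0)))       # h_theta(x)
    print(np.isclose(posterior, sigmoid))               # expected: True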

Outline
Discriminative Approaches
Lecture 14: Logistic Regression 1
Lecture 15: Logistic Regression 2
This lecture: Logistic Regression 2
Gradient Descent: convexity, gradient, regularization.
Connection with Bayes: derivation, interpretation.
Comparison with Linear Regression: is logistic regression better than linear? Case studies.

Is Logistic Regression Better than Linear?
This is taken from the Internet. Is that true?

Is Logistic Regression Better than Linear?
Scenario 1: Identical covariance. Equal prior. Enough samples.
N(0, 1) with 100 samples and N(10, 1) with 100 samples.
Linear and logistic: not much different.
[Figure: Bayes oracle, Bayes empirical, linear regression fit and decision, logistic regression fit and decision, true samples, training samples.]
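Scenario 1 is easy to reproduce. Below is a hedged sketch that draws the two classes, fits linear regression (thresholded at 1/2) and logistic regression, and compares the resulting decision boundaries; the sample sizes follow the slide, but the seed and the scikit-learn usage are my own.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    x0 = rng.normal(0, 1, 100)     # class 0 ~ N(0, 1)
    x1 = rng.normal(10, 1, 100)    # class 1 ~ N(10, 1)
    X = np.concatenate([x0, x1]).reshape(-1, 1)
    y = np.concatenate([np.zeros(100), np.ones(100)])

    lin = LinearRegression().fit(X, y)
    log = LogisticRegression().fit(X, y)

    # decision boundaries: where the linear fit crosses 1/2,
    # and where the logistic posterior crosses 1/2 (i.e., w^T x + w_0 = 0)
    lin_boundary = (0.5 - lin.intercept_) / lin.coef_[0]
    log_boundary = -log.intercept_[0] / log.coef_[0][0]
    print(lin_boundary, log_boundary)   # both land near the midpoint, 5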

The False Sense of Good Fitting
Scenario 2: Identical covariance. Equal prior. Not a lot of samples.
N(0, 2) with 10 samples and N(10, 2) with 10 samples.
Linear and logistic: not much different.
[Figure: same legend as Scenario 1.]

Is Logistic Regression Better than Linear?
Scenario 3: Different covariance. Equal prior.
N(0, 2) with 50 samples and N(10, 0.2) with 50 samples.
Linear and logistic: equally bad.
[Figure: same legend as Scenario 1.]

Is Logistic Regression Better than Linear?
Scenario 4: Identical covariance. Unequal prior.
Training size proportional to prior: 180 samples and 20 samples.
N(0, 1) with π_0 = 0.9 and N(10, 1) with π_1 = 0.1.
Linear and logistic: not much different.
[Figure: same legend as Scenario 1.]

So What Can We Say about Logistic Regression?
Logistic regression empowers a discriminative method with probabilistic reasoning.
The hypothesis function is the posterior probability:
h_θ(x) = \frac{1}{1 + \exp\{-(w^T x + w_0)\}} = p(1 | x),
1 - h_θ(x) = \frac{\exp\{-(w^T x + w_0)\}}{1 + \exp\{-(w^T x + w_0)\}} = p(0 | x).
Logistic regression is yet another special case of the Bayesian classifier.
It has more or less the same performance as linear regression.
Logistic regression can give a lower training error, which looks better on plots, but its generalization is similar to linear regression.

Reading List
Logistic Regression (Machine Learning Perspective):
Chris Bishop, Pattern Recognition and Machine Learning, Chapter 4.3.
Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, Chapter 4.4.
Stanford CS 229, discriminant algorithms: http://cs229.stanford.edu/notes/cs229-notes1.pdf
CMU lecture: https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf
Stanford Speech and Language Processing (Lecture 5): https://web.stanford.edu/~jurafsky/slp3/
Logistic Regression (Statistics Perspective):
Duke lecture: https://www2.stat.duke.edu/courses/Spring13/sta102.001/Lec/Lec20.pdf
Princeton lecture: https://data.princeton.edu/wws509/notes/c3.pdf
