Probabilistic Learning Classification Using Naïve Bayes

Weather forecasts are usually provided in terms such as "70 percent chance of rain". These forecasts are known as probability of precipitation reports. But how are they calculated? It is an interesting question, because in reality it will either rain or it will not.

These estimates are based on probabilistic methods. Probabilistic methods are concerned with describing uncertainty; they use data on past events to extrapolate future events. In the case of weather, the chance of rain describes the proportion of prior (previous) days with similar atmospheric conditions in which precipitation occurred. So a 70% chance of rain means that out of 10 days with similar atmospheric patterns, it rained in 7 of them.

The naïve Bayes machine learning algorithm uses principles of probability for classification. Naïve Bayes uses data about prior events to estimate the probability of future events. For example, a common application of naïve Bayes uses the frequency of words in junk email messages to identify new junk mail. We will learn:
- Basic principles of probability that are used for naïve Bayes.
- Specialized methods, visualizations and data structures used for analyzing text using R.
- How to employ an R implementation of the naïve Bayes classifier to build an SMS message filter.

Understanding naïve Bayes:
A probability is a number between 0 and 1 that captures the chance that an event will occur given the available evidence. A probability of 0% means that the event will not occur, while a probability of 100% indicates that the event will certainly occur.

Classifiers based on Bayesian methods utilize training data to calculate an observed probability of each class based on feature values. When the classifier is later used on unlabeled data, it uses the observed probabilities to predict the most likely class for the new features.

Bayesian classifiers are best applied to problems in which there are numerous features that all contribute simultaneously, in some way, to the outcome. Even if a large number of features have relatively minor effects, their combined impact could be large.

Basic concepts of Bayesian methods:
Bayesian methods are based on the concept that the estimated likelihood of an event should be based on the evidence at hand. Events are possible outcomes, such as sunny or rainy weather, or spam or non-spam emails. A trial is a single opportunity for the event to occur, such as today's weather or an email message.

Probability:
The probability of an event can be estimated from observed data by dividing the number of trials in which the event occurred by the total number of trials. For example, if it rained on 3 out of 10 days, the probability of rain can be estimated as 30%. Similarly, if 10 out of 50 emails are spam, then the probability of spam can be estimated as 20%. The notation P(A) is used to denote the probability of event A, as in P(spam) = 0.20.

The total probability of all possible outcomes of a trial must always be 100%. Thus, if a trial has only two outcomes that cannot occur simultaneously, such as rain or shine, or spam or not spam, then knowing the probability of either outcome reveals the probability of the other.
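As a quick illustration, the minimal R sketch below estimates these probabilities from raw counts; the counts are the ones quoted above and the variable names are ours, chosen for illustration:

```r
# Estimate P(spam) from observed data: 10 spam messages out of 50 emails
n_spam  <- 10
n_total <- 50
p_spam  <- n_spam / n_total   # 0.20
p_ham   <- 1 - p_spam         # probability of the only other outcome: 0.80
```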

When two events are mutually exclusive and exhaustive (they cannot occur at the same time and are the only two possible outcomes), and P(A) = q, then P(¬A) = 1 − q.

Joint Probability:
We may be interested in monitoring several non-mutually exclusive events for the same trial. If these events occur together with the event of interest, we may be able to use them to make predictions. Consider, for instance, a second event based on the outcome that the email message contains the word Viagra. For most people, this word is only likely to show up in a spam message; its presence would be strong evidence that the email is spam. Suppose the probability that an email contains the word "Viagra" is 5%.

We know that 20% of all messages were spam, and 5% of all messages contained Viagra. We need to quantify the degree of overlap between the two proportions; that is, we hope to estimate the probability that both spam and Viagra occur, which can be written as P(spam ∩ Viagra).

Calculating P(spam ∩ Viagra) depends on the joint probability of the two events. If the two events are totally unrelated, they are called independent events. Dependent events, on the other hand, are the basis of predictive modeling. For instance, the presence of clouds is likely to be predictive of a rainy day.

If we assume that P(spam) and P(Viagra) are independent, we could calculate P(spam ∩ Viagra) as the product of the probabilities of each:

P(spam ∩ Viagra) = P(spam) × P(Viagra) = 0.20 × 0.05 = 0.01

That is, 1% of all messages would be spam messages containing the word Viagra. In reality, it is far more likely that P(spam) and P(Viagra) are highly dependent, which means that this calculation is incorrect.
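A short R sketch of this (incorrect) independence-based estimate, with variable names of our own choosing:

```r
# Joint probability under the (unrealistic) assumption of independence
p_spam   <- 0.20
p_viagra <- 0.05
p_spam_and_viagra <- p_spam * p_viagra   # 0.01, i.e. 1% of all messages
```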

Conditional probability with Bayes' theorem:
The relationship between dependent events can be described using Bayes' theorem. The notation P(A|B) is read as the probability of event A given that event B has occurred. This is known as conditional probability, since the probability of A is dependent (that is, conditional) on what happened with event B:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}$$

To understand a little better how Bayes' theorem works, suppose we are tasked with guessing the probability that an incoming email is spam. Without any additional evidence, the most reasonable guess would be the probability that any prior message was spam (20%). This estimate is known as the prior probability.

Now suppose that we obtained an additional piece of evidence: the incoming message contains the term Viagra. The probability that the word Viagra was used in previous spam messages is called the likelihood, and the probability that Viagra appeared in any message at all is known as the marginal likelihood. By applying Bayes' theorem to this evidence, we can compute a posterior probability that measures how likely the message is to be spam. If the posterior probability is more than 50%, the message is more likely to be spam than not:

$$P(\text{spam} \mid \text{Viagra}) = \frac{P(\text{Viagra} \mid \text{spam})\,P(\text{spam})}{P(\text{Viagra})}$$

To calculate the components of Bayes' theorem, we must construct a frequency table that records the number of times Viagra appeared in spam and non-spam messages. The cells indicate the number of instances having a particular combination of class value and feature value.

Frequency table:

               Viagra = Yes   Viagra = No   Total
  spam               4             16         20
  non-spam           1             79         80
  Total              5             95        100

The frequency table is used to construct the likelihood table:

               Viagra = Yes   Viagra = No   Total
  spam             4/20           16/20       20
  non-spam         1/80           79/80       80
  Total            5/100          95/100     100

The likelihood table reveals that P(Viagra|spam) = 4/20 = 0.20. This indicates that the probability is 20% that a spam email contains the term Viagra. Additionally, since the theorem says that P(B|A) × P(A) = P(A ∩ B), we can calculate P(spam ∩ Viagra) as P(Viagra|spam) × P(spam) = (4/20) × (20/100) = 0.04. This is 4 times the probability estimated under independence.

To compute the posterior probability P(spam|Viagra), we take P(Viagra|spam) × P(spam) / P(Viagra) = (4/20) × (20/100) / (5/100) = 0.80. Therefore, the probability is 80% that a message is spam, given that it contains the word Viagra.
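A minimal R sketch of this calculation, working directly from the frequency table above (the matrix and variable names are ours, for illustration only):

```r
# Frequency table from the text: Viagra yes/no counts for spam and non-spam
freq <- matrix(c(4, 16,
                 1, 79),
               nrow = 2, byrow = TRUE,
               dimnames = list(class = c("spam", "nonspam"),
                               viagra = c("yes", "no")))

p_viagra_given_spam <- freq["spam", "yes"] / sum(freq["spam", ])  # 4/20  = 0.20
p_spam              <- sum(freq["spam", ]) / sum(freq)            # 20/100 = 0.20
p_viagra            <- sum(freq[, "yes"]) / sum(freq)             # 5/100  = 0.05

# Posterior via Bayes' theorem: P(spam | Viagra)
p_spam_given_viagra <- p_viagra_given_spam * p_spam / p_viagra    # 0.80
```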

This is how commercial spam filters work in general, although they consider a much larger number of words simultaneously when computing the frequency and likelihood tables.

The naïve Bayes algorithm:
The naïve Bayes (NB) algorithm describes a simple application of Bayes' theorem for classification. It is one of the most commonly used algorithms, particularly for text classification, where it has become a de facto standard. The strengths and weaknesses of this algorithm are as follows:

Strengths:
- Simple, fast and very effective.
- Does well with missing or noisy data.
- Requires relatively few examples for training, but also works well with very large numbers of examples.
- Easy to obtain the estimated probability for a prediction.

Weaknesses:
- Assumes that all features are equally important and independent.
- Not ideal for datasets with large numbers of numeric features.
- Estimated probabilities are less reliable than the predicted classes.

The naïve Bayes algorithm is named as such because it makes a couple of "naïve" assumptions about the data. In particular, NB assumes that all of the features in the dataset are equally important and independent. These assumptions are often not true.

The naïve Bayes classification:
Let's extend our spam filter by adding a few additional terms to be monitored: Money, Groceries and Unsubscribe.

The NB learner is trained by constructing a likelihood table for the appearance of these four words, W1 (Viagra), W2 (Money), W3 (Groceries) and W4 (Unsubscribe):

               Viagra (W1)       Money (W2)        Groceries (W3)    Unsubscribe (W4)
  Likelihood   Yes      No       Yes      No       Yes      No       Yes      No       Total
  spam         4/20     16/20    10/20    10/20    0/20     20/20    12/20    8/20       20
  not spam     1/80     79/80    14/80    66/80    8/80     71/80    23/80    57/80      80
  Total        5%       95%      24%      76%      8%       91%      35%      65%       100

As new messages arrive, the posterior probability must be calculated to determine whether they are more likely spam or not spam, given the likelihood of the words found in the message. For example, suppose that a message contains the terms Viagra and Unsubscribe, but does not contain either Money or Groceries.

Using Bayes' theorem, we can define the problem as shown in the following formula, which captures the probability that a message is spam, given that Viagra = Yes, Money = No, Groceries = No and Unsubscribe = Yes:

$$P(\text{Spam} \mid W_1 \cap \neg W_2 \cap \neg W_3 \cap W_4) = \frac{P(W_1 \cap \neg W_2 \cap \neg W_3 \cap W_4 \mid \text{Spam})\,P(\text{Spam})}{P(W_1 \cap \neg W_2 \cap \neg W_3 \cap W_4)}$$

This formula is computationally difficult to solve. As additional features are added, large amounts of memory are needed to store the probabilities of all possible intersection events. However, the problem becomes easier with the assumption that events are independent. Specifically, NB assumes that events are independent so long as they are conditioned on the same class value. Assuming this conditional independence allows us to simplify the formula using the probability rule for independent events, P(A ∩ B) = P(A) × P(B). So our formula becomes:

$$P(\text{Spam} \mid W_1 \cap \neg W_2 \cap \neg W_3 \cap W_4) = \frac{P(W_1 \mid \text{Spam})\,P(\neg W_2 \mid \text{Spam})\,P(\neg W_3 \mid \text{Spam})\,P(W_4 \mid \text{Spam})\,P(\text{Spam})}{P(W_1)\,P(\neg W_2)\,P(\neg W_3)\,P(W_4)}$$

The result of this formula is then compared to the probability that the message is not spam:

$$P(\text{NotSpam} \mid W_1 \cap \neg W_2 \cap \neg W_3 \cap W_4) = \frac{P(W_1 \mid \text{NotSpam})\,P(\neg W_2 \mid \text{NotSpam})\,P(\neg W_3 \mid \text{NotSpam})\,P(W_4 \mid \text{NotSpam})\,P(\text{NotSpam})}{P(W_1)\,P(\neg W_2)\,P(\neg W_3)\,P(W_4)}$$

Using the values in the likelihood table, we can start filling in the numbers in these equations. Because the denominator is the same in both, we will ignore it for now. The overall likelihood of spam is then:

(4/20) × (10/20) × (20/20) × (12/20) × (20/100) = 0.012

while the likelihood of non-spam given this pattern of words is:

(1/80) × (66/80) × (71/80) × (23/80) × (80/100) ≈ 0.002

Since 0.012/0.002 = 6, an email with this pattern of words is 6 times more likely to be spam than non-spam. To convert these numbers to probabilities, we apply the formula:

0.012 / (0.012 + 0.002) = 0.857 = 85.7%

The probability that the message is spam is equal to the likelihood that the message is spam divided by the sum of the likelihoods that the message is either spam or non-spam. Similarly, the probability of non-spam is 0.002 / (0.012 + 0.002) = 0.143. Given the pattern of words in the message, we expect that the message is spam with 85.7% probability and non-spam with 14.3% probability.
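The same arithmetic in a short R sketch (variable names are ours; the shared denominator is dropped exactly as in the text):

```r
# Likelihoods from the table for a message with Viagra = yes, Money = no,
# Groceries = no, Unsubscribe = yes (shared denominator ignored)
like_spam <- (4/20) * (10/20) * (20/20) * (12/20) * (20/100)   # 0.012
like_ham  <- (1/80) * (66/80) * (71/80) * (23/80) * (80/100)   # ~0.0021

# Normalize the two likelihoods so they sum to 1
p_spam <- like_spam / (like_spam + like_ham)   # ~0.85 (the text gets 0.857 after rounding like_ham to 0.002)
p_ham  <- like_ham  / (like_spam + like_ham)   # ~0.15
```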

The naïve Bayes classification algorithm can be summarized by the following formula. The probability of level L for class C, given the evidence provided by features F1, F2, ..., Fn, is equal to the product of the probabilities of each piece of evidence conditioned on the class level, multiplied by the prior probability of the class level and by a scaling factor 1/Z, which converts the result to a probability:

$$P(C_L \mid F_1, F_2, \ldots, F_n) = \frac{1}{Z}\, p(C_L) \prod_{i=1}^{n} p(F_i \mid C_L)$$

The Laplace Estimator:
Let us look at one more example. Suppose we received another message, this time containing the terms Viagra, Groceries, Money and Unsubscribe. Using the naïve Bayes algorithm as before, we can compute the likelihood of spam as:

(4/20) × (10/20) × (0/20) × (12/20) × (20/100) = 0

and the likelihood of non-spam as:

(1/80) × (14/80) × (8/80) × (23/80) × (80/100) ≈ 0.00005

Therefore the probability of spam = 0 / (0 + 0.00005) = 0, and the probability of non-spam = 1.

This result suggests that the message is spam with 0% probability and non-spam with 100% probability. This prediction probably does not make sense, since the message includes words that are very rarely used in legitimate messages. It is therefore likely that the classification is not correct.

This problem arises when an event never occurs for one or more levels of the class in the training set. For example, the term Groceries had never previously appeared in a spam message; consequently, P(Groceries|spam) = 0. Because probabilities in NB are multiplied together, this zero value causes the posterior probability of spam to be 0, giving a single word the ability to nullify and overrule all of the other evidence.

A solution to this problem involves using the Laplace estimator. The Laplace estimator adds a small number to each of the counts in the frequency table, which ensures that each feature has a non-zero probability of occurring with each class. Typically, the estimator is set to 1.

Let us see how this affects our prediction for this message. Using a Laplace value of 1, we add 1 to each count in the numerators of the likelihood function. The total number of 1's added (here, 4) must also be added to each denominator. The likelihood of spam becomes:

(5/24) × (11/24) × (1/24) × (13/24) × (20/100) ≈ 0.0004

and the likelihood of non-spam is:

(2/84) × (15/84) × (9/84) × (24/84) × (80/100) ≈ 0.0001

Probability of spam = 0.0004 / (0.0004 + 0.0001) = 0.8 = 80%
Probability of non-spam = 20%

Using numeric features with naïve Bayes:
Because naïve Bayes uses frequency tables for learning the data, each feature must be categorical in order to create the combinations of class and feature values. Since numeric features do not have categories of values, the NB algorithm would not work on them without modification.

One easy and effective solution is to discretize a numeric feature, which means that the numbers are put into categories known as bins. For this reason, discretization is often called binning.

There are several different ways of binning a numeric value. The most common is to explore the data for natural categories or cut points in the distribution. For example, suppose you added a feature to the spam dataset that recorded the time (on a 24-hour clock) at which the email was sent. We might want to divide the day into four bins of 6 hours each, based on the observation that message frequency is low in the early hours of the morning, picks up during business hours, and tapers off in the evening. This seems to validate the choice of four natural bins of activity. Each email will then have a categorical feature stating which bin it belongs to.

Note that if there are no obvious cut points, one option is to discretize the feature using quantiles.
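The sketch below is an illustration rather than a definitive implementation. It shows (1) the Laplace-smoothed arithmetic from above, (2) how a Laplace estimator can be passed to naiveBayes() in the e1071 package, one common R implementation of the classifier (the tiny toy data frame is invented purely for illustration and is not the SMS data mentioned earlier), and (3) how cut() can bin an hour-of-day feature into four 6-hour bins:

```r
# (1) Laplace-smoothed likelihoods: add 1 to each count; denominators grow by 4
like_spam <- (5/24) * (11/24) * (1/24) * (13/24) * (20/100)    # ~0.0004
like_ham  <- (2/84) * (15/84) * (9/84) * (24/84) * (80/100)    # ~0.0001
like_spam / (like_spam + like_ham)                             # ~0.80

# (2) e1071's naiveBayes() accepts a Laplace estimator directly.
# The toy data frame below is invented for illustration only.
library(e1071)
toy <- data.frame(
  viagra      = factor(c("yes", "no",  "no",   "yes",  "no")),
  unsubscribe = factor(c("yes", "no",  "yes",  "no",   "no")),
  type        = factor(c("spam", "ham", "spam", "spam", "ham"))
)
model <- naiveBayes(type ~ ., data = toy, laplace = 1)
predict(model, toy[1, c("viagra", "unsubscribe")], type = "raw")  # posterior probabilities

# (3) Binning a numeric feature (hour sent, 0-23) into four 6-hour bins
hour <- c(2, 9, 14, 20, 23)
bin  <- cut(hour, breaks = c(0, 6, 12, 18, 24), include.lowest = TRUE,
            labels = c("night", "morning", "afternoon", "evening"))
```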

Practice exercises on naïve Bayes

Exercise 1:
Using the data above, find the probability that an email is spam or not spam if it:
- contains "Groceries", "Money" and "Unsubscribe", but not "Viagra";
- contains "Viagra" and "Groceries", but not "Money" or "Unsubscribe".

Exercise 2:
A retail store carries a product that is supplied by three manufacturers, A, B and C: 30% of the stock comes from A, 20% from B and 50% from C. It is known that 2% of the products from A are defective, 3% from B are defective, and 5% from C are defective.
A) If a product is randomly selected from this store, what is the probability that it is defective?
B) If a defective product is found, what is the probability that it was supplied by B?
