Lecture 3 - Gaussian Mixture Models and Introduction to HMMs


Lecture 3
Gaussian Mixture Models and Introduction to HMMs
Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen, Markus Nussbaum-Thom
Watson Group, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
{picheny,bhuvana,stanchen,nussbaum}@us.ibm.com
3 February 2016

Administrivia
Lab 1 due Friday, February 12th at 6pm.
Should have received username and password.
Courseworks discussion has been started.
TA office hours: Minda, Wed 2-4pm, EE lounge, 13th floor Mudd; Srihari, Thu 2-4pm, 122 Mudd.

Feedback
Muddiest topic: MFCC (7), DTW (5), DSP (4), PLP (2).
Comments (2 votes): more examples / spend more time on examples (8); want slides before class (4); more explanation of equations (3); engage students more, ask more questions to class (2).

Where Are We?
Can extract features over time (MFCC, PLP, others) that characterize info in the speech signal in compact form.
Vector of 12-40 features extracted 100 times a second.

DTW Recap
Training: record audio A_w for each word w in the vocab. Generate a sequence of MFCC features A'_w (the template for w).
Test time: record audio A_test, generate a sequence of MFCC features A'_test.
For each w, compute distance(A'_test, A'_w) using DTW. Return the w with the smallest distance.
DTW computes the distance between words represented as sequences of feature vectors, while accounting for nonlinear time alignment.
Learned basic concepts (e.g., distances, shortest paths) that will reappear throughout the course.

What Are the Pros and Cons of DTW?

Pros
Easy to implement.

Cons: It's Ad Hoc
Distance measures completely heuristic. Why Euclidean? Weight all dimensions of the feature vector equally? Ugh!
Warping paths heuristic. Human-derived constraints on warping paths (weights, etc.). Ugh!
Doesn't scale well: run DTW against each template in the training data; what if large vocabulary? Ugh!
Plenty of other issues.

Can We Do Better?
Key insight 1: Learn as much as possible from data (e.g., distance measure; warping functions?).
Key insight 2: Use probabilistic modeling. Use well-described theories and models from probability, statistics, and computer science, rather than arbitrary heuristics with ill-defined properties.

Next Two Main Topics
Gaussian mixture models (today): a probabilistic model of the feature vectors associated with a speech sound; a principled distance between a test frame and a set of template frames.
Hidden Markov models (next week): a probabilistic model of the time evolution of feature vectors for a speech sound; a principled generalization of DTW.

Part I: Gaussian Distributions

Gauss
Johann Carl Friedrich Gauss (1777-1855), "Greatest Mathematician since Antiquity".

Gauss's Dog

The Scenario
Compute the distance between a test frame and a frame of the template.
Imagine 2-d feature vectors instead of 40-d for visualization.

Problem Formulation
What if instead of one training sample, we have many?

Ideas
Average the training samples; compute Euclidean distance.
Find the best match over all training samples.
Make a probabilistic model of the training samples.

Where Are We?
1. Gaussians in One Dimension
2. Gaussians in Multiple Dimensions
3. Estimating Gaussians From Data

Problem Formulation, Two Dimensions
Estimate P(x1, x2), the "frequency" that a training sample occurs at location (x1, x2).

Let's Start With One Dimension
Estimate P(x), the "frequency" that a training sample occurs at location x.

The Gaussian or Normal Distribution
$P_{\mu,\sigma^2}(x) = \mathcal{N}(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Parametric distribution with two parameters: μ = mean (the center of the data); σ² = variance (how wide the data is spread).
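As an aside not on the original slides, here is a minimal NumPy sketch of this density; the function and argument names are my own.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(mu, sigma2) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# Density of N(4, 1) at a few points (these parameters reappear in the EM example later).
print(gaussian_pdf(np.array([2.6, 4.0, 5.8]), mu=4.0, sigma2=1.0))
```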

Visualization
(Plots: the density function over roughly μ ± 4σ, and a sample drawn from the distribution.)

Properties of Gaussian Distributions
Is a valid probability distribution:
$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = 1$
Central Limit Theorem: sums of large numbers of identically distributed random variables tend to Gaussian. Lots of different types of data look "bell-shaped".
Sums and differences of Gaussian random variables are Gaussian.
If X is distributed as $\mathcal{N}(\mu, \sigma^2)$, then $aX + b$ is distributed as $\mathcal{N}(a\mu + b, (a\sigma)^2)$.
Negative log looks like weighted Euclidean distance!
$\ln\!\big(\sqrt{2\pi}\,\sigma\big) + \frac{(x-\mu)^2}{2\sigma^2}$

Where Are We?
1. Gaussians in One Dimension
2. Gaussians in Multiple Dimensions
3. Estimating Gaussians From Data

Gaussians in Two Dimensions
$\mathcal{N}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\, \exp\!\left[-\frac{1}{2(1-r^2)}\left(\frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2r\,(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right)\right]$
If r = 0, this simplifies to
$\frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}} = \mathcal{N}(\mu_1, \sigma_1^2)\,\mathcal{N}(\mu_2, \sigma_2^2)$
i.e., like generating each dimension independently.

Example: r = 0, σ1 = σ2
x1, x2 uncorrelated. Knowing x1 tells you nothing about x2.

Example: r = 0, σ1 ≠ σ2
x1, x2 can be uncorrelated and have unequal variance.

Example: r ≠ 0, σ1 ≠ σ2
x1, x2 correlated. Knowing x1 tells you something about x2.

Generalizing to More Dimensions
If we write the following matrix:
$\Sigma = \begin{pmatrix} \sigma_1^2 & r\sigma_1\sigma_2 \\ r\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$
then another way to write the two-dimensional Gaussian is
$\mathcal{N}(\mu, \Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$
where x = (x1, x2), μ = (μ1, μ2).
More generally, μ and Σ can have arbitrary numbers of components: multivariate Gaussians.

Diagonal and Full Covariance Gaussians
Let's say we have a 40-d feature vector. How many parameters in the covariance matrix Σ?
The more parameters, the more data you need to estimate them.
In ASR, usually assume Σ is diagonal: d parameters.
This is why we like having uncorrelated features!

Computing Gaussian Log Likelihoods
Why log likelihoods?
Full covariance:
$\log P(x) = -\frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)$
Diagonal covariance:
$\log P(x) = -\frac{d}{2}\ln(2\pi) - \sum_{i=1}^{d}\ln\sigma_i - \frac{1}{2}\sum_{i=1}^{d}(x_i-\mu_i)^2/\sigma_i^2$
Again, note the similarity to weighted Euclidean distance.
The terms on the left are independent of x; precompute them. A few multiplies/adds per dimension.
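To make the "precompute the x-independent terms" point concrete, here is a hedged NumPy sketch of the diagonal-covariance log likelihood; the function and variable names are mine, not from the lecture.

```python
import numpy as np

def make_diag_loglik(mu, var):
    """Build a log N(x; mu, diag(var)) evaluator with the x-independent terms precomputed."""
    mu, var = np.asarray(mu, dtype=float), np.asarray(var, dtype=float)
    d = mu.shape[0]
    # -d/2 * ln(2*pi) - sum_i ln(sigma_i): independent of x, so compute once.
    const = -0.5 * d * np.log(2.0 * np.pi) - 0.5 * np.sum(np.log(var))
    def loglik(x):
        # A few multiplies/adds per dimension, as the slide notes.
        return const - 0.5 * np.sum((np.asarray(x) - mu) ** 2 / var)
    return loglik

# Using the baseball-player estimates from the later example as parameters.
loglik = make_diag_loglik(mu=[73.71, 201.69], var=[5.43, 440.62])
print(loglik([74.0, 190.0]))
```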

Where Are We?
1. Gaussians in One Dimension
2. Gaussians in Multiple Dimensions
3. Estimating Gaussians From Data

Estimating Gaussians
Given training data, how do we choose the parameters μ, Σ?
Find parameters so that the resulting distribution "matches" the data as well as possible.
Sample data: heights and weights of baseball players. (Scatter plot: heights roughly 66-82 in., weights roughly 140-300 lb.)

Maximum-Likelihood Estimation (Univariate)
One criterion: data "matches" the distribution well if the distribution assigns high likelihood to the data.
The likelihood of a string of observations x1, x2, ..., xN is the product of the individual likelihoods:
$L(x_1^N \mid \mu, \sigma) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$
Maximum-likelihood estimation: choose μ, σ that maximize the likelihood of the training data:
$(\mu, \sigma)_{\mathrm{MLE}} = \arg\max_{\mu,\sigma} L(x_1^N \mid \mu, \sigma)$

Why Maximum-Likelihood Estimation?
Assume we have the "correct" model form. Then, as the number of training samples increases, ML estimates approach the "true" parameter values (consistent).
ML estimators are the best! (efficient)
ML estimation is easy for many types of models: count and normalize!

What is the ML Estimate for Gaussians?
Much easier to work with the log likelihood $\mathcal{L} = \ln L$:
$\mathcal{L}(x_1^N \mid \mu, \sigma) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2$
Take partial derivatives w.r.t. μ, σ:
$\frac{\partial \mathcal{L}(x_1^N \mid \mu, \sigma)}{\partial \mu} = \sum_{i=1}^{N} \frac{x_i-\mu}{\sigma^2}$
$\frac{\partial \mathcal{L}(x_1^N \mid \mu, \sigma)}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \sum_{i=1}^{N} \frac{(x_i-\mu)^2}{2\sigma^4}$
Set equal to zero; solve for μ, σ²:
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$

What is the ML Estimate for Gaussians?
Multivariate case:
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \Sigma = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)(x_i-\mu)^T$
What if diagonal covariance? Estimate the parameters for each dimension independently.
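A minimal sketch of "count and normalize" for the diagonal-covariance case (illustrative code with my own names, not the course lab code):

```python
import numpy as np

def mle_diag_gaussian(X):
    """ML estimates of the mean and per-dimension variance for data X of shape (N, d)."""
    mu = X.mean(axis=0)                 # mu = (1/N) sum_i x_i
    var = ((X - mu) ** 2).mean(axis=0)  # sigma_i^2 = (1/N) sum_i (x_i - mu_i)^2
    return mu, var

# Toy data: 1000 samples from a 2-d Gaussian with known parameters.
rng = np.random.default_rng(0)
X = rng.normal(loc=[73.7, 201.7], scale=[2.3, 21.0], size=(1000, 2))
print(mle_diag_gaussian(X))  # estimates should land close to the true mean and variance
```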

Example: ML Estimation
Heights (in.) and weights (lb.) of 1033 pro baseball players. Noise added to hide discretization effects.
stanchen/e6870/data/mlb

Example: ML Estimation
(Scatter plot of the height/weight data.)

Example: Diagonal Covariance
$\mu_1 = \frac{1}{1033}(74.34 + 73.92 + 72.01 + \cdots) = 73.71$
$\mu_2 = \frac{1}{1033}(181.29 + 213.79 + 209.52 + \cdots) = 201.69$
$\sigma_1^2 = \frac{1}{1033}\left[(74.34 - 73.71)^2 + (73.92 - 73.71)^2 + \cdots\right] = 5.43$
$\sigma_2^2 = \frac{1}{1033}\left[(181.29 - 201.69)^2 + (213.79 - 201.69)^2 + \cdots\right] = 440.62$

Example: Diagonal Covariance
(Plot for the diagonal-covariance fit.)

Example: Full Covariance
Mean and diagonal elements of the covariance matrix are the same as before.
$\Sigma_{12} = \Sigma_{21} = \frac{1}{1033}\left[(74.34 - 73.71)(181.29 - 201.69) + (73.92 - 73.71)(213.79 - 201.69) + \cdots\right] = 25.43$
$\mu = [\,73.71 \;\; 201.69\,] \qquad \Sigma = \begin{pmatrix} 5.43 & 25.43 \\ 25.43 & 440.62 \end{pmatrix}$

Example: Full Covariance
(Plot for the full-covariance fit.)

Recap: Gaussians
Lots of data "looks" Gaussian: central limit theorem.
ML estimation of Gaussians is easy: count and normalize.
In ASR, mostly use diagonal-covariance Gaussians; full covariance matrices have too many parameters.

Part II: Gaussian Mixture Models

Problems with Gaussian Assumption

Problems with Gaussian Assumption
Sample from the MLE Gaussian trained on the data on the last slide.
Not all data is Gaussian!

Problems with Gaussian Assumption
What can we do? What about two Gaussians?
$P(x) = p_1\, \mathcal{N}(\mu_1, \Sigma_1) + p_2\, \mathcal{N}(\mu_2, \Sigma_2)$
where $p_1 + p_2 = 1$.

Gaussian Mixture Models (GMM's)
More generally, can use an arbitrary number of Gaussians:
$P(x) = \sum_j p_j\, \frac{1}{(2\pi)^{d/2}|\Sigma_j|^{1/2}}\, e^{-\frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1} (x-\mu_j)}$
where $\sum_j p_j = 1$ and all $p_j \ge 0$.
Also called a mixture of Gaussians.
Can approximate any distribution of interest pretty well, if we just use enough component Gaussians.
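A hedged sketch of evaluating this mixture density for the diagonal-covariance case common in ASR; the names are illustrative, and the log-sum-exp trick is my addition for numerical stability.

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """log P(x) for a diagonal-covariance GMM.
    weights: (k,), means and variances: (k, d), x: (d,)."""
    d = means.shape[1]
    # log p_j + log N(x; mu_j, diag(var_j)) for each component j.
    log_comp = (np.log(weights)
                - 0.5 * d * np.log(2.0 * np.pi)
                - 0.5 * np.sum(np.log(variances), axis=1)
                - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    # Log-sum-exp over components.
    m = log_comp.max()
    return m + np.log(np.sum(np.exp(log_comp - m)))

# The two-component univariate mixture used in the EM example later in the lecture.
weights = np.array([0.5, 0.5])
means = np.array([[4.0], [7.0]])
variances = np.array([[1.0], [1.0]])
print(np.exp(gmm_logpdf(np.array([4.8]), weights, means, variances)))  # roughly 0.1626
```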

Example: Some Real Acoustic Data

Example: 10-Component GMM (Sample)

Example: 10-Component GMM (μ's, σ's)

ML Estimation for GMM's
Given training data, how do we estimate the parameters (i.e., the μj, Σj, and mixture weights pj) to maximize the likelihood of the data?
No closed-form solution; can't just count and normalize.
Instead, must use an optimization technique to find a good local optimum in likelihood: gradient search, Newton's method.
Tool of choice: the Expectation-Maximization algorithm.

Where Are We?
1. The Expectation-Maximization Algorithm
2. Applying the EM Algorithm to GMM's

Wake Up!
This is another key thing to remember from the course. Used to train GMM's, HMM's, and lots of other things.
Key paper in 1977 by Dempster, Laird, and Rubin [2]; 43,958 citations to date.
"The innovative Dempster-Laird-Rubin paper in the Journal of the Royal Statistical Society received an enthusiastic discussion at the Royal Statistical Society meeting... calling the paper 'brilliant'."

What Does the EM Algorithm Do?
Finds ML parameter estimates for models with hidden variables.
Iterative hill-climbing method. Adjusts the parameter estimates in each iteration such that the likelihood of the data increases (weakly) with each iteration.
Actually, finds a local optimum for the parameters in likelihood.

What is a Hidden Variable?
A random variable that isn't observed.
Example: in GMMs, the output prob depends on the mixture component that generated the observation, but you can't observe it.
Important concept. Let's discuss!

Mixtures and Hidden Variables
So, to compute the prob of observed x, need to sum over all possible values of the hidden variable h:
$P(x) = \sum_h P(h, x) = \sum_h P(h)\, P(x \mid h)$
Consider a probability distribution that is a mixture of Gaussians:
$P(x) = \sum_j p_j\, \mathcal{N}(\mu_j, \Sigma_j)$
Can be viewed as a hidden model: h = which component generated the sample; $P(h) = p_j$; $P(x \mid h) = \mathcal{N}(\mu_j, \Sigma_j)$.
$P(x) = \sum_h P(h)\, P(x \mid h)$

The Basic Idea
If we nail down the "hidden" value for each xi, the model is no longer hidden! (e.g., data partitioned among GMM components.)
So for each data point xi, assign a single hidden value hi: take $h_i = \arg\max_h P(h)\, P(x_i \mid h)$. (e.g., identify the GMM component generating each point.)
Easy to train parameters in non-hidden models: update the parameters in P(h), P(x|h). (e.g., count and normalize to get MLE for μj, Σj, pj.)
Repeat!

The Basic Idea
Hard decision: for each xi, assign a single $h_i = \arg\max_h P(h, x_i)$, with count 1. (Test: what is P(h, xi) for a Gaussian distribution?)
Soft decision: for each xi, compute for every h the posterior prob $\tilde P(h \mid x_i) = \frac{P(h, x_i)}{\sum_{h'} P(h', x_i)}$, also called the "fractional count". (e.g., partition the event across every GMM component.)
Rest of the algorithm unchanged.

The Basic Idea, Using More Formal Terminology
Initialize parameter values somehow.
For each iteration:
Expectation step: compute the posterior (count) of h for each xi:
$\tilde P(h \mid x_i) = \frac{P(h, x_i)}{\sum_{h'} P(h', x_i)}$
Maximization step: update the parameters. Instead of data xi with hidden h, pretend we have non-hidden data where the (fractional) count of each (h, xi) is $\tilde P(h \mid x_i)$.

Example: Training a 2-Component GMM
Two-component univariate GMM; 10 data points.
The data x1, ..., x10: 8.4, 7.6, 4.2, 2.6, 5.1, 4.0, 7.8, 3.0, 4.8, 5.8
Initial parameter values: p1 = 0.5, μ1 = 4, σ1² = 1; p2 = 0.5, μ2 = 7, σ2² = 1.
(Plot: training data; densities of the initial Gaussians.)

The E Step
$\tilde P(h \mid x_i) = \frac{P(h, x_i)}{\sum_{h'} P(h', x_i)} = \frac{p_h \cdot \mathcal{N}_h(x_i)}{P(x_i)}, \qquad h \in \{1, 2\}$

  xi     P(xi)    P~(1|xi)   P~(2|xi)
  8.4    0.0749   0.000      1.000
  7.6    0.1669   0.002      0.998
  4.2    0.1995   0.980      0.020
  2.6    0.0749   1.000      0.000
  5.1    0.1417   0.769      0.231
  4.0    0.2017   0.989      0.011
  7.8    0.1450   0.001      0.999
  3.0    0.1211   0.999      0.001
  4.8    0.1626   0.891      0.109
  5.8    0.1366   0.289      0.711

where $P(x_i) = p_1 \mathcal{N}_1(x_i) + p_2 \mathcal{N}_2(x_i)$ (e.g., for xi = 4.8: $p_1\mathcal{N}_1 = 0.1448$, $p_2\mathcal{N}_2 = 0.0177$).

The M Step
View: we have a non-hidden corpus for each component GMM. For the hth component, have $\tilde P(h \mid x_i)$ counts for event xi.
Estimating μ: fractional events. Instead of $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, use
$\mu_h = \frac{1}{\sum_i \tilde P(h \mid x_i)} \sum_{i=1}^{N} \tilde P(h \mid x_i)\, x_i$
$\mu_1 = \frac{1}{0.000 + 0.002 + 0.980 + \cdots}\,(0.000 \cdot 8.4 + 0.002 \cdot 7.6 + 0.980 \cdot 4.2 + \cdots) = 3.98$
Similarly, can estimate $\sigma_h^2$ with fractional events.

The M Step (cont'd)
What about the mixture weights ph? To find the MLE, count and normalize!
$p_1 = \frac{0.000 + 0.002 + 0.980 + \cdots}{10} = 0.59$
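Putting the E and M steps together, here is a small sketch that reproduces this worked example. It is my own illustrative code for the univariate case, not the course lab code.

```python
import numpy as np

x = np.array([8.4, 7.6, 4.2, 2.6, 5.1, 4.0, 7.8, 3.0, 4.8, 5.8])
p = np.array([0.5, 0.5])    # initial mixture weights
mu = np.array([4.0, 7.0])   # initial means
var = np.array([1.0, 1.0])  # initial variances

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def em_step(x, p, mu, var):
    # E step: posterior ("fractional count") of each component h for each x_i.
    joint = p[:, None] * normal_pdf(x[None, :], mu[:, None], var[:, None])  # P(h, x_i), shape (k, N)
    post = joint / joint.sum(axis=0, keepdims=True)                         # P~(h | x_i)
    # M step: count and normalize with fractional counts.
    counts = post.sum(axis=1)
    new_mu = (post * x[None, :]).sum(axis=1) / counts
    new_var = (post * (x[None, :] - new_mu[:, None]) ** 2).sum(axis=1) / counts
    new_p = counts / x.size
    return new_p, new_mu, new_var

p, mu, var = em_step(x, p, mu, var)
print(p[0], mu[0])  # roughly 0.59 and 3.98, matching the slides' first iteration
```

Iterating em_step until the log likelihood stops improving gives the later-iteration pictures shown on the following slides.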

The End

First Few Iterations of EM
(Plots: iteration 0, iteration 1, iteration 2.)

Later Iterations of EM
(Plots: iteration 2, iteration 3, iteration 10.)

Why the EM Algorithm Works
x = (x1, x2, ...) = whole training set; h = hidden; θ = parameters of the model.
Objective function for MLE: the (log) likelihood.
$L(\theta) = \log P_\theta(x) = \log P_\theta(x, h) - \log P_\theta(h \mid x)$
Take the expectation with respect to $\theta^n$, the estimate of θ on the nth estimation iteration:
$\sum_h P_{\theta^n}(h \mid x)\, \log P_\theta(x) = \sum_h P_{\theta^n}(h \mid x)\, \log P_\theta(x, h) - \sum_h P_{\theta^n}(h \mid x)\, \log P_\theta(h \mid x)$
Rewrite as: $\log P_\theta(x) = Q(\theta \mid \theta^n) + H(\theta \mid \theta^n)$

Why the EM Algorithm Works
$\log P_\theta(x) = Q(\theta \mid \theta^n) + H(\theta \mid \theta^n)$
What is Q? In the Gaussian example above, Q is just
$\sum_h P_{\theta^n}(h \mid x)\, \log\!\left[p_h\, \mathcal{N}_x(\mu_h, \Sigma_h)\right]$
It can be shown (using Gibbs' inequality) that $H(\theta \mid \theta^n) \ge H(\theta^n \mid \theta^n)$ for any $\theta \ne \theta^n$.
So any choice of θ that increases Q will increase $\log P_\theta(x)$. Typically we just pick θ to maximize Q altogether; this can often be done in closed form.

The E Step
Compute Q.

The M Step
Maximize Q with respect to θ.
Then repeat (E/M, E/M) until the likelihood stops improving significantly.
That's the EM algorithm in a nutshell!

Discussion
The EM algorithm is an elegant and general way to train parameters in hidden models to optimize likelihood.
Only finds a local optimum. Seeding is of paramount importance.

Where Are We?
1. The Expectation-Maximization Algorithm
2. Applying the EM Algorithm to GMM's

Another Example Data Set

Question: How Many Gaussians?
Method 1 (most common): Guess!
Method 2: Bayesian Information Criterion (BIC) [1]. Penalize the likelihood by the number of parameters:
$\mathrm{BIC}(C_k) = \sum_{j=1}^{k}\left\{-\frac{1}{2}\, n_j \log|\Sigma_j|\right\} - \frac{1}{2}\, k\left(d + \frac{1}{2}\, d(d+1)\right)\log N$
k = number of Gaussian components; d = dimension of the feature vector; nj = data points for Gaussian j; N = total data points.
Discuss!
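As a rough illustration only: a sketch that scores candidate GMM sizes with a BIC-style penalty of half the parameter count times log N. The exact weighting used in [1] may differ, and the parameter count here also includes the mixture weights; all names and the example numbers are hypothetical.

```python
import numpy as np

def gmm_num_params(k, d, full_covariance=True):
    """Free parameters of a k-component, d-dimensional GMM: means, covariances, and k-1 weights."""
    cov_params = d * (d + 1) // 2 if full_covariance else d
    return k * (d + cov_params) + (k - 1)

def bic_score(total_log_likelihood, k, d, n, full_covariance=True):
    """BIC-style score: total log likelihood minus 0.5 * (#params) * log N. Larger is better."""
    return total_log_likelihood - 0.5 * gmm_num_params(k, d, full_covariance) * np.log(n)

# Hypothetical training log likelihoods for k = 1, 2, 4 diagonal GMMs on 1000 40-d frames.
for k, ll in [(1, -152000.0), (2, -149500.0), (4, -149300.0)]:
    print(k, bic_score(ll, k, d=40, n=1000, full_covariance=False))
```

With these made-up numbers, k = 2 wins: the small likelihood gain from k = 4 does not pay for its extra parameters.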

The Bayesian Information Criterion
View the GMM as a way of coding the data for transmission.
Cost of transmitting the model: number of parameters.
Cost of transmitting the data: log likelihood of the data.
Choose the number of Gaussians to minimize the total cost.

Question: How to Initialize Parameters?
Set the mixture weights pj to 1/k (for k Gaussians).
Pick k data points at random and use them to seed the initial values of the μj.
Set all σ's to an arbitrary value, or to the global variance of the data.
Extension: generate multiple starting points; pick the one with the highest likelihood.

Another Way: Splitting
Start with a single Gaussian, MLE.
Repeat until you hit the desired number of Gaussians: double the number of Gaussians by perturbing the means of the existing Gaussians by ±ε; run several iterations of EM (see the sketch below).
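A minimal sketch of this splitting strategy for the univariate case, reusing an em_step like the one in the earlier two-component example. The ε and iteration count are arbitrary illustrative choices, and for simplicity the target count is assumed to be a power of two.

```python
import numpy as np

def train_by_splitting(x, num_components, em_step, epsilon=0.2, em_iters=10):
    """Grow a univariate GMM by repeatedly doubling the components and re-running EM."""
    # Start from the single-Gaussian ML estimate.
    p, mu, var = np.array([1.0]), np.array([x.mean()]), np.array([x.var()])
    while mu.size < num_components:
        # Double the number of Gaussians by perturbing each mean by +/- epsilon.
        p = np.concatenate([p, p]) / 2.0
        mu = np.concatenate([mu - epsilon, mu + epsilon])
        var = np.concatenate([var, var])
        # Run several iterations of EM at this size.
        for _ in range(em_iters):
            p, mu, var = em_step(x, p, mu, var)
    return p, mu, var

# Usage, with the em_step and data x from the earlier EM example:
# p, mu, var = train_by_splitting(x, 4, em_step)
```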

Question: How Long to Train? (i.e., how many iterations of EM?)
Guess.
Look at performance on training data: stop when the change in log likelihood per event is below a fixed threshold.
Look at performance on held-out data: stop when performance no longer improves.

The Data Set

Sample From Best 1-Component GMM

The Data Set, Again

20-Component GMM Trained on Data

20-Component GMM μ's, σ's

Acoustic Feature Data Set

5-Component GMM; Starting Point A

5-Component GMM; Starting Point B

5-Component GMM; Starting Point C

Solutions With Infinite Likelihood
Consider the log likelihood of a two-component 1-d GMM:
$\sum_{i=1}^{N} \ln\!\left(p_1\, \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_i-\mu_1)^2}{2\sigma_1^2}} + p_2\, \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_i-\mu_2)^2}{2\sigma_2^2}}\right)$
If $\mu_1 = x_1$, the above reduces to
$\ln\!\left(\frac{p_1}{\sqrt{2\pi}\,\sigma_1} + \frac{p_2}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_1-\mu_2)^2}{2\sigma_2^2}}\right) + \sum_{i=2}^{N} \ldots$
which goes to $+\infty$ as $\sigma_1 \to 0$.
Only consider finite local maxima of the likelihood function: variance flooring; throw away Gaussians with "count" below a threshold.
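Both safeguards can be dropped into the M step. A hedged sketch: the floor and count threshold are arbitrary illustrative values, and `counts` is assumed to be the vector of summed fractional counts from the E step.

```python
import numpy as np

def apply_gmm_safeguards(p, mu, var, counts, var_floor=1e-3, min_count=1e-2):
    """Avoid infinite-likelihood solutions: floor the variances, drop starved components."""
    var = np.maximum(var, var_floor)   # variance flooring
    keep = counts > min_count          # throw away Gaussians whose "count" is below threshold
    p, mu, var = p[keep], mu[keep], var[keep]
    return p / p.sum(), mu, var        # renormalize the surviving mixture weights
```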

Recap
GMM's are effective for modeling arbitrary distributions.
State-of-the-art in ASR for decades (though they may be superseded by NNs at some point; discuss later in the course).
The EM algorithm is the primary tool for training GMM's (and lots of other things).
Very sensitive to starting point; initializing GMM's is an art.

References
[1] S. Chen and P.S. Gopalakrishnan, "Clustering via the Bayesian Information Criterion with Applications in Speech Recognition", ICASSP, vol. 2, pp. 645-648, 1998.
[2] A.P. Dempster, N.M. Laird, D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, 1977.

What's Next: Hidden Markov Models
Replace DTW with a probabilistic counterpart.
Together, GMM's and HMM's comprise a unified probabilistic framework.
Old paradigm:
$w^{*} = \arg\min_{w \in \mathrm{vocab}} \mathrm{distance}(A'_{\mathrm{test}}, A'_w)$
New paradigm:
$w^{*} = \arg\max_{w \in \mathrm{vocab}} P(A'_{\mathrm{test}} \mid w)$

Part III: Introduction to Hidden Markov Models

Introduction to Hidden Markov Models
The issue of weights in DTW.
Interpretation of the DTW grid as a directed graph.
Adding transition and output probabilities to the graph gives us an HMM!
The three main HMM operations.

Another Issue with Dynamic Time Warping
Weights are completely heuristic!
Maybe we can learn weights from data? Take many utterances...

Learning Weights From Data
For each node on the DP path, count the number of times you move up, right, and diagonally.
Normalize the number of times each direction is taken (C) by the total number of times the node was actually visited (N), giving C/N.
Take some constant times the reciprocal as the weight (αN/C).
Example: a particular node is visited 100 times; the three directions are taken 40, 20, and 40 times. Set the weights to 2.5, 5, and 2.5 (or 1, 2, and 1), as in the sketch below.
Point: the weight distribution should reflect which directions are taken more frequently at a node.
Weight estimation is not addressed in DTW, but is a central part of hidden Markov models.
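A tiny sketch of this counting scheme in plain Python; the function name and the assignment of counts to particular direction labels are my own, but it reproduces the slide's 100-visit example.

```python
def direction_weights(direction_counts, alpha=1.0):
    """Turn per-node direction counts C into DTW-style weights alpha * N / C."""
    total = sum(direction_counts.values())  # N = number of times the node was visited
    return {d: alpha * total / c for d, c in direction_counts.items()}

# The slide's example: a node visited 100 times, with direction counts 40, 20, 40.
print(direction_weights({"diagonal": 40, "right": 20, "up": 40}))
# -> {'diagonal': 2.5, 'right': 5.0, 'up': 2.5}
```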

DTW and Directed Graphs
Take the following dynamic time warping setup (figure).
Let's look at the representation of this as a directed graph (figure).

DTW and Directed Graphs
Another common DTW structure (figure); as a directed graph (figure).
Can represent even more complex DTW structures; the resultant directed graphs can get quite bizarre.

Path Probabilities
Let's assign probabilities to the transitions in the directed graph: $a_{ij}$ is the transition probability for going from state i to state j, where $\sum_j a_{ij} = 1$.
Can compute the probability P of an individual path just using the transition probabilities $a_{ij}$.

Path Probabilities
It is common to reorient typical DTW pictures (figure).
The above only describes the path probabilities associated with transitions. Also need to include the likelihoods associated with observations.

Path Probabilities
As in the GMM discussion, let us define the likelihood of producing observation xi from state j as
$b_j(x_i) = \sum_m c_{jm}\, \frac{1}{(2\pi)^{d/2}|\Sigma_{jm}|^{1/2}}\, e^{-\frac{1}{2}(x_i-\mu_{jm})^T \Sigma_{jm}^{-1} (x_i-\mu_{jm})}$
where the $c_{jm}$ are the mixture weights associated with state j.
This state likelihood is also called the output probability associated with the state.

Path Probabilities
In this case, the likelihood of an entire path is the product of the transition probabilities and output probabilities along it; for a state sequence $s_1, \ldots, s_N$ emitting observations $x_1, \ldots, x_N$,
$P(\mathrm{path}, x_1^N) = \prod_{i=1}^{N} a_{s_{i-1} s_i}\, b_{s_i}(x_i)$
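A small sketch of evaluating this for one given path, in log space; the names are mine, and the transition into the first state is folded into a start term for simplicity.

```python
import numpy as np

def log_path_likelihood(states, observations, log_start, log_trans, log_output):
    """Log likelihood of one path: log start term + sum of log a_ij + sum of log b_j(x_i).
    log_trans[i][j] = log a_ij; log_output(j, x) = log b_j(x), e.g. a per-state GMM log density."""
    total = log_start[states[0]] + log_output(states[0], observations[0])
    for prev, s, x in zip(states[:-1], states[1:], observations[1:]):
        total += log_trans[prev][s] + log_output(s, x)
    return total

# Toy usage with two states and a dummy output model (replace with a real log b_j(x)).
log_trans = np.log(np.array([[0.6, 0.4], [0.1, 0.9]]))
log_start = np.log(np.array([0.5, 0.5]))
dummy_output = lambda j, x: 0.0
print(log_path_likelihood([0, 0, 1], [0.0, 0.0, 0.0], log_start, log_trans, dummy_output))
```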

Hidden Markov Models
The output and transition probabilities define a Hidden Markov Model, or HMM.
Since the probabilities of moving from state to state only depend on the current and previous state, the model is Markov.
Since we only see the observations and have to infer the states after the fact, the model is hidden.
One may consider an HMM to be a generative model of speech: starting at the upper left corner of the trellis, generate observations according to the permissible transitions and output probabilities.
Not only can we compute the likelihood of a single path; we can compute the overall likelihood of the observation string as a sum over all paths in the trellis.

HMM: The Three Main Tasks
Compute the likelihood of generating a string of observations from the HMM (Forward algorithm).
Compute the best path through the HMM (Viterbi algorithm).
Learn the parameters (output and transition probabilities) of the HMM from data (Baum-Welch, a.k.a. Forward-Backward, algorithm).

Part IV: Epilogue

Course Feedback
1. Was this lecture mostly clear or unclear? What was the muddiest topic?
2. Other feedback (pace, content, atmosphere)?
