Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

EURASIP Journal on Applied Signal Processing 2005:7, 1110–1126
© 2005 T. Lotter and P. Vary

Thomas Lotter
Institute of Communication Systems and Data Processing, RWTH Aachen University of Technology, 52056 Aachen, Germany
Siemens Audiological Engineering Group, Gebbertstrasse 125, 91058 Erlangen, Germany
Email: thomas.tl.lotter@siemens.com

Peter Vary
Institute of Communication Systems and Data Processing, RWTH Aachen University of Technology, 52056 Aachen, Germany
Email: vary@ind.rwth-aachen.de

Received 7 June 2004; Revised 17 September 2004; Recommended for Publication by Jacob Benesty

This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.

Keywords and phrases: speech enhancement, MAP estimation, speech model.

1. INTRODUCTION

The reduction of acoustical background noise using a single microphone is an important subject to improve the quality of speech communication systems in the context of digital hearing aids, speech recognition, hands-free telephony, or teleconferencing.
Although single-microphone speech enhancement has been a research topic for decades, the estimation of a clean speech signal from its noisy observation remains a challenging task, especially due to the wide variety of environmental noises.

If the disturbing noise is assumed to be truly environmental, that is, its origin is, for example, machines, cars, or several persons talking at the same time, the specific properties of speech such as nonwhiteness, nonstationarity, and non-Gaussianity compared to unwanted noise allow a differentiation between speech and noise.

Nonwhiteness means that the short-time spectrum of speech is generally less flat than that of acoustic noise. This property can be exploited by separating speech and noise in the spectral domain. The concept of spectral domain noise attenuation was introduced more than twenty years ago by Boll [1] as the subtraction of an estimated noise spectral magnitude from the noisy spectral magnitude.

To estimate the noise power spectral density, the second property, nonstationarity, is exploited by averaging DFT squared magnitudes in noise-only phases or by tracking spectral minima over time [2]. Noise reduction by spectral domain weighting has frequently been plagued by musical tones, that is, annoying fluctuations in the residual noise signal. This is especially due to the subtraction of an expectation in terms of the noise power spectral density from an instantaneous value. To overcome this problem, improved algorithms have been proposed by Ephraim and Malah [3, 4]. The clean speech spectral amplitude is estimated with respect to the minimization of a statistical error criterion.

(This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.)
Together with a recursive estimation of the underlying speech variance, the approach results in a good speech quality without audible musical noise.

Recently, the third property, non-Gaussianity, has been included in the spectral domain noise reduction framework by Martin [5, 6]. The statistical estimation of the speech

spectrum requires a statistical model of the undisturbed speech and noise spectral coefficients. It is well known that speech samples have a super-Gaussian distribution, which causes the speech spectral coefficients to be super-Gaussian distributed as well. By including a super-Gaussian model of speech, the mean squared error of a statistical estimator can be decreased compared to an estimation with an underlying Gaussian model. Whereas the estimators proposed by Martin with underlying Gamma or Laplace PDFs for real and imaginary parts of speech and noise DFT coefficients [5, 6] are optimal with respect to the mean squared estimation error of the estimated complex speech DFT coefficient, they are suboptimal for the estimation of the speech spectral amplitude.

Figure 1: Overview of the single-channel speech enhancement system (l: time index, k: frequency index).

Spectral amplitude estimation can be considered more advantageous due to the perceptual unimportance of the phase [7]. Ephraim and Malah have proposed two estimators that minimize the squared or logarithmic error of the speech spectral amplitude under a Gaussian model of the complex speech and noise DFT coefficients [3, 4].

In this contribution spectral amplitude estimators with super-Gaussian speech modelling are introduced. The probability density function of the speech spectral amplitude is approximated by a function with two parameters. With a proper choice of the parameters, for example, the probability density of the amplitude of a complex random variable (RV) with independent Laplace or Gamma components can be approximated with high accuracy. Also, the parameters of the underlying PDF can be optimally fitted to the real distribution of the speech spectral amplitude for a specific noise reduction algorithm.
Using this statistical model, computationally efficient speech estimators can be found by applying the maximum a posteriori (MAP) estimation rule. The resulting estimators, which are super-Gaussian extensions of the MAP estimators derived by Wolfe and Godsill [8], outperform the commonly applied Ephraim-Malah estimators due to the more accurate statistical model.

The remainder of the paper is organized as follows. Section 2 gives an overview of single-channel noise reduction by spectral weighting. Section 3 introduces the underlying statistical model for the speech and noise spectral amplitudes along with comparisons to experimental data. In Section 4 the statistical model is applied to derive a MAP estimator for the speech spectral amplitude and a joint MAP estimator for the speech spectral amplitude and phase. Finally, in Section 5, experimental results are presented.

2. OVERVIEW

Figure 1 shows an overview of the single-channel speech enhancement system examined in this work [9]. The noisy time signal y(l), sampled at regular time intervals l · T, is composed of clean speech s(l) and additive noise n(l):

    y(l) = s(l) + n(l).                                          (1)

After segmentation and windowing with a function h(l), for example, a Hann window, the DFT coefficient of frame λ and frequency bin k is calculated with

    Y(λ, k) = Σ_{l=0}^{L−1} y(λQ + l) h(l) e^{−j2πlk/L},         (2)

where L denotes the DFT frame size. For the noise reduction system applied in this work, L = 256 is used at a sampling frequency of 20 kHz. For the computation of the next DFT, the window is shifted by Q samples. To decrease the disturbing effects of cyclic convolution, we apply half-overlapping Hann windows with 16 zeros at the beginning and end. The effective frame size is thus only 224 samples, which corresponds to a frame size of 11.2 milliseconds and a frame shift of 5.6 milliseconds, respectively.

The noisy DFT coefficient Y consists of speech part S and noise N:

    Y(λ, k) = S(λ, k) + N(λ, k),                                 (3)

with S = S_Re + jS_Im and N = N_Re + jN_Im, where S_Re = Re{S} and S_Im = Im{S}.
In polar coordinates the noisy DFT coefficient of amplitude R and phase ϑ is written as

    R(λ, k) e^{jϑ(λ,k)} = A(λ, k) e^{jα(λ,k)} + B(λ, k) e^{jβ(λ,k)}.    (4)

The speech DFT amplitude is termed A, the noise DFT amplitude B, and the respective phases α and β.
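As a concrete illustration of the analysis framing of (2), the following sketch builds the zero-padded Hann window and computes one DFT coefficient row per frame. The frame shift Q = 112 samples is inferred from the stated 5.6-millisecond shift at 20 kHz; it is not given explicitly as a sample count in the text.

```python
import numpy as np

FS = 20000   # sampling frequency in Hz, as stated in Section 2
L = 256      # DFT frame size
PAD = 16     # zeros at the beginning and end of the window
Q = 112      # frame shift: 5.6 ms at 20 kHz (inferred, half the effective frame)

# Hann window of effective length 224, zero-padded to L = 256
# to reduce cyclic-convolution effects
h = np.zeros(L)
h[PAD:L - PAD] = np.hanning(L - 2 * PAD)

def analysis_frames(y):
    """Y(lam, k) = sum_l y(lam*Q + l) * h(l) * exp(-j*2*pi*l*k/L), per (2)."""
    n_frames = (len(y) - L) // Q + 1
    frames = np.stack([y[i * Q:i * Q + L] * h for i in range(n_frames)])
    return np.fft.fft(frames, axis=1)   # one row of DFT coefficients per frame

y = np.random.randn(FS)   # one second of white noise as a stand-in signal
Y = analysis_frames(y)
print(Y.shape)            # (177, 256): 177 frames of 256 bins
```

In the complete system, these coefficients would be weighted by the gains G and recombined by IFFT and overlap-add as described above.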

The SNR estimation block calculates an a priori SNR ξ and an a posteriori SNR γ for each DFT bin k. The SNR calculation requires an estimate of the noise power spectral density σ_N²(λ, k). It can be estimated by averaging DFT squared magnitudes in periods of speech pauses. Assuming that noise is stationary, the measured PSD can be saved and applied as an estimate during following speech activity. This method requires a reliable voice activity detector (e.g., [10]). However, a VAD is difficult to tune and its application at low SNRs often results in clipped speech. Therefore, we apply minimum statistics, which tracks minima of the smoothed periodogram over a time period that greatly exceeds the speech short-time stationarity [2].

Based on the noise estimates σ̂_N² and the observed Fourier amplitudes R, the a priori and the a posteriori SNRs are estimated by

    ξ̂(λ, k) = σ̂_S²(λ, k) / σ̂_N²(λ, k),    γ̂(λ, k) = R²(λ, k) / σ̂_N²(λ, k).    (5)

Here, σ_S² denotes the instantaneous power spectral density of the speech. Whereas the a posteriori SNR γ can directly be computed, the a priori SNR ξ has to be estimated. This is performed using a recursive approach proposed by Ephraim and Malah [3]:

    ξ̂(λ, k) = α_snr · Â²(λ−1, k) / σ̂_N²(λ, k) + (1 − α_snr) · F[γ̂(λ, k) − 1],
    with F[x] = x for x ≥ 0 and F[x] = 0 otherwise,               (6)

where Â(λ−1, k) denotes the speech amplitude estimate of the previous frame. An alternative estimation approach which incorporates frequency correlation is presented in [11]. It is frequently argued [12, 13] that the recursive approach is essential for a high quality of the enhanced signal. A high smoothing factor α_snr greatly reduces the dynamics of the instantaneous SNR in speech pauses and thus reduces musical tones. However, the a priori SNR will then comprise a delayed version of the speech. Since the a priori SNR has a high impact on the noise reduction amount, it is useful to lower limit the a priori SNR according to

    ξ(λ, k) = ξ̂(λ, k) if ξ̂(λ, k) ≥ ξ_thr; ξ_thr otherwise.      (7)

The task of the speech estimation block is the calculation of spectral weights G for the noisy spectral components Y, such that the estimated speech DFT coefficient Ŝ is calculated by

    Ŝ(λ, k) = G(ξ̂(λ, k), γ̂(λ, k)) · Y(λ, k).                    (8)

After IFFT and overlap-add, the enhanced time signal ŝ(l) is obtained.

3. STATISTICAL MODEL

We introduce the statistical model for the speech and noise spectral amplitudes. For the sake of brevity the frame index λ and frequency index k are omitted; however, the following considerations hold independently for every frequency bin k and frame λ.

Motivated by the central limit theorem, real and imaginary parts of both speech and noise DFT coefficients are very often modelled as zero-mean independent Gaussian [3, 14, 15] with equal variance. This is due to the properties of the DFT:

    Y(λ, k) = Σ_{l=0}^{L−1} y(λQ + l) h(l) cos(2πkl/L) − j Σ_{l=0}^{L−1} y(λQ + l) h(l) sin(2πkl/L),    (9)

where L samples are added after multiplication with modulation terms. The central limit theorem states that the distribution of the DFT coefficients will converge towards a Gaussian PDF regardless of the PDF of the time samples y(l), if successive samples are statistically independent. This also holds if the correlation in y(l) is short compared to the analysis frame size [14].

For many relevant acoustic noises this assumption holds. Moreover, multiple noise sources or reverberation often reduce the noise correlation within the analysis frame, so that the Gaussian assumption is fulfilled. The variance of the noise DFT coefficient σ_N² is assumed to split equally into real and imaginary parts.
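Returning to the SNR estimation of (5)-(7), the decision-directed rule can be sketched in a few lines. The smoothing factor α_snr = 0.98 and a lower limit of −25 dB are typical values assumed here, not taken from this paper, and the Wiener-type gain at the end is only a placeholder for the MAP weighting rules derived later.

```python
import numpy as np

ALPHA_SNR = 0.98            # smoothing factor (assumed typical value)
XI_THR = 10 ** (-25 / 10)   # lower limit for the a priori SNR (assumed -25 dB)

def decision_directed_xi(A_prev, R, sigma_n2):
    """A priori SNR per (6)-(7): recursively combine the previous clean-amplitude
    estimate with the instantaneous SNR, then apply the lower limit."""
    gamma = R ** 2 / sigma_n2                                   # a posteriori SNR, (5)
    xi = (ALPHA_SNR * A_prev ** 2 / sigma_n2
          + (1 - ALPHA_SNR) * np.maximum(gamma - 1.0, 0.0))     # F[.] of (6)
    return np.maximum(xi, XI_THR), gamma

# toy frame: noisy amplitudes R, previous clean-amplitude estimates, noise PSD
R = np.array([0.5, 2.0, 0.1])
A_prev = np.array([0.3, 1.8, 0.0])
sigma_n2 = np.ones(3)
xi, gamma = decision_directed_xi(A_prev, R, sigma_n2)

# a Wiener-type gain as a stand-in for the MAP weighting rule of the paper
G = xi / (1 + xi)
S_hat = G * R
```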
Thus, the probability density function of real and imaginary parts of the noise Fourier coefficients can be modelled as

    p(N_Re) = 1/√(πσ_N²) · exp(−N_Re²/σ_N²).                     (10)

Based on (10) and the assumption of statistically independent real and imaginary parts, the PDF of the noisy spectrum Y conditioned on the speech amplitude A and phase α can be written as a joint Gaussian:

    p(Y | A, α) = 1/(πσ_N²) · exp(−|Y − Ae^{jα}|²/σ_N²).         (11)

A Rice PDF is obtained for the density of the noisy amplitude R given the speech amplitude A after polar integration of (11) [15]:

    p(R | A) = (2R/σ_N²) · exp(−(R² + A²)/σ_N²) · I_0(2AR/σ_N²),  (12)

where I_0 denotes the modified Bessel function of the first kind and zeroth order.

Considering speech, the span of correlation with typical frame sizes from 10 milliseconds to 30 milliseconds cannot be neglected. The smaller the frame size, the less Gaussian

will the distribution of the speech real and imaginary parts of the Fourier coefficients be. It is well known that the PDFs of speech samples in the time domain are much better modelled by a Laplace or Gamma density [16]. In the frequency domain similar distributions can be observed. Martin [5, 6] has abandoned the Gaussian speech model

    p(S_Re) = 1/√(πσ_S²) · exp(−S_Re²/σ_S²).                     (13)

Instead, the Laplace probability density function

    p(S_Re) = (1/σ_S) · exp(−2|S_Re|/σ_S)                         (14)

and Gamma PDFs for statistically independent real and imaginary parts have been proposed:

    p(S_Re) = (⁴√3 · |S_Re|^{−1/2}) / (2 · ⁴√2 · √(πσ_S)) · exp(−√3 |S_Re| / (√2 σ_S)).   (15)

The same equations hold for the imaginary parts.

Figure 2: Contour lines of the complex Gaussian model with independent Cartesian coordinates and of the complex Laplace model with independent Cartesian coordinates (σ_S² = 1).

3.1. Modelling the spectral amplitudes

In the following a simple statistical model for the speech and noise spectral amplitudes will be presented [17], which is significantly closer to the real distribution than the commonly applied Gaussian model.

The spectral amplitudes are of special importance, because the phase of the Fourier coefficients can be considered unimportant from a perceptual point of view [7, 18]. Hence, spectral amplitude estimators are more advantageous and a statistical model for the amplitude alone is needed.

Considering noise, the Gaussian assumption holds due to comparably low correlation in the analysis frame. Assuming statistical independence of real and imaginary parts, the PDF of the noise amplitude B can easily be found to be Rayleigh distributed by polar integration:

    p(B) = ∫_0^{2π} B · p(N_Re, N_Im) dβ = (2B/σ_N²) · exp(−B²/σ_N²).   (16)

For the calculation of an appropriate PDF for A, the Gauss, Laplace, and Gamma PDFs for real and imaginary parts are taken into account.
The real and imaginary parts of the Fourier coefficients can be considered statistically independent with high accuracy. Then, p(A) can in general be calculated by

    p(A) = ∫_0^{2π} A · p(A cos α) · p(A sin α) dα,              (17)

with the PDFs according to (13), (14), or (15) for p(S_Re = A cos α) and p(S_Im = A sin α).

Figure 2 shows contour lines of a complex Gaussian or Laplace PDF with independent Cartesian components. Compared to the Gaussian PDF, the Laplace PDF has a higher peak at low amplitudes and decreases more slowly towards higher amplitudes, visible by the greater distances of the contour lines compared to the complex Gaussian PDF. While the complex Gaussian PDF is rotationally invariant, the Laplace amplitude depends on the phase.

Considering Gaussian components, the rotational invariance greatly facilitates the polar integration. Similar to (16) the amplitude is Rayleigh distributed:

    p(A) = (2A/σ_S²) · exp(−A²/σ_S²).                             (18)
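The polar integration (17) has no closed form for Laplace components, but it is straightforward to evaluate numerically. The sketch below uses the Laplace part PDF (14) with σ_S = 1 and checks that the resulting amplitude density integrates to one:

```python
import numpy as np

def laplace_part_pdf(x, sigma_s=1.0):
    """Laplace PDF (14) for the real or imaginary part, variance sigma_s**2 / 2."""
    return (1.0 / sigma_s) * np.exp(-2.0 * np.abs(x) / sigma_s)

def amplitude_pdf(A, n_alpha=2000):
    """Evaluate (17) numerically:
    p(A) = integral over alpha of A * p(A cos alpha) * p(A sin alpha)."""
    alpha = np.linspace(0.0, 2.0 * np.pi, n_alpha, endpoint=False)
    da = 2.0 * np.pi / n_alpha
    A = np.atleast_1d(A).astype(float)
    integrand = (A[:, None]
                 * laplace_part_pdf(A[:, None] * np.cos(alpha))
                 * laplace_part_pdf(A[:, None] * np.sin(alpha)))
    return integrand.sum(axis=1) * da           # rectangle rule over the angle

# sanity check: the amplitude density must integrate to one
A_grid = np.linspace(0.0, 10.0, 4000)
p = amplitude_pdf(A_grid)
print(np.trapz(p, A_grid))   # close to 1.0
```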

The PDF of the amplitude of a complex Laplace or Gamma random variable with independent Cartesian components varies with the angle α. This makes an analytic calculation of the distribution of A = √(S_Re² + S_Im²) for (14) or (15) difficult, if not impossible.

Instead of an analytic solution to (17) we are looking for a function that approximates the real PDF of the spectral amplitudes with high accuracy regardless of the underlying joint distribution of real and imaginary parts of the Fourier coefficients. However, as an indication of how the function should look, the amplitude of a complex Laplace or Gamma PDF with independent components is taken into account. Figure 3 plots histograms of the amplitude A = √(S_Re² + S_Im²) of 1,000,000 Laplace- and Gamma-distributed, respectively, independent random values S_Re, S_Im of variance σ_S²/2. Whereas the Laplace-distributed random variables can easily be generated using the inverse distribution function method [19], the Gamma-distributed random values were generated according to [20]. Compared to the Rayleigh-distributed amplitude of a complex Gaussian random variable, low values are more likely, but the PDF decreases more slowly towards high values.

Figure 3: Measured histograms of amplitudes of 1,000,000 complex random values with independent Cartesian Laplace (solid) or Gamma (dashed) components along with the Rayleigh PDF (dotted) (σ_S² = 1).

The fast decay of the Rayleigh PDF results from the second-order term of A in the argument of the exponential function in (18), similar to the decay of the Gauss function in (13). Similarly, the measured PDFs of the complex Laplace and Gamma amplitudes can be assumed to decay like (14) and (15), with a linear argument in the exponential function. Apparently, the slope of the Gamma amplitude PDF differs from that of the Laplace amplitude PDF. Hence, a parameter µ is introduced, which enables both to be approximated. After normalizing A by the standard deviation σ_S we thus assume

    p(A) ∝ exp(−µA/σ_S).                                          (19)

At low values of A the PDF of the Laplace and Gamma amplitudes is much higher than the Rayleigh PDF, as shown in Figure 3. Considering the Rayleigh PDF according to (18), the behavior at low values is mainly due to the linear term of A, whereas the exponential term plays a minor role at small values.

Both the PDF of the Laplace amplitude and the PDF of the Gamma amplitude can be approximated by abandoning the linear term in A. Instead, A is taken to the power of a parameter ν after normalization to the standard deviation of speech, that is, p(A) ∝ (A/σ_S)^ν, in order to be able to approximate a large variety of PDFs. The smaller the parameter ν, the larger the proposed PDF at low values. The term hardly influences the behavior of the function at high values due to the dominance of the exponential decay:

    p(A) ∝ (A^ν/σ_S^ν) · exp(−µA/σ_S).                            (20)

After taking ∫_0^∞ p(A) dA = 1 into account, the approximating function with parameters ν, µ is finally obtained using [21, equation 3.381.4]:

    p(A) = (µ^{ν+1}/Γ(ν+1)) · (A^ν/σ_S^{ν+1}) · exp(−µA/σ_S).     (21)

Here, Γ denotes the Gamma function.

Figure 4 shows the approximation of the measured histogram of the amplitude of 1,000,000 complex Laplace or Gamma random values with independent components and σ_S² = 1 by (21) using different sets of parameters ν, µ. Apparently, (21) allows a very accurate approximation for both Laplace and Gamma components. To approximate the Laplace amplitude, we applied the parameter set (ν = 1, µ = 2.5). To approximate the Gamma amplitude we used (ν = 0.01, µ = 1.5). PDFs in between both or closer to the Rayleigh PDF can be approximated with different sets of parameters ν, µ.

3.1.1. Matching with experimental data

The real PDF of the speech amplitude will not be exactly like the Laplace or Gamma amplitude approximation but somewhere in between. Also, it will depend on parameters of the noise reduction system such as the analysis frame size. At a larger frame size the correlation decreases relative to the analysis frame size and thus the distribution will be less super-Gaussian. The task is therefore to find a set of parameters (ν, µ) which outperforms the above sets for Laplace or Gamma amplitude approximation for a given system.
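The parametric density (21) and its normalization can be checked numerically for the three parameter sets discussed in the text:

```python
import math
import numpy as np

def super_gaussian_amp_pdf(A, nu, mu, sigma_s=1.0):
    """Parametric amplitude PDF (21):
    p(A) = mu**(nu+1) / Gamma(nu+1) * A**nu / sigma_s**(nu+1) * exp(-mu*A/sigma_s)."""
    c = mu ** (nu + 1) / (math.gamma(nu + 1) * sigma_s ** (nu + 1))
    return c * A ** nu * np.exp(-mu * A / sigma_s)

# parameter sets from the text: Laplace approx., Gamma approx., Kullback fit
A = np.linspace(1e-6, 40.0, 200000)
for name, nu, mu in [("Laplace approx.", 1.0, 2.5),
                     ("Gamma approx.", 0.01, 1.5),
                     ("Kullback fit", 0.126, 1.74)]:
    p = super_gaussian_amp_pdf(A, nu, mu)
    print(name, np.trapz(p, A))   # each should be close to 1
```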

Figure 4: Approximation of amplitudes of complex random values with Laplace and Gamma components using (21). (a) Laplace components: (ν = 1, µ = 2.5). (b) Gamma components: (ν = 0.01, µ = 1.5).

To measure the probability density function of the speech complex DFT coefficients S or speech DFT amplitudes A, a histogram is built using 1 hour of speech from different speakers. Ideally, DFT bins which solely contain speech of equal variance should be taken into account. In practice, the speech variance in a frequency bin is strongly time variant and can only be estimated in a time frame and frequency bin with a certain estimation error. Thus, we apply (6), which is commonly considered the best performing method to estimate the speech variance in the form of the a priori SNR. Hereby, the histogram measurement process also incorporates the same method of estimating the time-varying speech variance as the noise reduction system. Data is collected for the histogram at time instances when the frequency bin is dominated by speech. For that purpose a high and narrow a priori SNR interval is predefined, for example, 19–21 dB. The width of the interval is a tradeoff between the amount of data obtained and the demand to pick samples of the same variance.

Figure 5a shows the contour lines of the measured speech DFT coefficients. The data shown has been obtained by building separate histograms for each frequency, normalizing each histogram to σ_S² = 1, and averaging the histograms over frequency. Compared to the Gaussian contour lines in Figure 2, a slower decrease towards high amplitudes and a faster increase towards low amplitudes is visible. Also, the observed data hardly shows any dependency on the phase, unlike the complex Laplace PDF with its phase-dependent contour lines in Figure 2. This is shown in Figures 5b–5g, which depict the histogram of phases for six specific contour lines. Approximately, the phases can be considered as uniformly distributed. The variation visible for A = 0.005 is probably due to the low amount of data available here.

Figure 6a plots the histogram of the speech amplitude, which is obtained by integration over the phase of the two-dimensional histogram, along with the analytic Rayleigh PDF and the approximation according to (21) with the parameter sets for Laplace and Gamma amplitude approximation, respectively. Figure 6b shows a zoom into the higher regions. Apparently, (21) provides a much better fit for the speech amplitude than the Rayleigh PDF for both the Laplace and Gamma amplitude approximations. For low arguments, the Rayleigh PDF rises too slowly, while for large arguments, the density function decays too fast. The real PDF of the speech amplitude lies between the Laplace and Gamma amplitude approximations; for the data measured with our system it is closer to the Gamma amplitude approximation.

To find a set (ν, µ) that approximates the real PDF best, a distance measure between the analytic function and the histogram with N bins is numerically minimized. The Kullback divergence [22] can be considered optimal from an information theoretical point of view. Given two random variables of probability density p₁(x) and p₂(x), I(2 : 1) describes the mean information per observation of process 2 for discrimination in favor of process 2, and I(1 : 2) that of process 1 for discrimination in favor of process 1:

    I(1 : 2) = ∫ p₁(x) log(p₁(x)/p₂(x)) dx,
    I(2 : 1) = ∫ p₂(x) log(p₂(x)/p₁(x)) dx.                       (22)

The sum J(1 : 2) = I(1 : 2) + I(2 : 1) is a measure of divergence between the two processes.
To differentiate between the analytical PDF p_A(n) and the histogram PDF p_h(n) with N bins, the divergence can be calculated by

    J(A : h) = Σ_{n=1}^{N} (p_h(n) − p_A(n)) · log(p_h(n)/p_A(n)).   (23)

Figure 7 shows the best p(A) according to (21), determined by minimizing the Kullback divergence. The analytical PDF now fits even better to the observed data than the Laplace or Gamma amplitude approximation. To illustrate the improvement provided by the new model, Table 1 shows the Kullback divergences between measured data and model functions. The divergences have been normalized to that of the Rayleigh PDF, that is, the Gaussian model. When using the Laplace or Gamma amplitude approximation, the Kullback divergence is significantly lower than that for the Gaussian model. By determining an optimal parameter set, the divergence further decreases.
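A sketch of the discrete divergence (23) on a toy histogram; the exponential samples and the two candidate densities here are synthetic stand-ins, not the speech material of the paper:

```python
import numpy as np

def kullback_divergence(p_h, p_a):
    """Symmetric divergence (23): J = sum_n (p_h(n) - p_A(n)) * log(p_h(n)/p_A(n)),
    evaluated over the histogram bins (bins with zero mass are skipped)."""
    p_h = np.asarray(p_h, float)
    p_a = np.asarray(p_a, float)
    mask = (p_h > 0) & (p_a > 0)
    return np.sum((p_h[mask] - p_a[mask]) * np.log(p_h[mask] / p_a[mask]))

# toy comparison: a histogram drawn from one density scored against two candidates
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=100000)
hist, edges = np.histogram(samples, bins=50, range=(0, 8), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
good = np.exp(-centers)                  # the true density
bad = 0.5 * np.exp(-0.5 * centers)       # a mismatched candidate
assert kullback_divergence(hist, good) < kullback_divergence(hist, bad)
```

Every term of the sum is nonnegative, so J is zero only when the two densities agree on all bins; minimizing J over (ν, µ) yields the fitted parameter set reported above.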

Figure 5: (a) Contour lines of measured speech DFT coefficients. ((b), (c), (d), (e), (f), (g)) Histogram of speech DFT phases for six different amplitudes (A = 0.005, 0.01, 0.025, 0.05, 0.1, 0.25).

Figure 6: (a) Histogram of speech DFT amplitudes A (σ_S² = 1) fitted with the Rayleigh PDF and the Laplace/Gamma amplitude approximation (21). (b) Zoom into the area 1.5 ≤ A ≤ 3.

Figure 7: (a) Histogram of speech DFT amplitudes and fitted approximation by (21) according to the Kullback divergence (ν = 0.126, µ = 1.74; σ_S² = 1). (b) Zoom into the area 1.5 ≤ A ≤ 3.

3.1.2. Reverberant signal

The acoustic environment will influence the distribution of the speech spectral amplitude. Especially if the desired acoustic source is located at larger distances from the microphone, for example, in a hearing aid application, reverberation will degrade the amount of correlation within an analysis frame and thus will lead to a less super-Gaussian distribution.

To examine the amount of influence of reverberation, the scenario depicted in Figure 8 is considered. The acoustical impulse response in a reverberant room from a source to a microphone was simulated with the image method [23], which models the reflecting walls by several image sources.

Table 1: Normalized Kullback divergence between measured speech PDF and different model functions.

    p(A)                                  | ν, µ        | J(A : h)/J(A : h)_Rayleigh
    Rayleigh (18)                         | —           | 1
    Laplace amplitude approximation (21)  | 1, 2.5      | 0.35
    Gamma amplitude approximation (21)    | 0.01, 1.5   | 0.05
    Kullback fit (21)                     | 0.126, 1.74 | 0.04

Figure 8: Simulation of the impulse response between speech source and microphone in a reverberant room using the image method. Room dimensions: Lx = Ly = 7 m, Lz = 3 m; reverberation time: T0 = 0.2 s; reflection coefficient: ζ = 0.72; position of source: (5 m, 2 m, 1.5 m); position of microphone: (5 m, 5 m, 1.5 m).

The intensity of the sound from an image source at the microphone is determined by a frequency-independent reflection coefficient ζ and by the distance to the microphone. In our experiment, the reverberation time was set to T0 = 0.2 seconds, which corresponds to a reflection coefficient of ζ = 0.72 according to Eyring's formula

    ζ = exp(−13.82 / (c · T0 · (1/Lx + 1/Ly + 1/Lz))).            (24)

The histogram of the speech amplitude was then taken as before, after convolving the database of speech with the impulse response delivered by the image method.

Figure 9 plots the histogram along with the approximation with parameters fitted according to the Kullback divergence. As expected, the speech spectral amplitude is now less super-Gaussian distributed. However, the optimal parameters with respect to the Kullback divergence (i.e., ν = 0.264, µ = 1.82) are still much closer to the values originally obtained from the Kullback fit than to those of the Laplace amplitude approximation or even the Rayleigh PDF. It can be concluded that the accuracy of the statistical model is only slightly affected by reverberation. Whereas a slight performance gain can be expected when adapting the parameters of the statistical model during run-time, the gain might not justify the additional computational complexity of an acoustic classifier. Thus, in the following the fixed parameter set (ν = 0.126, µ = 1.74) is considered as optimal.

Figure 9: (a) Histogram of speech amplitudes in a reverberant room and fitted approximation (21) according to the Kullback divergence (ν = 0.264, µ = 1.82; σ_S² = 1). (b) Zoom into the area 1.5 ≤ A ≤ 3.

Figure 10: Histograms of noise DFT amplitudes B for (a) white uniformly distributed noise, (b) fan noise, and (c) cafeteria noise (σ_N² = 1), fitted with the Rayleigh PDF and the Laplace amplitude approximation.

3.1.3. Spectral amplitude of noise

Compared to speech, the span of noise correlation in an analysis frame is much lower. Thus, the PDF of the real and imaginary parts of the noise spectral coefficients will, according to the central limit theorem, be closer to a Gaussian function. Martin [5, 6] has proposed spectral estimators with Laplace or Gaussian noise models (and Laplace and Gamma models for the speech coefficients). A Laplace model for noise is motivated by the observation that environmental noises are also super-Gaussian distributed to a certain degree. Figure 10 plots histograms of DFT amplitudes measured for three different noise classes. For building the histograms, the frequency- and time-dependent noise variances σ_N² were estimated using the same system as applied in the noise reduction algorithm,

