A New Speech Enhancement Technique Using Perceptual Wiener Filter

(IJITR) International Journal of Innovative Technology and Research, Volume No. 5, Issue No. 6, October-November 2017, pp. 7626-7630.

M. VENKATRAO, Asst. Prof., Department of ECE, Amrita Sai Institute of Science and Technology
K. TIRUMALA RAO, Asst. Prof., Department of ECE, Amrita Sai Institute of Science and Technology
G. SIVA KUMAR, Asst. Prof., Department of ECE, Amrita Sai Institute of Science and Technology

Abstract- This paper deals with the musical noise that results from perceptual speech enhancement algorithms, and in particular from Wiener filtering. Although perceptual speech enhancement methods perform better than non-perceptual methods, most of them still return annoying residual musical noise. This is because, if only the noise above the noise masking threshold is filtered, noise below the masking threshold can become audible once its maskers are removed. This limits the performance of perceptual speech enhancement methods that process audible noise only. To overcome this drawback, a new speech enhancement technique is proposed. It aims to improve the quality of the enhanced speech signal provided by perceptual Wiener filtering by controlling the latter with a second filter that acts as a psychoacoustically motivated weighting factor. Simulation results show that the performance is improved compared with other perceptual speech enhancement methods.

I. INTRODUCTION

The objective of speech enhancement is to improve the quality and intelligibility of speech in noisy environments. The problem has been widely studied over the years, and many approaches have been proposed, such as subtractive-type algorithms [1-4] and perceptual Wiener filtering algorithms. Among them, spectral subtraction and Wiener filtering are widely used because of their relative simplicity. However, these methods return residual noise known as musical noise, which is quite annoying. Several solutions have been proposed to reduce its effect. Some involve adjusting the parameters of spectral subtraction so as to offer more flexibility, as in [2] and [3]. Others, such as the method proposed in [4], are based on signal subspace approaches. Despite the effectiveness of these techniques in improving the signal-to-noise ratio (SNR), eliminating or reducing musical noise remains a challenge for many researchers.

In the last few decades, the introduction of psychoacoustic models has attracted a great deal of interest, with the objective of improving the perceptual quality of the enhanced signal. In [3], a psychoacoustic model is used to control the parameters of spectral subtraction in order to find the best trade-off between noise reduction and speech distortion. To make musical noise inaudible, the linear estimator proposed in [5] incorporates the masking properties of the human auditory system. In [6], the masking threshold and an intermediate signal, which is slightly denoised and free of musical noise, are used to detect the musical tones generated by spectral subtraction; this detection can then be used by a post-processing stage aimed at reducing the detected tones. These perceptual speech enhancement systems reduce the musical noise but introduce some undesired distortion into the enhanced speech signal.
When this distorted estimated speech signal is applied to recognition systems, their performance degrades drastically. The basic idea of the proposed method is to remove only the perceptually significant noise components from the noisy signal, so that the clean speech components are not affected by the processing. In addition, the technique requires very little a priori information about the features of the noise. In the present paper, we propose to control the perceptual Wiener filter with a psychoacoustically motivated filter that can be regarded as a weighting factor. The purpose is to minimize the perception of musical noise without degrading the clarity of the enhanced speech.

II. STANDARD SPEECH ENHANCEMENT TECHNIQUE

Let the noisy signal be expressed as

$$ y(n) = s(n) + d(n), \qquad (1) $$

where $s(n)$ is the original clean speech signal and $d(n)$ is additive random noise, uncorrelated with the clean signal. Taking the DFT of the observed signal gives

$$ Y(m,k) = S(m,k) + D(m,k), \qquad (2) $$

where $m = 1, 2, \ldots, M$ is the frame index, $k = 1, 2, \ldots, K$ is the frequency-bin index, $M$ is the total number of frames, and $K$ is the frame length. $Y(m,k)$, $S(m,k)$, and $D(m,k)$ represent the short-time spectral components of $y(n)$, $s(n)$, and $d(n)$, respectively.
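As a minimal illustration of this analysis step (not part of the original paper), the following Python sketch forms the noisy short-time spectra $Y(m,k)$ from the additive model of Eq. (1). The frame length, hop size, Hann window, and the toy signals are assumptions; the paper does not specify its analysis parameters.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Split x into overlapping Hann-windowed frames and return their DFTs.

    Rows index the frame m and columns index the frequency bin k, so the
    result approximates Y(m, k) in Eq. (2).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[m * hop:m * hop + frame_len] * window
                       for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Additive model of Eq. (1): y(n) = s(n) + d(n)
rng = np.random.default_rng(0)
n = np.arange(16000)
s = np.sin(2 * np.pi * 440 * n / 8000)     # toy stand-in for clean speech
d = 0.1 * rng.standard_normal(n.size)      # additive, uncorrelated noise
y = s + d
Y = stft_frames(y)                         # noisy short-time spectra Y(m, k)
```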

The clean speech spectrum estimate $\hat{S}(m,k)$ is obtained by multiplying the noisy speech spectrum by a filter gain function, as given in equation (3):

$$ \hat{S}(m,k) = H(m,k)\, Y(m,k), \qquad (3) $$

where $H(m,k)$ is the noise suppression gain function (the conventional Wiener filter, WF), derived from the MMSE estimator and given by

$$ H(m,k) = \frac{\xi(m,k)}{1 + \xi(m,k)}, \qquad (4) $$

where $\xi(m,k)$ is the a priori SNR, defined as

$$ \xi(m,k) = \frac{\lambda_s(m,k)}{\lambda_d(m,k)}. \qquad (5) $$

Here $\lambda_s(m,k) = E\{|S(m,k)|^2\}$ and $\lambda_d(m,k) = E\{|D(m,k)|^2\}$ represent the clean speech power spectrum and the estimated noise power spectrum, respectively. The a posteriori SNR is given by

$$ \gamma(m,k) = \frac{|Y(m,k)|^2}{\lambda_d(m,k)}. \qquad (6) $$

An estimate $\hat{\xi}(m,k)$ of $\xi(m,k)$ is given by the well-known decision-directed approach [9] and is expressed as

$$ \hat{\xi}(m,k) = \beta\, \frac{|H(m-1,k)\, Y(m-1,k)|^2}{\lambda_d(m,k)} + (1-\beta)\, P\!\left[ V(m,k) \right], \qquad (7) $$

where $V(m,k) = \gamma(m,k) - 1$, $P[x] = x$ if $x \geq 0$ and $P[x] = 0$ otherwise, and $\beta$ is a smoothing factor. The noise suppression gain function is chosen as the Wiener filter, similar to [13].
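The following Python sketch (illustrative only, not code from the paper) computes the conventional Wiener gain of Eq. (4) with the decision-directed a priori SNR estimate of Eq. (7). The smoothing factor value and the assumption that a strictly positive noise power spectrum $\lambda_d(m,k)$ has already been estimated are ours.

```python
import numpy as np

def wiener_gain(xi):
    """Conventional Wiener gain H(m, k) = xi / (1 + xi), Eq. (4)."""
    return xi / (1.0 + xi)

def decision_directed_xi(Y, noise_psd, beta=0.98):
    """Decision-directed a priori SNR estimate of Eq. (7), frame by frame.

    Y         : noisy short-time spectra, shape (M, K)
    noise_psd : estimated noise power spectrum lambda_d(m, k), same shape
    beta      : smoothing factor (0.98 is a typical value, assumed here)
    """
    M, K = Y.shape
    xi_hat = np.zeros((M, K))
    prev_clean_power = np.zeros(K)                 # |H(m-1, k) Y(m-1, k)|^2
    for m in range(M):
        gamma = np.abs(Y[m]) ** 2 / noise_psd[m]   # a posteriori SNR, Eq. (6)
        xi_hat[m] = (beta * prev_clean_power / noise_psd[m]
                     + (1.0 - beta) * np.maximum(gamma - 1.0, 0.0))
        prev_clean_power = np.abs(wiener_gain(xi_hat[m]) * Y[m]) ** 2
    return xi_hat
```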

III. PERCEPTUAL SPEECH ENHANCEMENT

Although Wiener filtering reduces the level of musical noise, it does not eliminate it [15]; the remaining musical noise is perceptually annoying. In an effort to make the residual noise perceptually inaudible, many perceptual speech enhancement methods have been proposed that incorporate auditory masking properties [2-9]. In these methods, the residual noise is shaped according to an estimate of the signal masking threshold [9, 13]. Figure 1 depicts the complete block diagram of the proposed speech enhancement method.

Figure 1. Block diagram of the proposed speech enhancement method.

3.1 Gain of the Perceptual Wiener Filter (PWF)

The perceptual Wiener filter (PWF) gain function $H_1(m,k)$ is calculated from the cost function $J$, defined as

$$ J = E\left\{ \left| \hat{S}(m,k) - S(m,k) \right|^2 \right\}. \qquad (8) $$

Substituting (2) and (3) in (8) gives

$$ J = E\left\{ \left| (H_1(m,k) - 1)\, S(m,k) + H_1(m,k)\, D(m,k) \right|^2 \right\} = d_i + r_i, \qquad (9) $$

where, since speech and noise are assumed uncorrelated, the cross term vanishes, and $d_i = (H_1(m,k) - 1)^2\, E\{|S(m,k)|^2\}$ and $r_i = H_1^2(m,k)\, E\{|D(m,k)|^2\}$ represent the speech distortion energy and the residual noise energy, respectively.

To make the residual noise inaudible, it should be kept below the auditory masking threshold $T(m,k)$. This constraint is given by

$$ r_i \leq T(m,k). \qquad (10) $$

By including the above constraint and substituting $\lambda_s(m,k) = E\{|S(m,k)|^2\}$ and $\lambda_d(m,k) = E\{|D(m,k)|^2\}$ in (9), the cost function becomes

$$ J = (H_1(m,k) - 1)^2\, \lambda_s(m,k) + H_1^2(m,k)\, \max\!\left( \lambda_d(m,k) - T(m,k),\, 0 \right). \qquad (11) $$

The desired perceptual modification of the Wiener filter is obtained by differentiating $J$ with respect to $H_1(m,k)$ and equating the result to zero. The resulting perceptually defined Wiener filter gain function is

$$ H_1(m,k) = \frac{\lambda_s(m,k)}{\lambda_s(m,k) + \max\!\left( \lambda_d(m,k) - T(m,k),\, 0 \right)}. \qquad (12) $$

Dividing the numerator and the denominator of (12) by $\lambda_d(m,k)$, $H_1(m,k)$ becomes

$$ H_1(m,k) = \frac{\hat{\xi}(m,k)}{\hat{\xi}(m,k) + \dfrac{\max\!\left( \lambda_d(m,k) - T(m,k),\, 0 \right)}{\lambda_d(m,k)}}. \qquad (13) $$

$T(m,k)$ is the noise masking threshold, estimated from the noisy speech spectrum following [16]. The a priori SNR and the noise power spectrum are estimated using the two-step a priori SNR estimator proposed in [15] and the weighted noise estimation method proposed in [17], respectively.

3.2 WEIGHTED PWF

Although perceptual speech enhancement methods perform better than non-perceptual methods, most of them still return annoying residual musical noise. The enhanced speech signal obtained with the above perceptual Wiener filter still contains some residual noise, because only noise above the noise masking threshold is filtered while noise below the masking threshold remains. This can affect the performance of perceptual speech enhancement methods that process audible noise only.

In order to overcome this drawback, we propose to weight the perceptual Wiener filter with a psychoacoustically motivated weighting filter, given by

$$ W(m,k) = \begin{cases} H(m,k), & \text{if } ATH(m,k) \leq \lambda_d(m,k) \leq T(m,k) \\ 1, & \text{otherwise,} \end{cases} \qquad (15) $$

where $ATH(m,k)$ is the absolute threshold of hearing. This weighting factor is used to weight the perceptual Wiener filter. The gain function $H_2(m,k)$ of the proposed weighted perceptual Wiener filter is given by

$$ H_2(m,k) = H_1(m,k)\, W(m,k). \qquad (16) $$
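A minimal Python sketch of the gains in Eqs. (13), (15), and (16) follows; it is illustrative only. It assumes that the masking threshold $T(m,k)$ and the absolute threshold of hearing are already expressed in the same power-spectral units as the noise estimate, and the small constant guarding the divisions is ours.

```python
import numpy as np

def perceptual_wiener_gain(xi_hat, noise_psd, T, eps=1e-12):
    """PWF gain H1(m, k) of Eq. (13): only the part of the noise power that
    exceeds the masking threshold T(m, k) is treated as audible and attenuated."""
    audible_noise = np.maximum(noise_psd - T, 0.0)
    return xi_hat / (xi_hat + audible_noise / (noise_psd + eps) + eps)

def perceptual_weight(H, noise_psd, T, ath):
    """Weighting factor W(m, k) of Eq. (15): apply the conventional Wiener gain
    H(m, k) where the noise power lies between the absolute threshold of hearing
    and the masking threshold, and leave the spectrum untouched elsewhere."""
    masked_band = (noise_psd >= ath) & (noise_psd <= T)
    return np.where(masked_band, H, 1.0)

def weighted_pwf_gain(H, xi_hat, noise_psd, T, ath):
    """Proposed gain H2(m, k) = H1(m, k) * W(m, k), Eq. (16)."""
    return (perceptual_wiener_gain(xi_hat, noise_psd, T)
            * perceptual_weight(H, noise_psd, T, ath))
```

The enhanced spectrum is then $\hat{S}(m,k) = H_2(m,k)\,Y(m,k)$, followed by the usual overlap-add synthesis.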
IV. SIMULATION RESULTS

To evaluate and compare the performance of the proposed speech enhancement scheme, simulations were carried out with NOIZEUS, a noisy speech corpus for the evaluation of speech enhancement algorithms [18]. The corpus contains 30 IEEE sentences (produced by three male and three female speakers) corrupted by eight different real-world noises at different SNRs. Speech signals were degraded with different types of noise at global SNR levels of 0 dB, 5 dB, 10 dB and 15 dB. In this evaluation, only five noises are considered: babble, car, train, airport and street noise. The objective quality measures used for the evaluation of the proposed method are the segmental SNR and PESQ measures [19]; a sketch of the segmental SNR computation is given after the tables below. The segmental SNR is known to indicate speech distortion more accurately than the overall SNR: a higher segmental SNR indicates weaker speech distortion, and a higher PESQ score indicates better perceived quality of the enhanced signal [19]. The performance of the proposed method is compared with the Wiener filter and the perceptual Wiener filter.

The simulation results are summarized in Table 1 and Table 2. The proposed method leads to better denoising quality, and the largest improvements are obtained at high noise levels. The time-frequency distribution of speech signals provides more accurate information about residual noise and speech distortion than the corresponding time-domain waveforms; we therefore compared the spectrograms for each method and confirmed a reduction of both the residual noise and the speech distortion. Figure 2 shows the spectrograms of the clean speech signal, the noisy signal and the enhanced speech signals.

Table 1. Segmental SNR values of the enhanced signals.

Table 2. PESQ values of the enhanced signals.
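For reference, the following Python sketch computes a frame-averaged segmental SNR. It follows the usual definition (per-frame SNR clamped to a fixed range and averaged); the frame size and clamping limits used in the paper are not stated, so the values here are typical assumptions, not the authors' settings.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, hop=128,
                  snr_min=-10.0, snr_max=35.0):
    """Frame-averaged segmental SNR in dB between a clean reference and an
    enhanced signal of the same length (common definition, assumed limits)."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        noise_energy = np.sum((c - e) ** 2) + 1e-12
        snr = 10.0 * np.log10(np.sum(c ** 2) / noise_energy + 1e-12)
        snrs.append(np.clip(snr, snr_min, snr_max))
    return float(np.mean(snrs))
```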

V. CONCLUSION

In this paper, an effective approach for suppressing the musical noise that remains after Wiener filtering has been introduced. Based on the perceptual properties of the human auditory system, a weighting factor accentuates the denoising process where the noise is perceptually insignificant and prevents residual noise components from becoming audible in the absence of adjacent maskers. When the speech signal is additively corrupted by babble and car noise, objective measures show the improvement brought by the proposed method in comparison with some recent filtering techniques of the same type.

Figure 2. Speech spectrograms: (a) original clean signal, (b) noisy signal (babble noise, SNR 5 dB), (c) enhanced signal using the Wiener filter, (d) enhanced signal using the PWF, (e) enhanced signal using the weighted PWF.

VI. REFERENCES

[1] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1109-1121, Dec. 1984.
[2] M. Berouti, R. Schwartz and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," Proc. of ICASSP, 1979, vol. I, pp. 208-211.

[3] N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Trans. Speech and Audio Processing, vol. 7, pp. 126-137, 1999.
[4] Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 3, pp. 251-266, 1995.
[5] Y. Hu and P. Loizou, "Incorporating a psychoacoustic model in frequency domain speech enhancement," IEEE Signal Processing Letters, vol. 11, no. 2, pp. 270-273, 2004.
[6] Y. M. Cheng and D. O'Shaughnessy, "Speech enhancement based conceptually on auditory evidence," IEEE Trans. Signal Processing, vol. 39, no. 9, pp. 1943-1954, 1991.
[7] F. Jabloun and B. Champagne, "Incorporating the human hearing properties in the signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 11, pp. 700-708, 2003.
[8] D. Tsoukalas, M. Paraskevas, and J. Mourjopoulos, "Speech enhancement using psychoacoustic criteria," Proc. of ICASSP, pp. 359-362, Minneapolis, MN, 1993.
[9] Y. Hu and P. C. Loizou, "A perceptually motivated approach for speech enhancement," IEEE Trans. Speech and Audio Processing, pp. 457-465, Sept. 2003.
[10] L. Lin, W. H. Holmes and E. Ambikairajah, "Speech denoising using perceptual modification of Wiener filtering," IEE Electronics Letters, vol. 38, pp. 1486-1487, Nov. 2002.
[11] C. Beaugeant, V. Turbin, P. Scalart and A. Gilloire, "New optimal filtering approaches for hands-free telecommunication terminals," Signal Processing, vol. 64, pp. 33-47, Jan. 1998.
[12] T. Lee and Kaisheng Yao, "Speech enhancement by perceptual filter with sequential noise parameter estimation," Proc. of ICASSP, vol. I, pp. 693-696, 2004.
[13] Md. Jahangir Alam, Sid-Ahmed Selouani, Douglas O'Shaughnessy and S. Ben Jebara, "Speech enhancement using a Wiener denoising technique and musical noise reduction," in Proc. of INTERSPEECH'08, Brisbane, Australia, pp. 407-410, September 2008.
[14] A. Amehraye, D. Pastor, and A. Tamtaoui, "Perceptual improvement of Wiener filtering," Proc. of ICASSP, 2008.
[15] D. O'Shaughnessy and Sid-Ahmed Selouani, "Speech enhancement based on novel two-step a priori SNR estimators," pp. 565-568, 2008.
[16] J. D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE Journal on Selected Areas in Communications, vol. 6, pp. 314-323, February 1988.
[17] M. Kato, A. Sugiyama and M. Serizawa, "Noise suppression with high speech quality based on weighted noise estimation and MMSE STSA," IEICE Trans. Fundamentals, vol. E85-A, no. 7, pp. 1710-1718, July 2002.
[18] NOIZEUS: A noisy speech corpus for evaluation of speech enhancement algorithms, http://www.utdallas.edu/~loizou/speech/noizeus/
[19] Yi Hu and Philipos C. Loizou, "Evaluation of Objective Quality Measures for Speech Enhancement," IEEE Trans. on Audio, Speech and Language Processing, vol. 16, no. 1, pp. 229-238, January 2008.

AUTHOR'S PROFILE

M. VENKATRAO, Asst. Prof., Department of ECE, Amrita Sai Institute of Science and Technology.
K. TIRUMALA RAO, Asst. Prof., Department of ECE, Amrita Sai Institute of Science and Technology.
G. SIVA KUMAR, Asst. Prof., Department of ECE, Amrita Sai Institute of Science and Technology.
