An Advanced Speech Enhancement Approach with Improved Pitch Synchronous Analysis

International Conference on Security and Authentication - SAPIENCE14

An Advanced Speech Enhancement Approach with Improved Pitch Synchronous Analysis

V.R. Balaji1 and Dr. S. Subramanian2
1Assistant Professor, Department of ECE, Sri Krishna College of Engineering and Technology, Coimbatore. E-mail: balajivrresearch@gmail.com
2Advisor, Coimbatore Institute of Engineering and Technology, Coimbatore. E-mail: drsraju49@gmail.com

Abstract: Speech enhancement has become an essential issue within the field of speech and signal processing, because of the necessity to improve the performance of voice communication systems in noisy environments. A number of research works have been carried out in speech processing, but there is always room for improvement. The main aim is to enhance the apparent quality of the speech and to improve its intelligibility. Signal representation and enhancement in the cosine transform domain is observed to provide significant results, and the Discrete Cosine Transform (DCT) has been widely used for speech enhancement. In this research work, instead of the DCT, the Advanced DCT (ADCT) is used, which simultaneously offers energy compaction along with critical sampling and flexible window switching. In order to deal with the issue of frame-to-frame deviations of the cosine transform coefficients, the ADCT is integrated with Pitch Synchronous Analysis (PSA). Moreover, in order to improve the noise minimization performance of the system, a Wiener filtering approach is used. Thus, a novel ADCT based speech enhancement scheme using an improved iterative filtering algorithm integrated with PSA is presented.

Keywords: Wiener Filtering, Advanced Discrete Cosine Transform, Pitch Synchronous Analysis, Perceptual Evaluation of Speech Quality.

1. Introduction

Speech enhancement is the technique which enhances the quality of speech signals that are corrupted by adverse noise and channel distortion. Speech enhancement has been used in a number of applications in recent years [1].
The main aim of speech enhancement is to enhance the quality and clarity of the speech signal. A number of techniques have been developed for providing better clarity of speech signals, including spectral subtraction [2], Wiener filtering [3] and Ephraim-Malah filtering [4].

For the past two decades, speech enhancement has been one of the most active research areas in the field of signal processing, but there is still no standard technique for both speech and noise [5]. Transform domain filters are widely used in the speech enhancement process. These filters first compute the transform coefficients, then perform the enhancement, and finally apply the inverse transform to obtain the desired speech. A number of speech enhancement algorithms largely function in the transform domain because the speech energy is not present in all the transform coefficients, which makes it easier to filter off the noise, particularly in the noise-only coefficients. Different transforms may require different analysis methods. For single-channel speech enhancement, a number of transform based algorithms have been investigated in the past. Among these, DFT-based algorithms are the most active. Moreover, the spectral subtraction algorithm [4] was extended to the Fourier transform by Boll [5] and became a very widely used approach.

This paper focuses on the DCT during frame-based analysis, along with an improved noise reduction filter. In traditional DCT-based speech enhancement algorithms, the transform is carried out by a short-term cosine transform, which is almost the same as the Short-Term Fourier Transform (STFT) except that the DCT is used rather than the DFT. In such algorithms, the observed speech is partitioned into fixed frames with 50% to 75% overlap and then processed by the DCT. A noise suppression filter is then applied to the DCT coefficients.
One of the key differences is that the DCT coefficients are real, while the DFT coefficients are complex, consisting of a magnitude and a phase. Without a phase representation, the magnitudes of DCT coefficients obtained with a standard window shift exhibit much higher variation than those of the DFT for a strictly stationary signal. This negatively influences inter-frame approaches such as the decision-directed approach [7] for the estimation of the a priori SNR.

The ADCT is widely used in audio processing, where the overlapping minimizes artifacts from the block boundaries [8]. Hence, this research work uses the ADCT in order to improve the speech quality. Pitch synchronous analysis is also an efficient technique which helps in offering better performance [9]. This improves the overall performance of DCT-based speech enhancement algorithms, especially those using inter-frame techniques. The system also incorporates pitch synchronous processing, which is further improved by a maximum alignment technique.

ISBN 978-93-83459-32-2 2014 Bonfring
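The ADCT with 50% overlap, critical sampling and perfect reconstruction described above has the structure of the standard MDCT of audio coding [8]. As a minimal sketch, assuming the standard MDCT form with a sine window (the function names are illustrative, not from the paper), the transform pair and its overlap-add reconstruction can be written as:

```python
import numpy as np

def sine_window(N):
    # w(n) = sin(pi/(2N) * (n + 1/2)), n = 0 .. 2N-1 (standard MDCT sine window)
    n = np.arange(2 * N)
    return np.sin(np.pi / (2 * N) * (n + 0.5))

def adct(frame, w):
    # Forward transform: 2N windowed samples -> N coefficients (critical sampling)
    N = len(frame) // 2
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (w * frame)

def iadct(X, w):
    # Inverse transform: N coefficients -> 2N windowed, time-aliased samples
    N = len(X)
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return w * (basis @ X) * 2 / N

N = 32
w = sine_window(N)
# The sine window satisfies the perfect reconstruction conditions:
print(np.allclose(w[::-1], w))                  # symmetry: w(2N-1-n) = w(n)
print(np.allclose(w[:N]**2 + w[N:]**2, 1.0))    # w^2(n) + w^2(n+N) = 1

# Overlap-add of 50%-overlapping frames cancels the time-domain aliasing:
rng = np.random.default_rng(0)
x = rng.standard_normal(6 * N)
y = np.zeros_like(x)
for s in range(0, len(x) - 2 * N + 1, N):
    y[s:s + 2 * N] += iadct(adct(x[s:s + 2 * N], w), w)
print(np.allclose(y[N:-N], x[N:-N]))            # interior samples are exact
```

All three checks print True. The first and last half-frames are not fully reconstructed because only one window covers them, which is why block-transform codecs add boundary frames.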

Thus, an advanced speech enhancement system, namely Advanced Discrete Cosine Transform Speech Enhancement through Wiener Filtering based Pitch Synchronous Analysis (ADCT based WFPS), is proposed in this approach.

2. Literature Survey

A number of DFT based techniques concentrate only on filtering the spectral magnitude while leaving the noise-corrupted phase information intact, as it has been reported that the best estimate of the phase is the corrupted one itself [7]. Since leaving the phase untouched generally imposes an upper bound on the maximum possible improvement in Signal-to-Noise Ratio (SNR), the DCT, which requires no such action, can attain a higher upper bound than the DFT [6]. The DFT creates only about half as many independent spectral components, since the other half are complex conjugates, while the DCT creates fully independent spectral components. Based on these benefits, it has also been shown that the DCT is a suitable alternative to the discrete Fourier transform (DFT) for speech enhancement [10].

Pitch synchronous analysis has earlier been used in various speech signal processing systems such as speech analysis/synthesis systems [11] [12], prosody modification systems [13] and speech recognition systems [14]. The fundamental scheme of pitch synchronous processing is to first partition the speech signal into pitch periods for voiced sounds and into pseudo pitch periods for unvoiced sounds. A number of different processes can then be applied to the resulting pitch synchronous segments for various purposes.

The Pitch Synchronous Overlap and Add (PSOLA) technique is applied in the time domain, and it enables the algorithm to control the value of the synthesized pitch and the duration of the synthesized signal [13]. The PSOLA technique can also be used in other domains such as the frequency domain [15].
The Fourier transform is applied to the pitch synchronous sections, and the resulting spectra are approximated by a pattern of zeros and poles to obtain a pitch synchronous representation for examining the voiced sounds [16]. Evangelista [17] also uses this pitch synchronous representation and applies the Wavelet transform to it to obtain a new representation of a pseudo-periodic signal through a regularized oscillatory component and fluctuations. This representation provides a number of scales for examining the fluctuations, which is superior to the Fourier representation with only one scale.

Pitch synchronous speech segments are transferred to the linear prediction residual, on which the DCT is applied for re-sampling the residual signal by truncation or zero padding [18]. The DCT is applied here because it is efficient at energy compaction. The energy loss with the DCT-based linear prediction technique is lower than that with the direct linear prediction technique, and this algorithm is thus superior to the original fundamental algorithm.

Most of the existing research demonstrates that pitch synchronous processing assists in minimizing the discontinuities connected with windowing, and it focuses on a key point, the pitch period [19]. Pitch synchronous processing has been extensively applied in speech processing but has rarely been used for the purpose of speech enhancement [20].

The Line Spectrum Pair (LSP) was first introduced by Itakura [36] [37] as an alternative kind of LPC spectral representation.

3. ADCT and WFPS Pitch Synchronous Based Speech Enhancement

The structure of the proposed speech enhancement system is shown in Figure 1. The initial speech frame is filtered by a noise reduction technique, and then a voiced/unvoiced decision is made. If the frame contains a voiced signal, the time-shift is changed to one pitch period. Otherwise, the time-shift falls back to the original fixed value.
In this way, the analysis window shift adapts to the underlying speech properties and is no longer fixed [21]. In order to improve the performance, the Advanced Discrete Cosine Transform is used in this approach. Signal representation in the ADCT domain has become an active area of research in signal processing. The ADCT is being effectively used in high quality audio coding due to its unique characteristic features. The main advantage of the ADCT is its energy compaction capability. Moreover, it also attains critical sampling, a minimization of the block effect and flexible window switching [8].

In certain applications, such as streaming audio to handheld devices, it is essential to have fast implementations and optimized codec structures. In many circumstances, it is also efficient to carry out ADCT domain audio processing such as error concealment, which lessens the degradation of subjective audio quality. These characteristic features of the ADCT motivated its application in this research work.

The direct and inverse ADCT are defined as [23, 24]:

X(k) = Σ_{n=0}^{2N−1} w(n) x(n) cos[(π/N)(n + 1/2 + N/2)(k + 1/2)],  k = 0, ..., N−1
x̂(n) = (2/N) w(n) Σ_{k=0}^{N−1} X(k) cos[(π/N)(n + 1/2 + N/2)(k + 1/2)]   (1)

where w(n)x(n) is the windowed input signal, x(n) is the input signal of 2N samples and w(n) is a window function. We assume an identical analysis-synthesis time window. The conditions for perfect reconstruction are [25, 26]:

w(2N−1−n) = w(n),  w²(n) + w²(n + N) = 1   (2)

A sine window is widely used in audio coding because it offers good stop-band attenuation, gives good attenuation of the block edge effect and allows perfect reconstruction. Other optimized windows can be applied as well [25]. The sine window is defined as:

w(n) = sin[(π/2N)(n + 1/2)]   (3)

The IADCT coefficients in (1) contain time-domain aliasing, which is cancelled when adjacent windowed frames are overlap-added.

The rectangular window does have some advantages. It has a narrower main lobe, which is able to resolve comparable-strength signals. Besides, one advantage of using the DCT as compared to the DFT is that there is no discontinuity problem caused by the rectangular window at the endpoints, since the DCT is based on an even symmetric extension during the transform of a finite signal. Therefore, the selection of the window is based on a tradeoff between spectral resolution and leakage effects. In the literature on DCT-based speech enhancement algorithms, the Hann window is very popular [6]. There is also a compromise, with a trapezoidal window being applied in [22]. In this paper, the rectangular window is used for better performance of the system with the Advanced Discrete Cosine Transform.

Wiener Filtering

This paper proposes a smoothed noise update technique that uses the estimated signal spectrum for subsequent signal estimation. It leads to a more efficient result than the soft-decision based noise estimate found in the literature. Further, the WF performance is improved using codebook constraints in the LAR domain instead of the LSP domain.

The Wiener filter is a popular statistical approach, based on the assumption that the signal and the noise are stationary linear stochastic processes with known spectral characteristics, that has been used for noise reduction in speech signals. Assume that the clean speech s(t) is degraded by an additive noise w(t).
The noisy speech x(t) is defined as [27]

x(t) = s(t) + w(t)   (4)

The Wiener filter has been proven to be the optimal filter for the real transform in the mean square error (MSE) sense. Its implementation fully depends on the estimation of the a priori SNR, which can be computed in many ways, among which the decision-directed approach [7] is widely used. Let the noisy speech, clean speech and noise signal be denoted x(t), s(t) and w(t), respectively, and their ADCT representations X(λ, k), S(λ, k) and W(λ, k), where λ is the time frame index and k is the frequency index.

The Wiener filter is an optimal filter that minimizes the Mean Squared Error (MSE). The filter output can be defined as

Ŝ(ω) = H(ω)X(ω)

where ω is the frequency index and S(ω), X(ω) and H(ω) are the discrete Fourier transforms of the clean speech and the noisy speech, and the Wiener filter, respectively. The error is defined as:

e(ω) = S(ω) − Ŝ(ω) = S(ω) − H(ω)X(ω)

Figure 1: Block diagram of the ADCT based WFPS (noisy speech → windowing function → ADCT → Wiener filtering → voiced/unvoiced decision → pitch synchronous analysis with a one-period time-shift for voiced sounds, or the initial fixed time-shift for unvoiced sounds → IADCT → overlap & add → enhanced speech)

The Mean Squared Error is defined as:

E = E[|e(ω)|²] = E[|S(ω) − H(ω)X(ω)|²]

where E[.] stands for the expectation operator. Minimizing the MSE, the Wiener filter can be estimated as

H(ω) = P_xs(ω) / P_xx(ω)

where P_xx(ω) and P_xs(ω) are the power spectrum of the noisy speech and the cross power spectrum between the noisy speech and the clean speech, respectively. If there is no correlation between the speech signal s(t) and the additive noise w(t), the power spectrum of the noisy speech and the cross power spectrum become:

P_xx(ω) = P_ss(ω) + P_ww(ω),  P_xs(ω) = P_ss(ω)

Consequently, the Wiener filter can be derived as follows:

H(ω) = P_ss(ω) / (P_ss(ω) + P_ww(ω))

The SNR is defined by

SNR(ω) = P_ss(ω) / P_ww(ω)

so that H(ω) = SNR(ω) / (SNR(ω) + 1).

An efficient initialization is therefore needed, and a Spectral Subtraction based Initialization (SSI) method is proposed to deal with the above issues. For each frame, power spectral subtraction [29] is performed to obtain an enhanced speech estimate. Following LPC analysis, this estimate determines the initial filter. It is obviously better than starting with a unity WF, and it results in better convergence properties of the CIWF.

Robust Parameter Domain Search

The significance of the WF lies in approximating the optimum filter by means of a codebook of clean speech vectors. Hence, the parameter space used to represent these vectors has a considerable bearing on the successive approximations. Line Spectral Frequencies (LSF), Reflection Coefficients (RC) and Log Area Ratios (LAR) have a one-to-one mapping, but they also have different clustering attributes due to the non-linear relationships between them. Hence, each has been used with varied success in speech coding and recognition. In this work, a number of different parameter spaces are explored for the WF to discover the best performing parameter. The widely used IS distance measure is used for creating LPC codebooks. The Euclidean Distance (ED) is used for the LAR and RC codebooks.
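The derivation above reduces to the familiar gain H = SNR/(SNR + 1). A small sketch of that gain and its application to noisy transform coefficients (function names are illustrative; the maximum-likelihood SNR estimate used here is an assumption for illustration, whereas the paper uses the decision-directed estimate):

```python
import numpy as np

def wiener_gain(snr):
    # H = SNR / (SNR + 1): near 0 in noise-dominated bins, near 1 in speech bins
    return snr / (snr + 1.0)

def enhance(X, noise_var):
    # Per-coefficient Wiener filtering with the simple estimate
    # SNR ~ max(X^2 / noise_var - 1, 0)
    snr = np.maximum(X**2 / noise_var - 1.0, 0.0)
    return wiener_gain(snr) * X

print(wiener_gain(np.array([0.0, 1.0, 9.0, 99.0])))  # [0.   0.5  0.9  0.99]
print(enhance(np.array([0.5, 4.0]), noise_var=1.0))  # weak bin fully suppressed
```

At 0 dB a priori SNR the gain is exactly 0.5; coefficients below the noise floor are driven to zero, which is the attraction of transform-domain filtering noted in the introduction.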
For the LSPs, the ED and two other perception-based weighted Euclidean distances are considered: the Mel-Frequency Warping (MFW) based distance, which is modeled on the auditory system, and the Inverse Harmonic Mean (IHM) based distance. The IHM based distance is perceptually appropriate, as it weights each LSF in inverse proportion to its nearness to its neighbors, because of the improved possibility of it denoting formants [30].

The estimated a priori SNR can be expressed as follows:

ξ̂(λ, k) = α Ŝ²(λ−1, k)/σ_w²(k) + (1 − α) max(X²(λ, k)/σ_w²(k) − 1, 0)   (7)

This definition can be incorporated into the Wiener filter equation.

Spectral Subtraction Based Initialization (SSI)

For each frame in the sequential MAP calculation, a set of initial values for the vector a is assumed, based on which the speech vector is calculated through the Wiener filter. The current estimate is in turn used to estimate the next estimate of a. This procedure is continued until convergence is achieved. In [28], the initial estimate is started as unity, which is highly suboptimal. This results in two possibilities. The first is that the iterations might converge in such a way that the resulting filter is not perceptually the best. The second is that, though they do converge to an optimal filter, the number of iterations taken for convergence will be large. Hence, an initialization technique which provides efficient and quicker convergence is required.

In (7), Ŝ(λ−1, k) is the estimated clean speech in the previous frame, max is the maximum function and σ_w²(k) is the noise variance, which equals the expectation of the power magnitude of the noise signal. The noise variance is assumed to be known, since the noise signal is a wide-sense stationary random process and can be computed during the silence periods. The parameter α sets the proportion of the contribution from the previous frames to the current estimate. In the Fourier transform domain, the value of α is normally set to 0.98, which is an empirically obtained value and is known to be a good tradeoff between noise reduction and speech distortion.
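The decision-directed recursion in (7) can be sketched per frame as follows (a minimal sketch with the fixed smoothing factor α = 0.98 discussed above; names are illustrative, and this is not the full WFPS chain):

```python
import numpy as np

def decision_directed(frames, noise_var, alpha=0.98):
    """Decision-directed a priori SNR over transform-domain frames
    (shape: frames x bins), following Eq. (7)."""
    xi_all = np.zeros_like(frames)
    s_prev = np.zeros(frames.shape[1])   # clean-speech estimate, previous frame
    for t, X in enumerate(frames):
        post = X**2 / noise_var          # a posteriori SNR
        xi = alpha * s_prev**2 / noise_var + (1 - alpha) * np.maximum(post - 1, 0)
        s_prev = xi / (xi + 1) * X       # Wiener-filtered frame feeds the recursion
        xi_all[t] = xi
    return xi_all

# Smoothing behaviour: a sudden strong bin is admitted only gradually.
frames = np.vstack([np.ones((3, 2)), 5 * np.ones((3, 2))])
xi = decision_directed(frames, noise_var=1.0)
print(xi[0], xi[-1])
```

The heavy weighting of the previous estimate (α close to one) is exactly what makes the estimator sensitive to the frame-to-frame coefficient variation that the pitch synchronous analysis is meant to suppress.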
The same value of α is also commonly used in DCT speech enhancement schemes [20]. However, this might not be proper for the new situation, since the DCT coefficients may require a different value of α, or even an adaptive one. In the DFT domain, there is some work on adapting α for a better estimation of the a priori SNR [31]. Thus, it is

feasible to propose an adaptive α for the decision-directed approach in the DCT domain, which leads to an improved version of the Wiener filter. The minimum mean square error (MMSE) criterion is used to derive the optimal expression. Recalling the decision-directed approach in (7), the a priori SNR can be expressed as

ξ̂(λ, k) = α(λ, k) Ŝ²(λ−1, k)/σ_w²(k) + (1 − α(λ, k)) max(X²(λ, k)/σ_w²(k) − 1, 0)   (8)

where α(λ, k) is an adaptive version of α. The error between the estimated a priori SNR ξ̂(λ, k) and the real one ξ(λ, k) is

e(λ, k) = ξ̂(λ, k) − ξ(λ, k)   (9)

Based on the assumption that the DCT coefficients of the speech signal S(λ, k) and the noise signal W(λ, k) can be modeled as zero-mean random Gaussian variables which are independent of each other [6], the mean squared error E[e²(λ, k)] can be expanded in closed form. Setting its derivative with respect to α(λ, k) to zero yields the optimal expression for α(λ, k); an approximation is used there to avoid division by zero. As the true a priori SNR is unknown, this expression cannot be applied directly; an approximate value can be obtained by substituting a smoothed estimate, defined as

ξ̄(λ, k) = h * ξ̂(λ, k)   (14)

where * is the convolution operator and h is a low-pass filter; a Gaussian mask is applied here to realize this smoothing. The reason for applying this low-pass filter is that it reduces the variance among different speech frames caused by the noise. This annoying effect can be further reduced by a "moving" value of α, in which a parameter fixed to 0.5 is used for the experimental evaluations. When the SNR changes slowly, α(λ, k) takes a value close to one; if the SNR changes sharply, it takes a smaller value, enabling it to adapt. Thus, the adaptive controller lies in the range of zero to one.

Pitch Synchronization

In order to implement the ADCT based WFPS algorithm, the pitch period must be extracted first. There are many ways to estimate the pitch periodicity of a speech signal: the pitch can be predicted from the periodicity in time or from the regularly spaced harmonics in the frequency domain.
A time domain pitch estimator needs a preprocessor to filter and simplify the signal through data reduction, a basic pitch estimator, and a postprocessor to correct errors.

The autocorrelation approach is mainly used in the time domain for calculating the pitch period of a speech signal [38]. For a discrete signal x(n), the autocorrelation function is

R(m) = (1/N) Σ_{n=0}^{N−1} x(n) x(n + m),  m = 0, 1, ..., M0 − 1   (16)

where N is the length of the analyzed sequence and M0 is the number of autocorrelation points to be computed. For pitch detection, assume x(n) is a periodic sequence with period P, that is, x(n) = x(n + P) for all n; it can be shown that the autocorrelation function is then also periodic with the same period, R(m) = R(m + P). Conversely, periodicity in the autocorrelation function indicates periodicity in the signal. For a non-stationary signal like speech, the long-time autocorrelation is calculated from (16). Generally, one operates with short speech segments consisting of a finite number of

samples, so autocorrelation based pitch detection algorithms (PDAs) use the short-time autocorrelation function below:

R(m) = Σ_{n=0}^{N−1−m} x(n) x(n + m)   (17)

The variable m in (17) is called the lag, and the pitch period is equal to the value of m which results in the maximum R(m). In the proposed approach the pitch period is calculated using the autocorrelation method.

The final enhanced speech is obtained by an overlap-add process. This process differs slightly from the usual one because of the adaptive window shifting. A convenient solution is to produce a weighting function which records all the windows frame by frame and calculates the net weighting function. The weighting function can be calculated from the current and previous frames, and hence can be computed in real time. Thereafter, the enhanced speech is normalized by the weighting function. In the underlying harmonic model, n is the discrete time index, f0 is the fundamental frequency, the harmonics have individual amplitudes and initial phases, and the remainder is a stochastic component.

4. Experimental Results

For this experimental setup, a hundred different segments of speech (half female and half male) are randomly chosen from the TIMIT database. They are resampled at 8 kHz and corrupted by three additive noise types: white noise, fan noise and car noise. The total duration of all the test speech segments is 313.998 s, including the silence periods. Approximately 50% of the speech segments are classified as voiced speech.

A pitch mark location method is modified for signals with a varying fundamental frequency. The analysis stage aims to iteratively collect sound samples from the input signal at equally spaced fundamental frequencies.

The proposed ADCT based WFPS technique is evaluated using two objective measures: the segmental SNR (SegSNR) measure and the perceptual evaluation of speech quality (PESQ) measure.
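The autocorrelation pitch estimate of Eq. (17) can be sketched as follows (the 60-400 Hz search range is an assumed typical speech F0 range, and the 8 kHz rate matches the experimental sampling rate; function names are illustrative):

```python
import numpy as np

def pitch_autocorr(x, fs=8000, fmin=60.0, fmax=400.0):
    # Estimate the pitch period (in samples) as the lag m maximising the
    # short-time autocorrelation R(m), Eq. (17), within the F0 search range.
    x = x - np.mean(x)
    lo, hi = int(fs / fmax), int(fs / fmin)
    r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(hi + 1)])
    return lo + int(np.argmax(r[lo:hi + 1]))

fs = 8000
t = np.arange(0, 0.04, 1.0 / fs)
x = np.sin(2 * np.pi * 100.0 * t)   # 100 Hz tone -> period of 80 samples
print(pitch_autocorr(x, fs))        # 80
```

The detected lag is then used as the one-period time-shift for the analysis window of voiced frames.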
SegSNR is better correlated with mean opinion score (MOS) than SNR, as indicated by [33], is easy to implement, and has been widely used to qualify enhanced speech. The implementation in [34] is adopted here, such that the segmental SNR of each frame is thresholded by a lower dB bound and a 35 dB upper bound. The segmental SNR is defined by [20]

SegSNR = (10/|F|) Σ_{λ∈F} log10 [ Σ_n s²(λ, n) / Σ_n (s(λ, n) − ŝ(λ, n))² ]   (18)

where F represents the set of frames that contain speech and |F| its cardinality.

A New Pitch Synchronous Overlap and Add (PSOLA) Approach to Enhance the Pitch Synchronous Analysis

Let M be the total number of sound samples to extract.
1. Calculate the evolution of the fundamental frequency of the input signal. Since it is hard to detect directly, this step is performed by approximating the evolution of the most energetic harmonic.
2. This component is obtained from the input signal with a selective time-varying passband filter. The central frequency of the filter is updated at every sample to match the local approximation. Ideally, the resulting signal is a single sinusoid, modulated in frequency and amplitude according to the harmonic's evolution, which remains in phase with the input signal.
3. Pitch marks are placed in the input signal, at the initial level, for every period of the filtered signal obtained in step 2. For each frequency, a single pitch mark is chosen as the one corresponding to the closest fundamental.
4. For each selected pitch mark, a sound sample is extracted from the input signal with a suitable temporal window. The additive noise is naturally extracted with the harmonic part of the signal and requires no additional operations [39].

PESQ, which is described in ITU-T recommendation P.862 and is also published in [35], is an objective measurement tool that predicts the results of subjective listening tests on telephony systems. It uses a sensory model to compare the original, unprocessed signals with the enhanced signals. In [34] it is indicated that the SegSNR is a better evaluation in terms of noise reduction, while the PESQ is more accurate in terms of speech distortion prediction.
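The frame-thresholded SegSNR of Eq. (18) can be sketched as below (a minimal sketch: the -10 dB lower bound is an assumed value, since the paper only states the 35 dB upper bound, and the frame length and function name are illustrative):

```python
import numpy as np

def seg_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    # Average of per-frame SNRs in dB, each frame clipped to [lo, hi] dB.
    vals = []
    for s in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[s:s + frame_len]
        e = c - enhanced[s:s + frame_len]          # residual error
        num, den = np.sum(c**2), np.sum(e**2) + 1e-12
        vals.append(np.clip(10.0 * np.log10(num / den + 1e-12), lo, hi))
    return float(np.mean(vals))

rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
noisy = x + 0.1 * rng.standard_normal(2048)
print(seg_snr(x, x))                 # 35.0 (perfect enhancement hits the cap)
print(seg_snr(x, noisy) > 10.0)      # True
```

The clipping keeps silent frames (very low SNR) and near-perfect frames (very high SNR) from dominating the average, which is why SegSNR tracks MOS better than plain SNR.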
The latter is also more reliable and more highly correlated with MOS than other traditional objective measures. In most situations, PESQ is the best objective indicator of the overall quality of the enhanced speech.

Before evaluating the ADCT based WFPS system, the effect of the window functions should be examined. An iterative Wiener filter with a fixed time-shift analysis of 8 ms is used. Two different window functions, the rectangular window and the Hann window, are used to truncate the input signal. The window length is fixed at 32 ms. SegSNR and PESQ results are shown in Figs. 2 and 3, respectively. From these two figures, it is clear that the rectangular window is better for DCT

based noise reduction algorithms. For all the noise types taken into consideration, the rectangular window is observed to provide a better segmental SNR.

Figure 2: Segmental SNR results of noisy speech and Wiener filtered speech with rectangular and Hann windows, for (a) white noise, (b) fan noise and (c) car noise

Figure 3: PESQ score results of noisy speech and Wiener filtered speech with rectangular and Hann windows

To exhibit the advantages of each component of the proposed ADCT based WFPS system, three speech enhancement schemes are compared. The first approach is Wiener filtering with a higher fixed overlap, denoted WFHO. The second is the pitch-synchronized Wiener filtering, named PSWF. The third is the Adaptive Time-Shift Analysis (ATSA) approach. Table 1 shows the comparison of SegSNR results. The comparison is carried out for three noise types: white noise, fan noise and car noise. The input SNRs taken for experimentation are 0, 5, 10 and 15 dB. For white noise, the proposed ADCT based WFPS provides an efficient SegSNR for all the input SNR values taken into consideration. Similarly, for the other noise types, the proposed ADCT based WFPS approach outperforms the other approaches taken for comparison.

Table 1: Comparison of SegSNR Results

Table 2: Comparison of PESQ Results

Table 2 shows the performance comparison of the proposed speech enhancement approach with the other approaches, such as WFHO, DCT based PSWF and ATSA, in terms of the PESQ score. It is observed that the proposed ADCT based WFPS approach provides better PESQ scores.

5. Conclusion

This research work focuses on developing an efficient speech enhancement technique. DCT based speech enhancement approaches are observed to produce better results. In conventional DCT-based noise reduction algorithms, the observed speech signal is partitioned into fixed overlapping frames and transformed into the DCT domain, which results in variation of the DCT coefficients from one frame to another due to non-ideal analysis window positions. In order to improve the overall performance, the Advanced Discrete Cosine Transform is integrated with a pitch synchronous analysis technique, and Wiener filtering is used for better noise reduction. The autocorrelation function is used for detecting the pitch period, which in turn is used as the amount of shift for the analysis window. Therefore, a consistent DCT spectrogram is generated for better noise reduction filtering. This technique can be further improved by maximum alignment, which results in a much better fit to the DCT basis functions. The proposed approach, called ADCT based WFPS, produces good quality enhanced speech. Two objective measures, segmental SNR and PESQ, are used to evaluate the proposed system.

References

[1] Ephraim, Y. & Cohen, I. (2006). "Recent advancements in speech enhancement," in The Electrical Engineering Handbook. Boca Raton, FL: CRC.
[2] Deller, J.R., Hansen, J.H.L., Proakis, J.G., (2000). Discrete-Time Processing of Speech Signals, second ed. IEEE Press, New York.
[3] Haykin, S., (1996).
Adaptive Filter Theory, third ed. Prentice Hall, Upper Saddle River, New Jersey.
[4] Ephraim, Y., Malah, D., (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6), 1109–1121.
[5] Attias, H., Platt, J. C., Acero, A. & Deng, L. (2000). "Speech denoising and dereverberation using probabilistic models," in Proc. NIPS, pp. 758–764.
[6] Soon, I. Y., Koh, S. N. & Yeo, C. K. (1998). "Noisy speech enhancement using discrete cosine transform," Speech Commun., vol. 24, pp. 249–257.
[7] Ephraim, Y. & Malah, D. (1984). "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109–1121.
[8] Ye Wang & Miikka Vilermo, (2002). "The Modified Discrete Cosine Transform: Its Implications For Audio Coding And Error Concealment", AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio.
[9] M. V. Mathews, J. E. Miller, and E. E. D. Jr, "Pitch synchronous analysis of

