Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Modulation Magnitude Estimator


Available online at www.sciencedirect.com
Speech Communication 54 (2012) 282–305
www.elsevier.com/locate/specom

Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator

Kuldip Paliwal, Belinda Schwerin, Kamil Wójcicki
Signal Processing Laboratory, Griffith School of Engineering, Griffith University, Nathan, QLD 4111, Australia

Received 15 December 2010; received in revised form 7 September 2011; accepted 14 September 2011; available online 24 September 2011

Abstract

In this paper we investigate the enhancement of speech by applying MMSE short-time spectral magnitude estimation in the modulation domain. For this purpose, the traditional analysis-modification-synthesis framework is extended to include modulation domain processing. We compensate the noisy modulation spectrum for additive noise distortion by applying the MMSE short-time spectral magnitude estimation algorithm in the modulation domain. A number of subjective experiments were conducted. Initially, we determine the parameter values that maximise the subjective quality of stimuli enhanced using the MMSE modulation magnitude estimator. Next, we compare the quality of stimuli processed by the MMSE modulation magnitude estimator to those processed using the MMSE acoustic magnitude estimator and the modulation spectral subtraction method, and show that a good improvement in speech quality is achieved through use of the proposed approach. Then we evaluate the effect of including speech presence uncertainty and log-domain processing on the quality of enhanced speech, and find that the method works better with speech presence uncertainty. Finally, we compare the quality of speech enhanced using the MMSE modulation magnitude estimator (when used with speech presence uncertainty) with that enhanced using different acoustic domain MMSE magnitude estimator formulations, and with that enhanced using different modulation domain based enhancement algorithms. Results of these tests show that the MMSE modulation magnitude estimator improves the quality of processed stimuli without introducing musical noise or spectral smearing distortion. The proposed method is shown to have better noise suppression than MMSE acoustic magnitude estimation, and improved speech quality compared to the other modulation domain based enhancement methods considered.

© 2011 Elsevier B.V. All rights reserved.

Keywords: Modulation domain; Analysis-modification-synthesis (AMS); Speech enhancement; MMSE short-time spectral magnitude estimator (AME); Modulation spectrum; Modulation magnitude spectrum; MMSE short-time modulation magnitude estimator (MME)

1. Introduction

Speech enhancement methods aim to improve the quality of noisy speech by reducing noise, while at the same time minimising any speech distortion introduced by the enhancement process. Many enhancement methods are based on the short-time Fourier analysis-modification-synthesis framework. Some examples of these are the spectral subtraction method (Boll, 1979), the Wiener filter method (Wiener, 1949), and the MMSE short-time spectral amplitude estimation method (Ephraim and Malah, 1984). (Corresponding author: B. Schwerin; tel. +61 7 3735 3754; fax +61 7 3735 5198; e-mail belinda.schwerin@griffithuni.edu.au.)

0167-6393 © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.specom.2011.09.003
Spectral subtraction is perhaps one of the earliest and most extensively studied methods for speech enhancement. This simple method enhances speech by subtracting a spectral estimate of the noise from the noisy speech spectrum, in either the magnitude or the energy domain. Though this method is effective at reducing noise, it suffers from the problem of musical noise distortion, which is very annoying to listeners.
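As an illustration of this baseline technique (not the authors' implementation), a minimal magnitude-domain spectral subtraction for a single analysis frame might look as follows; the function name and the spectral floor value are assumptions made for this sketch.

```python
import numpy as np

def spectral_subtraction_frame(noisy_frame, noise_mag, floor=0.002):
    """Magnitude-domain spectral subtraction for one windowed frame,
    keeping the noisy phase unchanged (illustrative sketch only)."""
    spectrum = np.fft.rfft(noisy_frame)
    mag, phase = np.abs(spectrum), np.angle(spectrum)
    # Flooring the rectified difference is what produces the isolated
    # spectral peaks heard as "musical noise".
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))
```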

To overcome this problem, Ephraim and Malah (1984) proposed the MMSE short-time spectral amplitude estimator, referred to throughout this work as the acoustic magnitude estimator (AME). In the literature (e.g., Cappe, 1994; Scalart and Filho, 1996), it has been suggested that the good performance of the AME can be largely attributed to its use of the decision-directed approach for estimation of the a priori signal-to-noise ratio (a priori SNR). The AME method, even today, remains one of the most effective and popular methods for speech enhancement.

Recently, the modulation domain has become popular for speech processing. This has been due in part to the strong psychoacoustic and physiological evidence supporting the significance of the modulation domain for the analysis of speech signals. (A review of the significance of the modulation domain for human speech perception can be found in Atlas and Shamma (2003).) Zadeh (1950) was perhaps the first to propose a two-dimensional bi-frequency system, where the second dimension for frequency analysis was the transform of the time variation of the magnitudes at each standard (acoustic) frequency. More recently, Atlas et al. (2004) defined the acoustic frequency as the axis of the first short-time Fourier transform (STFT) of the input signal, and the modulation frequency as the independent variable of a second STFT.

Early efforts to utilise the modulation domain for speech enhancement assumed speech and noise to be stationary, and applied fixed filtering to the trajectories of the acoustic magnitude spectrum. For example, Hermansky et al. (1995) proposed band-pass filtering the time trajectories of the cubic-root compressed short-time power spectrum to enhance speech. Falk et al. (2007) and Lyons and Paliwal (2008) applied similar band-pass filtering to the time trajectories of the short-time magnitude (or power) spectrum for speech enhancement.

However, speech, and possibly noise, are known to be nonstationary. To capture this nonstationarity, one option is to assume speech to be quasi-stationary and to process the trajectories of the acoustic magnitude spectrum on a short-time basis. At this point it is useful to differentiate the acoustic spectrum from the modulation spectrum as follows. The acoustic spectrum is the STFT of the speech signal, while the modulation spectrum at a given acoustic frequency is the STFT of the time series of the acoustic spectral magnitudes at that frequency. The short-time modulation spectrum is thus a function of time, acoustic frequency and modulation frequency.

This type of short-time processing in the modulation domain has been used in the past for automatic speech recognition (ASR). Kingsbury et al. (1998), for example, applied a modulation spectrogram representation that emphasised low-frequency amplitude modulations to ASR for improved robustness in noisy and reverberant conditions. Tyagi et al. (2003) applied mel-cepstrum modulation features to ASR to give improved performance in the presence of non-stationary noise. Short-time modulation domain processing has also been applied to objective quality assessment. For example, Kim (2004, 2005) as well as Falk and Chan (2008) used the short-time modulation magnitude spectrum to derive objective measures that characterise the quality of processed speech.

For speech enhancement, short-time modulation domain processing was recently applied in the modulation spectral subtraction (ModSSub) method of Paliwal et al. (2010). Here, the spectral subtraction method was extended to the modulation domain, enhancing speech by subtracting the noise modulation energy spectrum from the noisy modulation energy spectrum within an analysis-modification-synthesis (AMS) framework.
In the ModSSub method, the frame duration used for computing the short-time modulation spectrum was found to be an important parameter, providing a trade-off between quality and the level of musical noise. Increasing the frame duration reduced musical noise, but introduced a slurring distortion. A somewhat long frame duration of 256 ms was recommended as a good compromise. The disadvantages of using a longer modulation domain analysis window are as follows. Firstly, we are assuming stationarity, which we know is not the case. Secondly, quite a long portion of signal is needed for the initial estimation of noise. Thirdly, as shown by Paliwal et al. (2011), speech quality and intelligibility are higher when the modulation magnitude spectrum is processed using short frame durations, and lower when processed using longer frame durations. For these reasons, we aim to find a method better suited to the use of shorter modulation analysis window durations.

Since the AME method has been found to be more effective than spectral subtraction in the acoustic domain, in this paper we explore the effectiveness of this method in the short-time modulation domain. For this purpose, the traditional analysis-modification-synthesis framework is extended to include modulation domain processing, and the noisy modulation spectrum is compensated for additive noise distortion by applying the MMSE short-time spectral magnitude estimation algorithm. The advantage of applying an MMSE-based method is that it does not introduce musical noise and hence can be used with shorter frame durations in the modulation domain. The proposed approach, referred to as the modulation magnitude estimator (MME), is demonstrated to give better noise removal than the AME approach, without the musical noise of spectral subtraction type approaches, or the spectral smearing of the ModSSub method. In the body of this paper, we provide enhancement results for the case of speech corrupted by additive white Gaussian noise (AWGN). We have also investigated enhancement performance for various coloured noises; the results, included in the Appendices, are qualitatively similar.

The rest of the paper is organised as follows. Section 2 details an AMS-based framework for enhancement in the short-time modulation domain. In Section 3 we describe the proposed MME approach, then in Section 4 we give details of the experiments used to tune the parameters of the MME method. In Section 5, the performance of the MME method is evaluated by comparison to a number of different speech enhancement approaches. In Section 6, we consider the effect of speech presence uncertainty and log-domain processing on the performance of the MME method. In Sections 7 and 8, we compare the quality of the proposed MME method to a wider range of enhancement methods, including different acoustic domain MMSE formulations and a number of modulation domain based speech enhancement methods. Final conclusions are drawn in Section 9.

2. AMS-based framework for speech enhancement in the short-time spectral modulation domain

As mentioned previously, many frequency domain speech enhancement methods are based on the (acoustic) short-time Fourier AMS framework (e.g., Lim and Oppenheim, 1979; Berouti et al., 1979; Ephraim and Malah, 1984; Ephraim and Malah, 1985; Martin, 1994; Sim et al., 1998; Virag, 1999; Cohen, 2005; Loizou, 2005). A traditional acoustic AMS procedure for speech enhancement consists of three stages: (1) the analysis stage, where the noisy speech is processed using STFT analysis; (2) the modification stage, where the noisy spectrum is compensated for noise distortion to produce the modified spectrum; and (3) the synthesis stage, where an inverse STFT operation is followed by overlap-add synthesis to reconstruct the enhanced signal. This framework has recently been extended to facilitate enhancement in the short-time spectral modulation domain (Paliwal et al., 2010). For this purpose, a secondary AMS procedure is utilised for framewise processing of the time series of each frequency component of the acoustic magnitude spectra. In this section, the details of the AMS-based framework for speech enhancement in the short-time spectral modulation domain are briefly reviewed.

Let us assume an additive noise model in which clean speech is corrupted by uncorrelated additive noise to produce noisy speech, as given by

$x(n) = s(n) + d(n)$,   (1)

where x(n), s(n), and d(n) are the noisy speech, clean speech, and noise signals, respectively, and n denotes a discrete-time index. The noisy speech signal is then processed using the running STFT analysis (Vary and Martin, 2006) given by

$X_l(k) = \sum_{n=0}^{N-1} x(n + lZ)\, v(n)\, e^{-j2\pi nk/N}$,   (2)

where l refers to the acoustic frame index, k refers to the index of the acoustic frequency, N is the acoustic frame duration (AFD) in samples, Z is the acoustic frame shift (AFS) in samples, and v(n) is the acoustic analysis window function. (Note that frame duration and window duration mean the same thing, and we use these two terms interchangeably in this paper.) In speech processing, an AFD of 20–40 ms along with an AFS of 10–20 ms and the Hamming analysis window are typically employed (e.g., Picone, 1993; Huang et al., 2001; Loizou, 2007; Paliwal and Wójcicki, 2008; Rabiner and Schafer, 2010).

In polar form, the STFT of the speech signal can be expressed as

$X_l(k) = |X_l(k)|\, e^{j\angle X_l(k)}$,   (3)

where $|X_l(k)|$ denotes the acoustic magnitude spectrum and $\angle X_l(k)$ denotes the acoustic phase spectrum. The time trajectories of each frequency component of the acoustic magnitude spectra are then processed framewise using a second AMS procedure, as outlined below. The running STFT is used to compute the modulation spectrum from the acoustic magnitude spectrum as follows:

$\mathcal{X}_\ell(k, m) = \sum_{l=0}^{N-1} |X_{\ell Z + l}(k)|\, u(l)\, e^{-j2\pi lm/N}$,   (4)

where $\ell$ is the modulation frame index, k is the index of the acoustic frequency, m refers to the index of the modulation frequency, N is the modulation frame duration (MFD) in terms of acoustic frames, Z is the modulation frame shift (MFS) in terms of acoustic frames, and u(l) is the modulation analysis window function. The modulation spectrum can be written in polar form as

$\mathcal{X}_\ell(k, m) = |\mathcal{X}_\ell(k, m)|\, e^{j\angle \mathcal{X}_\ell(k, m)}$,   (5)

where $|\mathcal{X}_\ell(k, m)|$ is the modulation magnitude spectrum, and $\angle \mathcal{X}_\ell(k, m)$ is the modulation phase spectrum.
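To make the dual-transform analysis of Eqs. (2) and (4) concrete, the following sketch computes the modulation spectrum as a second STFT taken over the time trajectory of each acoustic frequency bin. It is a minimal illustration assuming scipy's STFT routine; the function name, default parameter values, and return convention are ours, not the paper's. Here mfd and mfs are expressed in acoustic frames, so with a 1 ms AFS, 32 frames correspond to a 32 ms MFD.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectrum(x, fs, afd_ms=32, afs_ms=1, mfd=32, mfs=2):
    """Compute the short-time modulation spectrum of Eq. (4) as an STFT
    over time of each acoustic magnitude trajectory (illustrative sketch).
    afd_ms/afs_ms: acoustic frame duration/shift in ms;
    mfd/mfs: modulation frame duration/shift in acoustic frames."""
    n_fft = int(fs * afd_ms / 1000)
    hop = int(fs * afs_ms / 1000)
    # First (acoustic) STFT, Eqs. (2)-(3): magnitude and phase spectra.
    _, _, X = stft(x, fs=fs, window='hamming', nperseg=n_fft,
                   noverlap=n_fft - hop)
    acoustic_mag, acoustic_phase = np.abs(X), np.angle(X)
    # Second STFT along the frame (time) axis of each acoustic bin,
    # giving the modulation spectrum of Eq. (4) as a function of
    # acoustic frequency k, modulation frequency m and frame index.
    _, _, M = stft(acoustic_mag, window='hamming', nperseg=mfd,
                   noverlap=mfd - mfs, axis=-1)
    return M, acoustic_phase
```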
In the present work, the modulation magnitude spectrum of clean speech is estimated from the noisy modulation magnitude spectrum, while the noisy modulation phase spectrum is left unchanged. (The relative importance of the modulation phase spectrum with respect to the modulation magnitude spectrum depends on the MFD; for example, the results of a recent study by Paliwal et al. (2011) suggest that for short MFDs ($\leq$ 64 ms) the modulation phase spectrum does not contribute significantly to speech intelligibility or quality.) The modified modulation spectrum is then given by

$\mathcal{Y}_\ell(k, m) = \widehat{S}_\ell(k, m)\, e^{j\angle \mathcal{X}_\ell(k, m)}$,   (6)

where $\widehat{S}_\ell(k, m)$ is an estimate of the clean modulation magnitude spectrum. Eq. (6) can also be written in terms of a spectral gain function, $G_\ell(k, m)$, applied to the modulation spectrum of noisy speech, as follows:

$\mathcal{Y}_\ell(k, m) = G_\ell(k, m)\, \mathcal{X}_\ell(k, m)$,   (7)

where

$G_\ell(k, m) = \widehat{S}_\ell(k, m) \,/\, |\mathcal{X}_\ell(k, m)|$.   (8)

The inverse STFT operation, followed by least-squares overlap-add synthesis (Quatieri, 2002), is then used to compute the modified acoustic magnitude spectrum, as given by

$|Y_l(k)| = \sum_{\ell} w(l - \ell Z) \sum_{m=0}^{N-1} \mathcal{Y}_\ell(k, m)\, e^{j2\pi (l - \ell Z) m / N}$,   (9)

where w(l) is a synthesis window function. The modified acoustic magnitude spectrum is combined with the noisy acoustic phase spectrum to produce the modified acoustic spectrum as follows:

$Y_l(k) = |Y_l(k)|\, e^{j\angle X_l(k)}$.   (10)

(Typically, AMS-based speech enhancement methods modify only the acoustic magnitude spectrum while keeping the acoustic phase spectrum unchanged. One reason for this is that for Hamming-windowed frames of 20–40 ms duration, the phase spectrum is considered unimportant for speech enhancement; e.g., Wang and Lim, 1982; Shannon and Paliwal, 2006.)

The enhanced speech signal is constructed by applying the inverse STFT operation, followed by least-squares overlap-add synthesis, to the modified acoustic spectrum, as given by

$y(n) = \sum_{l} w(n - lZ) \sum_{k=0}^{N-1} Y_l(k)\, e^{j2\pi (n - lZ) k / N}$.   (11)

A block diagram of the AMS-based framework for speech enhancement in the short-time spectral modulation domain is shown in Fig. 1.

[Fig. 1. Block diagram of the AMS-based framework for speech enhancement in the short-time spectral modulation domain.]
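Continuing the analysis sketch above, the synthesis side of Eqs. (9)-(11) can be illustrated as follows, again assuming scipy's inverse STFT for the two overlap-add steps; the clipping of small negative values and the trimming of frame-count mismatches are implementation details of this sketch, not of the paper.

```python
import numpy as np
from scipy.signal import istft

def synthesise(mod_spec, acoustic_phase, fs, afd_ms=32, afs_ms=1,
               mfd=32, mfs=2):
    """Invert the dual transform of Eqs. (9)-(11): an inverse modulation
    STFT with overlap-add recovers the modified acoustic magnitudes,
    which are recombined with the noisy acoustic phase (Eq. (10)) and
    inverted once more to give the enhanced signal (illustrative sketch)."""
    # Eq. (9): inverse STFT + overlap-add along the modulation-frame axis.
    _, mag = istft(mod_spec, window='hamming', nperseg=mfd,
                   noverlap=mfd - mfs, freq_axis=1, time_axis=-1)
    # Eq. (10): clip residual negative values from the inverse transform
    # and attach the unmodified noisy acoustic phase.
    L = min(mag.shape[-1], acoustic_phase.shape[-1])
    Y = np.maximum(mag[:, :L], 0.0) * np.exp(1j * acoustic_phase[:, :L])
    # Eq. (11): inverse acoustic STFT + overlap-add.
    n_fft = int(fs * afd_ms / 1000)
    hop = int(fs * afs_ms / 1000)
    _, y = istft(Y, fs=fs, window='hamming', nperseg=n_fft,
                 noverlap=n_fft - hop)
    return y
```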

3. Minimum mean-square error short-time spectral modulation magnitude estimator

The minimum mean-square error short-time spectral amplitude estimator of Ephraim and Malah (1984) has been employed in the past for speech enhancement in the acoustic frequency domain with much success. In the present work we investigate its use in the short-time spectral modulation domain. For this purpose, the AMS-based framework detailed in Section 2 is used. In the following discussions we refer to the original method of Ephraim and Malah (1984) as the MMSE acoustic magnitude estimator (AME), while the proposed modulation domain approach is referred to as the MMSE modulation magnitude estimator (MME). The details of the MME are presented in the remainder of this section.

In the MME method, the modulation magnitude spectrum of clean speech is estimated from noisy observations. The proposed estimator minimises the mean-square error between the modulation magnitude spectra of the clean and estimated speech:

$\epsilon = E\Big[ \big( |\mathcal{S}_\ell(k, m)| - \widehat{S}_\ell(k, m) \big)^2 \Big]$,   (12)

where E[·] denotes the expectation operator. A closed-form solution to this problem in the acoustic spectral domain was reported by Ephraim and Malah (1984) under the assumptions that speech and noise are additive in the time domain, and that their individual short-time spectral components are statistically independent, identically distributed, zero-mean Gaussian random variables. In the present work we make similar assumptions, namely that (1) speech and noise are additive in the short-time acoustic spectral magnitude domain, i.e.,

$|X_l(k)| = |S_l(k)| + |D_l(k)|$,   (13)

and (2) the individual short-time modulation spectral components of $\mathcal{S}_\ell(k, m)$ and $\mathcal{D}_\ell(k, m)$ are independent, identically distributed Gaussian random variables.

The reasoning for the first assumption is that at high SNRs the phase spectrum remains largely unchanged by additive noise distortion (Loizou, 2007). For the second assumption, we can apply an argument similar to that of Ephraim and Malah (1984), where the central limit theorem is used to justify the statistical independence of the spectral components of the Fourier transform. For the STFT, this assumption is valid only in the asymptotic sense, that is, when the frame duration is large. However, Ephraim and Malah used an acoustic frame duration of 32 ms in their formulation and obtained good results. In our use of the MMSE approach in the modulation domain, we should likewise make the modulation frame duration as large as possible; however, it must not be so large that it is adversely affected by the nonstationarity of the magnitude spectral sequence, as mentioned in the introduction. Keeping Ephraim and Malah's 32 ms acoustic frame duration in mind, we want to find a compromise between these two competing requirements. For this reason, we investigate in this paper the performance of our method as a function of the modulation frame duration.

With the above assumptions in mind, the modulation magnitude spectrum of clean speech can be estimated from the noisy modulation spectrum under the MMSE criterion (following Ephraim and Malah, 1984) as

$\widehat{S}_\ell(k, m) = E\big[\, |\mathcal{S}_\ell(k, m)| \,\big|\, \mathcal{X}_\ell(k, m) \,\big]$   (14)
$\qquad\quad\;\; = G_\ell(k, m)\, |\mathcal{X}_\ell(k, m)|$,   (15)

where $G_\ell(k, m)$ is the MMSE-MME spectral gain function given by

$G_\ell(k, m) = \dfrac{\sqrt{\pi}}{2} \dfrac{\sqrt{\nu_\ell(k, m)}}{\gamma_\ell(k, m)}\, K[\nu_\ell(k, m)]$,   (16)

in which $\nu_\ell(k, m)$ is defined as

$\nu_\ell(k, m) \triangleq \dfrac{\xi_\ell(k, m)}{1 + \xi_\ell(k, m)}\, \gamma_\ell(k, m)$,   (17)

and K[·] is the following function:

$K[\theta] = \exp\!\big(-\tfrac{\theta}{2}\big) \Big[ (1 + \theta)\, I_0\big(\tfrac{\theta}{2}\big) + \theta\, I_1\big(\tfrac{\theta}{2}\big) \Big]$,   (18)

where $I_0(\cdot)$ and $I_1(\cdot)$ denote the modified Bessel functions of zero and first order, respectively. In the above equations, $\xi_\ell(k, m)$ and $\gamma_\ell(k, m)$ are interpreted (after McAulay and Malpass, 1980) as the a priori SNR and the a posteriori SNR, respectively. These quantities are defined as

$\xi_\ell(k, m) \triangleq \dfrac{E\big[|\mathcal{S}_\ell(k, m)|^2\big]}{E\big[|\mathcal{D}_\ell(k, m)|^2\big]}$   (19)

and

$\gamma_\ell(k, m) \triangleq \dfrac{|\mathcal{X}_\ell(k, m)|^2}{E\big[|\mathcal{D}_\ell(k, m)|^2\big]}$.   (20)

Since in practice only noisy speech is observable, the parameters $\xi_\ell(k, m)$ and $\gamma_\ell(k, m)$ have to be estimated. For this task we apply the decision-directed approach (Ephraim and Malah, 1984) in the short-time spectral modulation domain. In the decision-directed method, the a priori SNR is estimated by recursive averaging as follows:

$\hat{\xi}_\ell(k, m) = \alpha\, \dfrac{\widehat{S}_{\ell-1}^{\,2}(k, m)}{\hat{\lambda}_{\ell-1}(k, m)} + (1 - \alpha)\, \max\big[\, \hat{\gamma}_\ell(k, m) - 1,\ 0 \,\big]$,   (21)

where $\alpha$ controls the trade-off between noise reduction and transient distortion (Cappe, 1994; Ephraim and Malah, 1984), $\hat{\lambda}_\ell(k, m)$ is an estimate of $\lambda_\ell(k, m) \triangleq E\big[|\mathcal{D}_\ell(k, m)|^2\big]$, and the a posteriori SNR estimate is obtained as

$\hat{\gamma}_\ell(k, m) = \dfrac{|\mathcal{X}_\ell(k, m)|^2}{\hat{\lambda}_\ell(k, m)}$.   (22)

Note that limiting the minimum value of the a priori SNR has a considerable effect on the nature of the residual noise (Ephraim and Malah, 1984; Cappe, 1994). For this reason, a lower bound $\xi_{\min}$ is typically used to prevent a priori SNR estimates from falling below a prescribed value, i.e.,

$\hat{\xi}_\ell(k, m) = \max\big[\, \hat{\xi}_\ell(k, m),\ \xi_{\min} \,\big]$.   (23)
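Collecting Eqs. (16)-(23), the gain computation with decision-directed a priori SNR estimation might be sketched as follows. The exponentially scaled Bessel functions i0e and i1e from scipy.special are used to evaluate Eq. (18) stably, since i0e(x) = exp(-x) I_0(x); prev_clean_mag2 stands for the previous frame's enhanced magnitude squared in Eq. (21), so the function is meant to be called frame by frame in order. Variable names and defaults are illustrative, not the authors' code.

```python
import numpy as np
from scipy.special import i0e, i1e  # exponentially scaled Bessel functions

def mmse_gain(noisy_mag2, noise_psd, prev_clean_mag2, alpha=0.998,
              xi_min_db=-25.0):
    """MMSE magnitude-estimator gain, Eqs. (16)-(23), evaluated over
    the (k, m) arrays of one modulation frame (illustrative sketch)."""
    gamma = noisy_mag2 / noise_psd                       # Eq. (22)
    # Decision-directed a priori SNR, Eq. (21), floored per Eq. (23).
    xi = (alpha * prev_clean_mag2 / noise_psd
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
    xi = np.maximum(xi, 10.0 ** (xi_min_db / 10.0))
    nu = xi / (1.0 + xi) * gamma                         # Eq. (17)
    # Eq. (18): K[nu] = exp(-nu/2) [(1+nu) I0(nu/2) + nu I1(nu/2)],
    # computed with scaled Bessels to avoid overflow for large nu.
    K = (1.0 + nu) * i0e(nu / 2.0) + nu * i1e(nu / 2.0)
    return (np.sqrt(np.pi) / 2.0) * (np.sqrt(nu) / gamma) * K  # Eq. (16)
```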
Many approaches have been employed in the literature for noise power spectrum estimation in the acoustic spectral domain (e.g., Scalart and Filho, 1996; Martin, 2001; Cohen and Berdugo, 2002; Loizou, 2007). In the present work, spectral modulation domain estimates are needed. For this task a simple procedure is employed, in which an initial estimate of the modulation power spectrum of the noise is computed from six leading silence frames. (With six non-overlapped frames in the modulation domain used for the initial noise estimation, around 220 ms of leading silence is required.) This estimate is then updated during speech absence using a recursive averaging rule (e.g., Scalart and Filho, 1996; Virag, 1999), applied in the modulation spectral domain as follows:

$\hat{\lambda}_\ell(k, m) = u\, \hat{\lambda}_{\ell-1}(k, m) + (1 - u)\, |\mathcal{X}_\ell(k, m)|^2$,   (24)

where u is a forgetting factor chosen depending on the stationarity of the noise. Speech presence or absence is determined using the statistical model-based voice activity detection (VAD) algorithm of Sohn et al. (1999), applied in the modulation spectral domain (more specifically, the decision-directed decision rule without hangover is used). A minimal sketch of this gated update is given below.
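In the sketch, the is_speech flag stands in for the Sohn et al. (1999) detector, which is not reproduced here, and the default forgetting factor follows the value quoted later in Section 4.2.

```python
def update_noise_psd(noise_psd, noisy_mag2, is_speech, u=0.98):
    """Recursive noise estimate of Eq. (24): update the modulation-domain
    noise power spectrum only during speech absence (illustrative sketch)."""
    if is_speech:
        return noise_psd  # freeze the estimate while speech is present
    return u * noise_psd + (1.0 - u) * noisy_mag2
```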

4. Subjective tuning of MME parameters

One of the reasons for the good performance of the AME method of Ephraim and Malah (1984) is that its parameters have been well tuned. In the current work, this MMSE estimator is applied in the spectral modulation domain. Consequently, the parameters of the proposed MME method need to be retuned.

The adjustable parameters of the MME approach include the acoustic frame duration (AFD), acoustic frame shift (AFS), modulation frame duration (MFD), modulation frame shift (MFS), as well as the smoothing parameter $\alpha$ and the lower bound $\xi_{\min}$ used in a priori SNR estimation. Tuning of some of these parameters can be done qualitatively from our knowledge of speech processing, and these can be fixed without further investigation. For example, speech can be assumed to be approximately stationary over short durations, and therefore acoustic frameworks typically use a short AFD of around 20–40 ms (e.g., Picone, 1993; Huang et al., 2001; Loizou, 2007; Paliwal and Wójcicki, 2008), which at the same time is long enough to provide reliable spectral estimates. Based on these qualitative reasons, an AFD of 32 ms was selected in this work. We have also chosen to use a 1 ms AFS to facilitate experimentation with a wide range of frame sizes and shifts in the modulation domain, and to increase the adaptability of the proposed method to changes in signal characteristics. For the other parameters, subjective listening tests were conducted to determine the values that maximise the subjective quality of stimuli enhanced using the MME method.

In the remainder of this section, we first describe details common to the subsequent experiments. These include the speech corpus, the settings used for stimuli generation, and the listening test procedure. We then present the experiments, results, and discussions. The section concludes with a summary of the tuned parameters.

4.1. Speech corpus

The Noizeus speech corpus (Loizou, 2007; Hu and Loizou, 2007), publicly available at http://www.utdallas.edu/~loizou/speech/noizeus, was used for the experiments presented in this section. The corpus contains 30 phonetically-balanced sentences belonging to six speakers (three males and three females), each having an average length of around 2.6 s. The recorded speech was originally sampled at 25 kHz. The recordings were then downsampled to 8 kHz and filtered to simulate the receiving frequency characteristics of telephone handsets. The corpus includes stimuli with non-stationary noises at different SNRs; for our experiments, only the clean stimuli were used. Corresponding noisy stimuli were generated by degrading the clean stimuli with additive white Gaussian noise (AWGN) at 5 dB SNR. Since use of the entire corpus was not feasible for human listening tests, four sentences were employed in our experiments. Of these, two (sp20 and sp22, belonging to a male and a female speaker) were used for parameter tuning, while the other two (sp10 and sp26, also belonging to a male and a female speaker) were used in subjective testing.

4.2. Stimuli

The settings used for the construction of MME stimuli are as follows. The Hamming window was used as both the acoustic and modulation analysis window function. The FFT analysis length was set to 2N for both acoustic and modulation domain processing. Least-squares overlap-add synthesis (Quatieri, 2002) was used for both acoustic and modulation syntheses. The threshold for the statistical voice activity detector (Sohn et al., 1999) was set to 0.15, and the forgetting factor u for noise estimate updates was set to 0.98. The AFD was set to 32 ms and the AFS was set to 1 ms. The other parameters used in the construction of MME stimuli for the experiments presented in this section are as defined in the description of each experiment. A consolidated sketch of these fixed settings, and of the noisy stimulus generation of Section 4.1, is given below.
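The dictionary keys and function below are invented for this sketch; only the values are taken from the text.

```python
import numpy as np

# Fixed settings from Section 4.2 (durations in ms; illustrative names).
MME_SETTINGS = {
    "window": "hamming",          # acoustic and modulation analysis windows
    "afd_ms": 32, "afs_ms": 1,    # acoustic frame duration / shift
    "vad_threshold": 0.15,        # Sohn et al. (1999) VAD threshold
    "noise_forgetting_u": 0.98,   # forgetting factor u in Eq. (24)
}

def add_awgn(clean, snr_db=5.0, seed=0):
    """Degrade a clean utterance with AWGN at the given global SNR in dB,
    as used to create the noisy stimuli (illustrative sketch)."""
    noise = np.random.default_rng(seed).standard_normal(len(clean))
    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
    scale = np.sqrt(np.mean(clean ** 2)
                    / (np.mean(noise ** 2) * 10 ** (snr_db / 10.0)))
    return clean + scale * noise
```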
4.3. Listening test procedure

Subjective testing was done in the form of AB listening tests that determined parameter preference. For each subjective experiment, listening tests were conducted in a quiet room. Participants were familiarised with the task during a short practice session. The actual test consisted of stimuli pairs played back in randomised order over closed circumaural headphones at a comfortable listening level. For each stimuli pair, the listeners were presented with three labelled options on a computer and asked to make a subjective preference. The first and second options were used to indicate a preference for the corresponding stimuli, while the third option was used to indicate a similar preference for both stimuli. The listeners were instructed to use the third option only when they did not prefer one stimulus over the other. Pair-wise scoring was used, with a score of 1 awarded to the preferred treatment and 0 to the other. For the similar preference response, each treatment was awarded a score of 0.5. Participants could re-listen to stimuli if required.

4.4. Parameter tuning: modulation frame duration

Typical modulation domain methods use modulation frame durations (MFDs) of around 250 ms (Greenberg and Kingsbury, 1997; Thompson and Atlas, 2003; Kim, 2005; Falk and Chan, 2008; Wu et al., 2009; Falk and Chan, 2010; Falk et al., 2010; Paliwal et al., 2010). However, recent experiments (Paliwal et al., 2011) suggest that shorter MFDs may be better suited (in the context of intelligibility and quality) to processing of the modulation magnitude spectrum. Paliwal et al. (2011) also showed that objective quality decreased with increasing MFD. In this experiment we evaluate the effect of the MFD on the quality of stimuli enhanced using the MME method.

Enhanced stimuli were created by applying the MME method (see Section 3) to noisy speech (see Section 4.1). Using an MFS of 2 ms, $\alpha$ = 0.998, and $\xi_{\min}$ = -25 dB, MFD values of 32, 48, 64, 128 and 256 ms were investigated. The quality of the resulting stimuli was assessed through subjective listening tests using the procedure given in Section 4.3. Five subjects participated in this experiment. Each was presented with 40 comparisons. The session lasted approximately 10 min.

Mean subjective preference scores as a function of MFD are given in Fig. 2.

[Fig. 2. Mean subjective preference scores (%) for stimuli generated using MME with 2 ms MFS, $\alpha$ = 0.998, $\xi_{\min}$ = -25 dB, and MFD values of 32, 48, 64, 128, and 256 ms.]

The results show that use of long MFDs (such as 256 ms) reduces the quality of enhanced stimuli. The reason for this is that long frame durations cause spectral smearing, which can be heard as a reverberant type of distortion. On the other hand, use of short MFDs (such as 32–64 ms) produces stimuli of higher quality. Use of a 32 ms modulation frame duration is acceptable in the modulation domain for reasons similar to those used to justify the 32 ms acoustic frame duration chosen by Ephraim and Malah in their MMSE formulation, as discussed in Section 3. It is also noted that the results of this experiment are consistent with those reported by Paliwal et al. (2011), where shorter frame durations were found to work better for processing of the modulation magnitude spectrum. Based on the results of this experiment, an MFD of 32 ms was selected for use in the experiments presented in later sections.

4.5. Parameter tuning: modulation frame shift

The modulation frame shift (MFS) affects the ability of the MME method to adapt to changes in the properties of the signal, with shorter shifts offering some reduction in the introduced distortion during more transient parts. However, smaller shifts also add to the computational cost of the method.

In this experiment, we evaluate the effect of the MFS on the subjective quality of speech corrupted with 5 dB AWGN and enhanced with the MME method. For this experiment, the MFD is set to 32 ms.

