Robust Noise Power Spectral Density Estimation for Binaural Speech Enhancement in Time-Varying Diffuse Noise Field


Ji et al. EURASIP Journal on Audio, Speech, and Music Processing (2017) 2017:25, DOI 10.1186/s13636-017-0122-4. RESEARCH, Open Access.

Robust noise power spectral density estimation for binaural speech enhancement in time-varying diffuse noise field

Youna Ji, Yonghyun Baek and Young-cheol Park*

Abstract

In speech enhancement, noise power spectral density (PSD) estimation plays a key role in determining appropriate de-noising gains. In this paper, we propose a robust noise PSD estimator for binaural speech enhancement in time-varying noise environments. First, it is shown that the noise PSD can be numerically obtained using an eigenvalue of the input covariance matrix. A simplified estimator is then derived through an approximation process, so that the noise PSD is expressed as a combination of the second eigenvalue of the input covariance matrix, the noise coherence, and the interaural phase difference (IPD) of the input signal. Later, to enhance the accuracy of the noise PSD estimate in time-varying noise environments, an eigenvalue compensation scheme is presented, in which two eigenvalues obtained in noise-dominant regions are combined using a weighting parameter based on the speech presence probability (SPP). Compared with the previous prediction filter-based approach, the proposed method requires neither causality delays nor explicit estimation of the prediction errors. Finally, the proposed noise PSD estimator is applied to a binaural speech enhancement system, and its performance is evaluated through computer simulations. The simulation results show that the proposed noise PSD estimator yields an accurate noise PSD regardless of the direction of the target speech signal.
Therefore, slightly better performance in quality and intelligibility can be obtained than with conventional algorithms.

Keywords: Binaural speech enhancement, Noise PSD estimation, Diffuse noise field

* Correspondence: young00@yonsei.ac.kr. Computer and Telecommunication Engineering Division, Yonsei University, Wonju, Korea

1 Introduction

The purpose of speech enhancement is to improve the quality and intelligibility of speech signals by suppressing daily environmental noise while allowing a minimal level of speech distortion. The Wiener filter and statistical model-based estimators [1] are well-known examples of speech enhancement algorithms. Since the de-noising gains of a speech enhancement algorithm are fundamentally determined by the noise power spectral density (PSD), it is important to obtain an accurate noise PSD estimate. Therefore, extensive research has been conducted on noise PSD estimation using single-microphone systems [2–5]; however, these methods often exhibit limited performance in situations with non-stationary noise or a low signal-to-noise ratio (SNR) [6].

To overcome the limitations of single-channel systems, various multi-channel techniques have been developed, including the minimum variance distortionless response (MVDR) [7] and the multi-channel Wiener filter (MWF) with constraints [8–12]. The MVDR is a widely used spatial filter in multi-channel systems that minimizes the output power under the constraint that the desired signal is not affected [7]. On the other hand, the MWF provides an optimal solution for broadband noise reduction from a minimum mean square error (MMSE) perspective. The speech-distortion-weighted MWF (SDW-MWF) has been introduced to control speech distortion and noise reduction [8]. Algorithms such as SDW-MWF and MVDR preserve the binaural cues of speech but distort the binaural cues of noise [10]. Therefore, extensions for
© The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

preserving the binaural cues of directional sources using additional cost functions or linear constraints have been proposed [10, 11]. As a result, another extension that preserves the interaural coherence (IC) has been proposed [12] as part of a study of spatially isotropic noise, the spatial characteristic of which is represented by the IC.

Although MWF-based extension algorithms can achieve significant noise reduction, there is always a trade-off between noise reduction and cue preservation for directional sources and background noise. One way to overcome the problem of binaural cue preservation is to apply a real-valued equal gain to both sides, rather than applying a complex-valued filter. This method diminishes the noise reduction performance by acting as a single-channel noise reduction method, but it preserves all binaural cues [13]. MWF performance critically depends on the statistical estimates of the desired and undesired signal components. The voice activity detector (VAD) is a general method for estimating noise or speech statistics, where the noise statistics can be updated during noise-only time-frequency (TF) bins. However, this method has the drawback that, when the noise is time-varying and non-stationary, more sophisticated techniques are required to estimate the signal statistics.

Many studies on binaural or multi-channel speech enhancement [14–18] based on real-valued gain functions have shown that superior speech quality can be obtained by utilizing spatial information on both the target speech and the noise. Coherence-based binaural noise reduction was proposed in [14] and proven effective in terms of tracking the PSD of diffuse noise. However, its effectiveness was validated using only a target speech source located in front of the listener.
Other studies [15, 17] have proposed a prediction filter-based binaural noise PSD estimator, where the diffuse noise PSD is obtained by solving a second-order equation formulated using a channel prediction model. Theoretically, this method enables the device to obtain the true noise PSD when the target is situated at any location within a given distance of the listener. However, this approach requires a delay between the channel signals to ensure the causality condition of the prediction filter, and the prediction error needs to be explicitly calculated. These factors directly affect the performance of the PSD estimator [16, 19].

Recently, neural network-based speech enhancement algorithms have been investigated [20, 21]. These algorithms are typically divided into two processes: in the learning process, features are extracted from a large training data set to learn a model, and in the enhancement process, speech enhancement gains are applied based on that model. Although extensive research has been conducted on speech enhancement using neural networks, it is difficult to apply such methods to portable applications because of their high complexity.

In this paper, a new noise PSD estimator for a binaural speech enhancement system that can operate in a fast time-varying diffuse noise field is presented. First, it is established that the noise PSD can be estimated from the eigenvalues of the input covariance matrix without dependence on the target speech direction. Then, a method of approximating the obtained noise PSD is presented. The result is that the smaller eigenvalue is combined with the noise correlation function and the binaural phase difference.

The auto- and cross-PSDs of the input binaural signal are often estimated using a first-order recursive averaging filter [22]. In a rapidly changing noise environment, averaging with a short time constant is required to quickly reflect the signal statistics in the estimated PSDs.
However, the use of short time constants leads to bias in the PSD estimates, which in turn degrades the overall performance of the speech enhancement system. In this paper, a method of compensating for this bias is proposed that uses the statistical characteristics of the eigenvalues with only a minor increase in computational cost. The proposed algorithm can be widely adopted in speech-related applications, such as hearing aids and mobile phones.

The remainder of this paper is organized as follows. Section 2 presents a description of the general two-channel speech enhancement algorithm. A new noise PSD estimator based on the eigenvalues of the input covariance matrix is presented in Section 3. In Section 4, a compensation method to improve the performance of the noise estimator in a practical environment is discussed. Section 5 presents the simulation results, in which the performance of the proposed algorithm is compared with the results achieved using conventional techniques. Finally, Section 6 concludes this paper.

2 Configuration of the speech enhancement algorithm for binaural systems

In this section, we begin with a mathematical model of the noisy input signals. Following that, the configuration of a binaural speech enhancement system to which the proposed noise PSD estimator can be applied is briefly described.

2.1 Input signal model

The binaural noisy input signals, x_i(t), corrupted by additive noise in the temporal domain can be written as

x_i(t) = s(t) * h_i(t) + n_i(t),  i = L, R,   (1)

where s(t) is the speech signal and n_i(t), i = L, R, are the environmental noises received by the left and right channel microphones, respectively, at time index t. h_i(t) represents the acoustic impulse response from the speech source to the i-th channel microphone, and * denotes the convolution operation. After applying the short-time Fourier transform (STFT), (1) can be rewritten in the frequency domain as

X_i(k, l) = S(k, l) H_i(k, l) + N_i(k, l),  i = L, R,   (2)

where k and l are the frequency and frame indices, respectively. In this paper, the noise N_i(k, l) is assumed to be a diffuse noise, i.e., a non-directional signal with equal power and random phase [23, 24]. Under the assumption that the speech and noises are uncorrelated, the auto- and cross-PSDs of the noisy input signals are obtained as

Φ_X^ii(k, l) = |H_i(k, l)|² Φ_S(k, l) + Φ_N(k, l),   (3)

Φ_X^ij(k, l) = H_i(k, l) H_j*(k, l) Φ_S(k, l) + Φ_N^ij(k, l),  i, j = L or R, i ≠ j,   (4)

where * denotes the complex conjugate, and Φ_S(k, l) and Φ_N(k, l) are the speech and noise auto-PSDs, respectively, i.e., Φ_S(k, l) = E[|S(k, l)|²] and Φ_N(k, l) = E[|N_L(k, l)|²] = E[|N_R(k, l)|²]. Lastly, Φ_N^ij(k, l) = E[N_i(k, l) N_j*(k, l)] is the cross-PSD between the left and right channel noises.

In practice, the PSDs of the noisy input signals are obtained using a first-order recursive averaging filter [22, 25, 26],

Φ̂_X^ij(k, l) = α Φ̂_X^ij(k, l−1) + (1−α) X_i(k, l) X_j*(k, l),   (5)

where α ∈ [0, 1] is the smoothing factor that controls the trade-off between fast capture of the time-varying statistics of the signals and low-variance estimation of the spectrum.
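As a concrete illustration, the update in (5) takes only a few lines (a minimal NumPy sketch; the array shapes, the α value, and the frame loop are our own illustrative choices, not from the paper):

```python
import numpy as np

def update_psd(phi_prev, X_i, X_j, alpha=0.9):
    """First-order recursive PSD update of Eq. (5):
    phi(k,l) = alpha*phi(k,l-1) + (1-alpha)*X_i(k,l)*conj(X_j(k,l)).
    Passing the same channel twice yields an auto-PSD; two different
    channels yield the cross-PSD."""
    return alpha * phi_prev + (1.0 - alpha) * X_i * np.conj(X_j)

# toy usage: track the auto-PSD of stationary complex noise with E[|X|^2] = 2
rng = np.random.default_rng(0)
K = 64                                  # frequency bins
phi = np.zeros(K, dtype=complex)
for l in range(300):                    # frames
    X = rng.standard_normal(K) + 1j * rng.standard_normal(K)
    phi = update_psd(phi, X, X)
# phi.real now fluctuates around the true power of 2
```

A larger α lowers the estimator variance at the cost of slower tracking, which is exactly the trade-off discussed above.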
Fig. 1 Block diagram of the binaural speech enhancement system

2.2 Binaural speech enhancement system

Figure 1 presents a block diagram of a general binaural speech enhancement system consisting of two microphones at the left and right ear positions of the listener. First, the noisy input signals are picked up by the left and right channel microphones and transformed into frequency-domain signals via the STFT. After estimating the noise, the de-noising gain, G_i(k, l), is determined based on the estimated noise and input PSDs. The enhanced speech signal, Ŝ_i(k, l), is then obtained as

Ŝ_i(k, l) = G_i(k, l) X_i(k, l),  i = L or R.   (6)

Various investigations have been performed on the noise reduction gain in single-channel [1, 27] and multi-channel speech enhancement systems [7–12, 28]. For binaural applications, a system that is capable of generating binaural outputs and preserving binaural cues such as the interaural level difference (ILD) and interaural time difference (ITD) is preferred [29]. These binaural cues are crucial for spatial awareness and also important for speech intelligibility [30, 31]. To obtain an enhanced binaural output with interaural cue preservation, a real-valued equal gain is often applied to both the left and right channels. For example, if the left and right channel spectral gains are computed using the Wiener filter approach, the equal gain is determined as

G(k, l) = sqrt( G_L(k, l) G_R(k, l) ),   (7)

G_i(k, l) = ξ_i(k, l) / (1 + ξ_i(k, l)),   (8)

where ξ_i(k, l) = Φ_S^i(k, l) / Φ_N(k, l) is the a priori SNR that can be estimated using the decision-directed method [1]. Instead of (7), more sophisticated multi-channel techniques such as the multi-channel Wiener filter with various constraints [8–12] and generalized sidelobe canceller (GSC)-based methods [33, 34] can

be used. Although such techniques have demonstrated great potential in reducing both stationary and non-stationary noises by combining spectral and spatial filtering, there is always a trade-off between the noise reduction performance and the preservation of interaural cues for interfering sources and the background noise [13]. Therefore, in this paper, the real-valued gain (7) is applied to preserve the perceptual impression of the acoustic scene. In any case, the accuracy of the estimated noise PSD has a direct impact on the performance of the speech enhancement system. Therefore, in this paper, we propose a robust noise PSD estimation algorithm for the binaural speech signal.

3 The proposed noise PSD estimator

In this section, we introduce the proposed noise PSD estimator based on the eigenvalues of the input covariance matrix. After that, an approximation of the proposed estimator based on interaural cues is presented.

3.1 Noise PSD estimation based on eigenvalues

Under the assumption that the noises are uncorrelated, the cross-correlation between the left and right channel noises becomes zero for most frequencies; however, diffuse noises in practical environments have significant correlation, especially at low frequencies [35]. Several coherence models for the diffuse noise field have been proposed [36–38]. It is well known that the spatial coherence between two omnidirectional microphones in a spherically isotropic field can be modeled by the real-valued sinc function. In subsequent studies, several coherence models for the binaural noise field considering the shadowing effect of the head have been proposed [22, 37, 38]. In this paper, we use the sinc function, Γ_N = sinc(2πf d_LR / c), where d_LR and c are the distance between the left and right microphones and the speed of sound, respectively, to model the coherence of the diffuse noise field. This model was chosen because it is simple and effective, and it has been applied in many binaural speech enhancement techniques [15, 18, 39].
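For reference, the coherence model above can be evaluated directly (a short NumPy sketch; the 0.17-m microphone distance is an assumed head-width-like value of ours, not taken from the paper — note that numpy.sinc(u) computes sin(πu)/(πu), so the argument must be scaled to obtain sin(2πf d_LR/c)/(2πf d_LR/c)):

```python
import numpy as np

def diffuse_coherence(f, d_lr=0.17, c=343.0):
    """Gamma_N(f) = sinc(2*pi*f*d_LR/c) with sinc(x) = sin(x)/x.
    np.sinc(u) = sin(pi*u)/(pi*u), so passing u = 2*f*d_LR/c gives
    sin(2*pi*f*d_LR/c) / (2*pi*f*d_LR/c)."""
    return np.sinc(2.0 * np.asarray(f) * d_lr / c)

f = np.linspace(0.0, 8000.0, 256)       # frequency axis in Hz
gamma = diffuse_coherence(f)
# full coherence at DC, decaying toward zero at high frequencies
```

The decay of Γ_N with frequency is what makes the diffuse-noise cross-PSD significant only at low frequencies, as discussed above.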
In addition, the head shadowing effect can be approximated simply by adjusting the distance between the microphones [17]. Using the coherence model, the cross-PSD between the left and right channel diffuse noises of a binaural system can be expressed as Φ_N^LR = Γ_N Φ_N [17]. Then, the 2×2 covariance matrix of the binaural input signal in (2) becomes

R = [ Φ_X^LL  Φ_X^LR ; Φ_X^RL  Φ_X^RR ]
  = [ |H_L|² Φ_S + Φ_N          H_L H_R* Φ_S + Γ_N Φ_N ;
      H_R H_L* Φ_S + Γ_N Φ_N    |H_R|² Φ_S + Φ_N ],   (9)

where we omitted the frequency and frame indices for the sake of simplicity. Furthermore, the eigenvalues of the covariance matrix in (9) can be computed by solving the characteristic equation

λ² − (Φ_X^LL + Φ_X^RR) λ + Φ_X^LL Φ_X^RR − |Φ_X^LR|² = 0.   (10)

The above characteristic equation can be rewritten using the signal and noise PSDs as

λ² − λ [ (|H_L|² + |H_R|²) Φ_S + 2Φ_N ] + Φ_N Φ_S [ |H_L|² + |H_R|² − 2Γ_N ℜ{H_L H_R*} ] + Φ_N² (1 − Γ_N²) = 0,   (11)

where ℜ{·} denotes the real part. Using the fact that the auto- and cross-PSDs of the target speech can be expressed by Φ_S H_i H_j* = Φ_X^ij − Φ_N^ij, (11) can be rearranged for the noise PSD Φ_N:

Φ_N² (1 − Γ_N²) + Φ_N [ 2Γ_N ℜ{Φ_X^LR} − (Φ_X^LL + Φ_X^RR) ] − [ λ² − λ (Φ_X^LL + Φ_X^RR) ] = 0.   (12)

Now, by solving (12), the noise PSD is obtained as

Φ_N = (1 / (2(1 − Γ_N²))) { (Φ_X^LL + Φ_X^RR) − 2Γ_N ℜ{Φ_X^LR} − sqrt(Δ_t) },
Δ_t = [ (Φ_X^LL + Φ_X^RR) − 2Γ_N ℜ{Φ_X^LR} ]² + 4 (1 − Γ_N²) [ λ² − λ (Φ_X^LL + Φ_X^RR) ].   (13)

It should be noted that both the first and the second eigenvalue of the input covariance matrix satisfy the above equation.

The estimator in (13) can be compared with the previous channel prediction-based noise PSD estimator in [17], where the noise PSD was obtained by solving a quadratic equation formed using the signals of the channel prediction filter. By substituting (3) and (4) into (13), it is straightforward to show that the estimator in (13) and the one in [17] are equivalent; the details are provided in the Appendix. Thus, the two estimators are expected to achieve numerically identical noise PSDs under ideal conditions.
On the other hand, another noise PSD estimator using the prediction filter was proposed in [15]. The method in [15] estimates the binaural noise PSD using a target-blocking signal based on the interaural transfer function (ITF) information obtained through a two-channel prediction filter.

However, there are two major differences when the implementation is considered. First, the algorithm in [17] requires an appropriate delay between the channel

signals to satisfy the causality of the system. It was shown in [40] that inappropriate delays can degrade the performance of the algorithm. Second, the prediction error and the ITF need to be calculated explicitly. Therefore, inaccuracies occurring in the process of calculating the prediction error can lead to a bias in the estimated noise PSD. To reduce this bias, [13] proposed a method of calculating those variables using a time-domain adaptive prediction error filter (PEF). However, the performance of the adaptive PEF depends on the filter order, the input SNR, and the delay between the input signals. On the other hand, the proposed algorithm obtains the noise PSD estimate directly from the auto- and cross-PSDs of the binaural input signal. Therefore, it can be less sensitive to the bias error of the estimated variables, compared with the method in [13]. In the next section, we first present a method of simplifying the estimator in (13); later, a method of reducing the bias error will be addressed.

3.2 Approximation of the eigenvalue-based noise PSD estimator

From the characteristic equation in (11), the two eigenvalues of the covariance matrix are calculated as

λ_{1,2} = [ (|H_L|² + |H_R|²) Φ_S + 2Φ_N ± sqrt(Δ) ] / 2,   (14)

where

Δ = (|H_L|² + |H_R|²)² Φ_S² + 8 |H_L||H_R| Φ_S Φ_N Γ_N cos(∠Φ_S^LR) + 4 Γ_N² Φ_N².   (15)

In our previous study [16], Eq. (15) was approximated as Δ ≈ ( (|H_L|² + |H_R|²) Φ_S + 2Φ_N Γ_N )² based on the assumptions that ILDs and ITDs are negligible. As a result, the second eigenvalue was simplified to λ_2 ≈ Φ_N (1 − Γ_N), from which the noise PSD was obtained as Φ̂_N = λ_2 / (1 − Γ_N). However, the ITD at low frequencies normally shows a dependency on the direction of the sound source [29], and therefore affects the directional perception of the sound source. In addition, the noise coherence is particularly high at low frequencies, which can amplify the bias caused by an erroneous approximation there. Thus, ignoring the ITD causes significant errors in the noise PSD estimates, especially when the speech is located anywhere but in front of the listener. In this paper, we present a simple but accurate approximation of (15), which is effective not only for all target directions but also for all frequency bands.

Creating a new term, 4 (|H_L|² + |H_R|²) Φ_S Φ_N Γ_N cos(∠Φ_S^LR), and using the fact that 4Γ_N² Φ_N² = 4Γ_N² Φ_N² [ cos²(∠Φ_S^LR) + sin²(∠Φ_S^LR) ], we can rewrite (15) as

Δ = [ (|H_L|² + |H_R|²) Φ_S + 2Γ_N Φ_N cos(∠Φ_S^LR) ]² − 4(A − B),
A = (|H_L| − |H_R|)² Φ_S Φ_N Γ_N cos(∠Φ_S^LR),
B = sin²(∠Φ_S^LR) (Φ_N Γ_N)²,   (16)

where ∠x denotes the angle, in radians, of x. Now, Δ is composed of three terms, including a perfect square. Because low-frequency ILDs are known to be insignificant [41], it can generally be assumed that |H_L| ≈ |H_R| at low frequencies. At high frequencies, on the other hand, the noise coherence Γ_N becomes insignificant. Thus, it is possible to ignore the term A in (16). The third term, B, consists of two functions, sin²(∠Φ_S^LR) and Γ_N². The sin²(∠Φ_S^LR) function has small values at low frequencies, regardless of the location of the speech source, due to the relatively long wavelength compared with the microphone distance. At high frequencies, it monotonically increases with the angle of the speech source until the relative phase difference reaches 90°. However, because the noise coherence Γ_N is small at high frequencies, the multiplicative combination of sin²(∠Φ_S^LR) and Γ_N² remains insignificant compared with the perfect-square term. Based on these observations, we approximate (16) as

Δ ≈ [ (|H_L|² + |H_R|²) Φ_S + 2Γ_N Φ_N cos(∠Φ_S^LR) ]².   (17)

By substituting (17) into (14), the second eigenvalue can be expressed as

λ_2 ≈ Φ_N − Γ_N cos(∠Φ_S^LR) Φ_N = Φ_N ( 1 − Γ_N cos(∠Φ_S^LR) ).   (18)

In practice, the IPD of the target speech, ∠Φ_S^LR, is not available. Thus, in this paper, we use the IPD estimate obtained from the noisy input instead of ∠Φ_S^LR.
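The quality of the approximation in (18) can be checked numerically. The sketch below (NumPy; all transfer-function, coherence, and PSD values are arbitrary test values of ours, not from the paper) builds the model covariance of (9), extracts its smaller eigenvalue exactly, and compares it against Φ_N(1 − Γ_N cos∠Φ_S^LR):

```python
import numpy as np

# assumed test values: a low-frequency-like case (small ILD, high coherence)
phi_s, phi_n = 2.0, 1.0                    # speech and noise auto-PSDs
gamma_n = 0.8                              # diffuse-noise coherence at this bin
H_L, H_R = 1.0, 0.95 * np.exp(-1j * 0.3)   # small ILD, nonzero IPD

# covariance matrix of Eq. (9)
R = np.array([
    [abs(H_L)**2 * phi_s + phi_n,
     H_L * np.conj(H_R) * phi_s + gamma_n * phi_n],
    [H_R * np.conj(H_L) * phi_s + gamma_n * phi_n,
     abs(H_R)**2 * phi_s + phi_n],
])

lam2 = float(np.linalg.eigvalsh(R).min())         # exact second eigenvalue
ipd = float(np.angle(H_L * np.conj(H_R)))         # angle of the speech cross-PSD
lam2_approx = phi_n * (1.0 - gamma_n * np.cos(ipd))   # Eq. (18)
# lam2 and lam2_approx agree closely for this configuration
```

For these values the exact and approximated second eigenvalues differ by only a few percent, consistent with the term-A/term-B analysis above.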
Several studies have been conducted on the cross phase of the input and clean speech in noisy environments. Although ∠Φ_S^LR and ∠Φ_X^LR are different, the cos(∠Φ_S^LR) value used in Eq. (18) is combined with the noise coherence, approaches zero at high frequencies, and has a meaningful value only at low frequencies. Experimental results show that using ∠Φ_X^LR instead of ∠Φ_S^LR has a negligible effect on the final result. Finally, we estimate the noise PSD using (18) as

Φ̂_N = λ_2 / ( 1 − Γ_N cos(∠Φ_X^LR) ),   (19)

where ∠Φ_X^LR denotes the IPD estimate obtained from the

noisy input signal. In practice, the noise coherence is lower than one due to the influence of the head [17, 38, 42]. Thus, by setting the upper bound of the noise coherence to a value less than one, the divide-by-zero problem can be avoided. Unlike the complicated noise PSD equation in (13), the above equation estimates the noise PSD using only the second eigenvalue and the IPD obtained from the noisy input signals. Thus, the accuracy of the noise PSD estimate in (19) is affected by the accuracy of the second eigenvalue and the IPD of the target speech. The second eigenvalue in the numerator represents the power of the uncorrelated components contained in the two microphone signals, and thus it is independent of the presence and direction of the target speech. Since the IPD in the denominator is combined with the noise coherence, the direction of the target speech matters only at low frequencies, below 500 Hz. The error caused by the approximation will be measured in computer simulations.

4 Compensation for underestimation of the noise PSD

When the auto- and cross-PSDs of the input signal are estimated using the first-order recursion in (5), the smoothing factor α has to cope with two contradictory constraints: capturing the time-varying statistics of the signal and reducing the estimator variance [22, 26, 43]. When the noise statistics vary rapidly in time, capturing the instantaneous statistics of the signals is necessary, and to this end, short-term averaging needs to be conducted. However, short-term averaging can result in a bias error in the estimated PSD [16, 25].
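This underestimation of the second eigenvalue is easy to reproduce. In the sketch below (NumPy; the α value, frame count, and burn-in length are illustrative choices of ours), two uncorrelated unit-power noise channels are tracked with the recursive filter of (5); the smaller eigenvalue of the short-term covariance estimate stays, on average, well below the true noise PSD of 1:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.8                  # short time constant: fast tracking, high variance
phi_n_true = 1.0             # both channels carry unit-power uncorrelated noise
R_est = np.eye(2, dtype=complex)   # recursive covariance estimate
lam2_samples = []

for l in range(5000):
    # uncorrelated complex Gaussian noise with E[|n|^2] = 1 per channel
    x = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2.0)
    R_est = alpha * R_est + (1.0 - alpha) * np.outer(x, np.conj(x))  # Eq. (5)
    if l > 100:                                  # skip the initial transient
        lam2_samples.append(float(np.linalg.eigvalsh(R_est).min()))

lam2_mean = float(np.mean(lam2_samples))
# lam2_mean falls noticeably below phi_n_true: the second eigenvalue is biased low
```

The trace of the estimate still averages to the total noise power, so the bias shows up as a spread between the two eigenvalues, which is exactly what the compensation in this section exploits.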
In this section, we propose a method of compensating for this bias using the speech presence probability.

4.1 Bias compensation for the eigenvalues

In the absence of speech, the two eigenvalues of the input covariance matrix in (9) are expected to be identical. However, the fluctuation of the auto- and cross-PSD estimates causes the first eigenvalue to be larger and the second eigenvalue to be smaller than the actual values, while the sum of the diagonal elements of the covariance matrix, i.e., the sum of the left and right channel noise PSDs, remains identical to the sum of the eigenvalues. Thus, in the absence of speech, it is possible to obtain a more accurate eigenvalue by averaging the two eigenvalues as given by

λ_c = β_n λ_2 + (1 − β_n) λ_1,   (20)

where β_n is a weighting parameter. On the other hand, during the presence of speech, only the second eigenvalue reflects the noise power. Thus, the eigenvalue averaging in (20) can be applied only during speech-absence periods.

To this end, we propose a soft-decision approach, similar to that in [44], in which the weighting parameter is determined based on the SPP:

β_n = β_n⁰ + (1 − β_n⁰) p,   (21)

where β_n⁰ is the minimum bound of the weighting parameter and p is an estimate of the SPP. When a frequency band has a high SPP (p ≈ 1), β_n ≈ 1 and λ_c ≈ λ_2. Thus, during the presence of speech, only the second eigenvalue is reflected in the noise PSD estimate. When a frequency band has a low SPP (p ≈ 0), β_n becomes β_n⁰, and the two eigenvalues are combined using the minimum bound β_n⁰. Accordingly, the bias compensation for the eigenvalues in (20) is mainly applied to frequency bands with low SPP, i.e., noise-dominant frequency bands.

Using (17), the maximum eigenvalue can be approximated as λ_1 ≈ (|H_L|² + |H_R|²) Φ_S + Φ_N ( 1 + Γ_N cos(∠Φ_S^LR) ). Thus, the averaged eigenvalue in (20) can be expressed as λ_c ≈ Φ_N [ 1 + Γ_N cos(∠Φ_S^LR) − 2β_n Γ_N cos(∠Φ_S^LR) ] + (1 − β_n)(|H_L|² + |H_R|²) Φ_S, which results in a new noise PSD estimator:

Φ̂_N = [ λ_c − (1 − β_n)(|H_L|² + |H_R|²) Φ_S ] / [ 1 + Γ_N cos(∠Φ_S^LR) − 2β_n Γ_N cos(∠Φ_S^LR) ].   (22)

In a speech-dominant region, i.e., β_n ≈ 1, the second term in the numerator goes to zero. On the other hand, in a speech-absence region, i.e., β_n ≈ β_n⁰, we have Φ_S ≈ 0. Therefore, the second term in the numerator can be ignored in both cases. Based on these observations, the new noise PSD estimator based on the averaged eigenvalue can be re-expressed as

Φ̂_N = λ_c / [ 1 + Γ_N cos(∠Φ_X^LR) − 2β_n Γ_N cos(∠Φ_X^LR) ].   (23)

The minimum bound of the weighting parameter, β_n⁰, is experimentally determined as the one providing the lowest logarithmic error (LogErr) between the true and estimated noise PSDs; a more detailed procedure can be found in the experimental evaluation. Also, the bands or regions with low SPPs still need to be identified, so in the next subsection, we propose a method of estimating the SPP using eigenvalue ratios.

4.2 Estimation of the speech presence probability

The eigenvalue compensation method introduced in the previous subsection requires an SPP estimator in order to obtain p. Energy ratio-based approaches [27, 44–47] have been widely used to determine the speech activity region. Under the assumption that the left and right channel diffuse noises are uncorrelated, (14) reduces to

λ_1 = (|H_L|² + |H_R|²) Φ_S + Φ_N = Φ̃_S + Φ_N and λ_2 = Φ_N, where Φ̃_S = (|H_L|² + |H_R|²) Φ_S. Then, the a priori SNR can be calculated as ξ = Φ̃_S / Φ_N = λ_1 / λ_2 − 1, which indicates that the eigenvalue ratio λ_1/λ_2 can be used as an alternative to the energy ratio. Thus, in this paper, the energy ratio-based SPP in [3] is modified using the eigenvalue ratio.

First, using the eigenvalue ratio, a local likelihood of speech is calculated as

P_L(k, l) = 1 if 10 log₁₀ ρ_L(k, l) > T_L, and 0 otherwise,   (24)

where

ρ_L(k, l) = [ Σ_{k'=k−k₁}^{k+k₁} λ_1(k', l) / (2k₁ + 1) ] / [ Σ_{k'=k−k₁}^{k+k₁} λ_2(k', l) / (2k₁ + 1) ].   (25)

The eigenvalues of 2k₁ + 1 adjacent bands are averaged prior to the likelihood calculation to reduce random fluctuation. The threshold T_L can be empirically determined using a method similar to that in [3]. In order to improve the robustness of the performance, an additional frame likelihood of speech is measured as

P_F(l) = 1 if 10 log₁₀ ρ_F(l) > T_F(l), and 0 otherwise,   (26)

where

ρ_F(l) = β_SPP ρ_F(l − 1) + (1 − β_SPP) (1/N) Σ_k ρ_L(k, l).   (27)

Similar to the methods in [48, 49], the threshold T_F(l) is updated using a convex combination:

T_F(l) = β_com min(B_{S+N}(l)) + (1 − β_com) max(B_N(l)),   (28)

where 0 ≤ β_com ≤ 1 is a weighting factor and B_{S+N}(l) and B_N(l) denote buffers corresponding to the noisy and noise-only cases, respectively, in which the log ratios of L consecutive frames, 10 log₁₀ ρ_F(m), l − L + 1 ≤ m ≤ l, are stored. Now, the threshold T_F(l) is adaptively adjusted according to the convex combination of the minimum of the elements of B_{S+N}(l) and the maximum of the elements of B_N(l). Finally, the SPP is estimated as

p(k, l) = α_SPP p(k, l − 1) + (1 − α_SPP) p′(k, l),   (29)

where p′(k, l) = P_L(k, l) · P_F(l) and 0 ≤ α_SPP ≤ 1 is a smoothing parameter.

Fig. 2 Block diagram of the proposed noise PSD estimator
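A minimal version of the eigenvalue-ratio local likelihood in (24)–(25) can be sketched as follows (NumPy; the threshold T_L, the band-averaging width k₁, and the test eigenvalue profiles are illustrative values of ours, and the frame-likelihood and threshold-buffer logic of (26)–(28) is omitted for brevity):

```python
import numpy as np

def local_speech_likelihood(lam1, lam2, T_L=3.0, k1=1):
    """Eqs. (24)-(25): average each eigenvalue over 2*k1+1 adjacent bands
    (zero-padded at the band edges), then compare the eigenvalue ratio,
    in dB, against the threshold T_L."""
    kernel = np.ones(2 * k1 + 1) / (2 * k1 + 1)
    num = np.convolve(lam1, kernel, mode='same')   # band-averaged lambda_1
    den = np.convolve(lam2, kernel, mode='same')   # band-averaged lambda_2
    rho_L = num / den
    return (10.0 * np.log10(rho_L) > T_L).astype(float)

# toy test: noise-like bands (lam1 ~ lam2) vs speech-like bands (lam1 >> lam2)
lam2 = np.ones(8)
lam1 = np.array([1.1, 1.2, 1.1, 1.0, 20.0, 25.0, 22.0, 21.0])
P_L = local_speech_likelihood(lam1, lam2)
# noise-dominated bands yield 0, speech-dominant bands yield 1
# (the band averaging blurs the boundary between the two regions)
```

The same eigenvalues already computed for the noise PSD estimate drive this detector, which is why the SPP comes at little extra cost.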
It is important to mention that the proposed SPP estimator in (29) re-uses the eigenvalues computed using (10).

4.3 The proposed noise PSD estimator with SPP-based eigenvalue compensation

A block diagram of the proposed noise PSD estimator is depicted in Fig. 2. First, the auto- and cross-PSDs are estimated using a first-order recursive averaging filter, as in (5). The two eigenvalues are computed from the estimated PSDs as in (10), and the eigenvalue compensation in (20) is selectively applied to the noise-dominant regions. Finally, the PSD of the noise is obtained using (23). A new binaural speech enhancement system can be developed by replacing the noise PSD estimation block in Fig. 1 with the proposed noise PSD estimator of Fig. 2.

5 Computer simulations

In this section, the performance of the proposed noise PSD estimator is evaluated through computer simulations of a binaural speech enhancement scenario and compared with those of the previous methods. All speech sentences used in the computer simulations were taken from the TIMIT database [50] and convolved with binaural room impulse responses (BRIRs) from the Oldenburg database [51] to simulate target directions. Binaural noises taken from the ETSI database [52] and the Oldenburg database were added to the target speech at various SNRs. The left and right channel input signals were decomposed into 32-ms subframes with 50% overlap at a sampling rate of

16 kHz. The length of the subframe was determined to satisfy the rank-1 property [53].

5.1 Bias analysis of the approximated noise PSD estimator

First, we measured the total error caused by

