Digital Speech Processing: Lecture 9, Short-Time Fourier Analysis Methods


Digital Speech Processing, Lecture 9
Short-Time Fourier Analysis Methods
Introduction

General Discrete-Time Model of Speech Production
Voiced speech: $A_V\,P(z)\,G(z)\,V(z)\,R(z)$
Unvoiced speech: $A_N\,N(z)\,V(z)\,R(z)$

Short-Time Fourier Analysis
- represent the signal by a sum of sinusoids or complex exponentials, as this leads to convenient solutions to problems (formant estimation, pitch period estimation, analysis-by-synthesis methods) and to insight into the signal itself
- such Fourier representations provide
  – a convenient means to determine the response of linear systems to a sum of sinusoids
  – clear evidence of signal properties that are obscured in the original signal

Why STFT for Speech Signals
- steady state sounds, like vowels, are produced by periodic excitation of a linear system, so the speech spectrum is the product of the excitation spectrum and the vocal tract frequency response
- speech is a time-varying signal, so we need more sophisticated analysis to reflect its time-varying properties
  – changes occur at syllabic rates (about 10 times/sec)
  – over fixed time intervals of 10-30 msec, properties of most speech signals are relatively constant (when is this not the case?)

Overview of Lecture
- define time-varying Fourier transform (STFT) analysis method
- define synthesis method from time-varying FT (filter-bank summation, overlap addition)
- show how time-varying FT can be viewed in terms of a bank of filters model
- computation methods based on using FFT
- application to vocoders, spectrum displays, formant estimation, pitch period estimation

Frequency Domain Processing
- Coding:
  – transform, subband, homomorphic, channel vocoders
- Restoration/Enhancement/Modification:
  – noise and reverberation removal, helium restoration, time-scale modifications (speed-up and slow-down of speech)

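Frequency and the DTFT
- sinusoids
$$x(n) = \cos(\omega_0 n) = \left(e^{j\omega_0 n} + e^{-j\omega_0 n}\right)/2$$
where $\omega_0$ is the frequency (in radians) of the sinusoid
- the Discrete-Time Fourier Transform (DTFT)
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} = \mathrm{DTFT}\{x(n)\}$$
$$x(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\,e^{j\omega n}\,d\omega = \mathrm{DTFT}^{-1}\{X(e^{j\omega})\}$$
where $\omega$ is the frequency variable of $X(e^{j\omega})$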

DTFT and DFT of Speech
The DTFT and the DFT for the infinite duration signal could be calculated (the DTFT) and approximated (the DFT) by the following:
$$X(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\omega m} \quad \text{(DTFT)}$$
$$X(k) = \sum_{m=0}^{L-1} x(m)\,w(m)\,e^{-j(2\pi/L)km} = X(e^{j\omega})\Big|_{\omega = 2\pi k/L}, \quad k = 0, 1, \ldots, L-1 \quad \text{(DFT)}$$
- using a value of $L = 25000$ we get the following plot

[Figure: 25000-point DFT of speech; panels show magnitude and log magnitude (dB).]

Short-Time Fourier Transform (STFT)

Short-Time Fourier Transform
- speech is not a stationary signal, i.e., it has properties that change with time
- thus a single representation based on all the samples of a speech utterance, for the most part, has no meaning
- instead, we define a time-dependent Fourier transform (TDFT or STFT) of speech that changes periodically as the speech properties change over time

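Definition of STFT
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,w(\hat n - m)\,e^{-j\hat\omega m}$$
- both $\hat n$ and $\hat\omega$ are variables
- $w(\hat n - m)$ is a real window which determines the portion of $x(\hat n)$ that is used in the computation of $X_{\hat n}(e^{j\hat\omega})$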
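The definition can be evaluated directly for a finite signal. The sketch below is a minimal illustration rather than an efficient implementation; it assumes a causal window stored as a list `w` with `w[k]` nonzero only for `0 <= k < L`, treats `x` as zero outside its list, and the function name `stft_point` is a hypothetical helper.

```python
import cmath

def stft_point(x, w, n_hat, omega):
    """Evaluate X_n(e^{j omega}) = sum_m x[m] w[n_hat - m] e^{-j omega m}.

    Assumes w[k] is defined for 0 <= k < len(w) and is zero elsewhere,
    and that x[m] is zero outside 0 <= m < len(x).
    """
    total = 0j
    for m, xm in enumerate(x):
        k = n_hat - m                      # index into the window
        if 0 <= k < len(w):
            total += xm * w[k] * cmath.exp(-1j * omega * m)
    return total

x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0]                             # 2-point rectangular window
value_at_dc = stft_point(x, w, n_hat=2, omega=0.0)
value_at_pi = stft_point(x, w, n_hat=2, omega=cmath.pi)
```

With `n_hat = 2` the window selects `x[1]` and `x[2]`, so the value at zero frequency is their sum and the value at $\hat\omega = \pi$ is their alternating sum.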

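Short-Time Fourier Transform
- the STFT is a function of two variables: the time index, $\hat n$, which is discrete, and the frequency variable, $\hat\omega$, which is continuous
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,w(\hat n - m)\,e^{-j\hat\omega m} = \mathrm{DTFT}\big(x(m)\,w(\hat n - m)\big), \quad \hat n \text{ fixed},\ \hat\omega \text{ variable}$$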

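Short-Time Fourier Transform
- an alternative form of the STFT (based on a change of variables) is
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} w(m)\,x(\hat n - m)\,e^{-j\hat\omega(\hat n - m)} = e^{-j\hat\omega\hat n}\sum_{m=-\infty}^{\infty} x(\hat n - m)\,w(m)\,e^{j\hat\omega m}$$
- if we define
$$\tilde X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(\hat n - m)\,w(m)\,e^{j\hat\omega m}$$
- then $X_{\hat n}(e^{j\hat\omega})$ can be expressed as (using $m' = -m$)
$$X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega\hat n}\,\tilde X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega\hat n}\,\mathrm{DTFT}\big[x(\hat n + m)\,w(-m)\big]$$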

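STFT: Different Time Origins
- the STFT can be viewed as having two different time origins
1. time origin tied to the signal $x(n)$
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,w(\hat n - m)\,e^{-j\hat\omega m} = \mathrm{DTFT}\big[x(m)\,w(\hat n - m)\big], \quad \hat n \text{ fixed},\ \hat\omega \text{ variable}$$
2. time origin tied to the window signal $w(-m)$
$$X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega\hat n}\sum_{m=-\infty}^{\infty} x(\hat n + m)\,w(-m)\,e^{-j\hat\omega m} = e^{-j\hat\omega\hat n}\,\tilde X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega\hat n}\,\mathrm{DTFT}\big[w(-m)\,x(\hat n + m)\big], \quad \hat n \text{ fixed},\ \hat\omega \text{ variable}$$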

Time Origin for STFT
[Figure: time origin for the STFT, marking $m = \hat n$ and $x[0]$, and the case with the time origin tied to the window, $w[-m]\,x[\hat n + m]$.]

Interpretations of STFT
- there are 2 distinct interpretations of $X_{\hat n}(e^{j\hat\omega})$
1. assume $\hat n$ is fixed; then $X_{\hat n}(e^{j\hat\omega})$ is simply the normal Fourier transform of the sequence $w(\hat n - m)\,x(m)$, $-\infty < m < \infty$, so for fixed $\hat n$, $X_{\hat n}(e^{j\hat\omega})$ has the same properties as a normal Fourier transform
2. consider $X_{\hat n}(e^{j\hat\omega})$ as a function of the time index $\hat n$ with $\hat\omega$ fixed. Then $X_{\hat n}(e^{j\hat\omega})$ is in the form of a convolution of the signal $x(\hat n)\,e^{-j\hat\omega\hat n}$ with the window $w(\hat n)$. This leads to an interpretation in the form of linear filtering of the frequency modulated signal $x(\hat n)\,e^{-j\hat\omega\hat n}$ by $w(\hat n)$.
- we will now consider each of these interpretations of the STFT in a lot more detail

Fourier Transform Interpretation
- consider $X_{\hat n}(e^{j\hat\omega})$ as the normal Fourier transform of the sequence $w(\hat n - m)\,x(m)$, $-\infty < m < \infty$, for fixed $\hat n$
- the window $w(\hat n - m)$ slides along the sequence $x(m)$ and defines a new STFT for every value of $\hat n$
- what are the conditions for the existence of the STFT?
  – the sequence $w(\hat n - m)\,x(m)$ must be absolutely summable for all values of $\hat n$
  – since $|x(\hat n)| \le L$ (32767 for 16-bit sampling)
  – since $|w(\hat n)| \le 1$ (normalized window levels)
  – since window duration is usually finite
- therefore $w(\hat n - m)\,x(m)$ is absolutely summable for all $\hat n$

Frequencies for STFT
- the STFT is periodic in $\hat\omega$ with period $2\pi$, i.e.,
$$X_{\hat n}(e^{j\hat\omega}) = X_{\hat n}(e^{j(\hat\omega + 2\pi k)}), \quad \forall k$$
- can use any of several frequency variables to express the STFT, including
  – $\hat\omega = \hat\Omega T$ (where $T$ is the sampling period for $x(m)$) to represent analog radian frequency, giving $X_{\hat n}(e^{j\hat\Omega T})$
  – $\hat\omega = 2\pi\hat f$ or $\hat\omega = 2\pi\hat F T$ to represent normalized frequency ($0 \le \hat f \le 1$) or analog frequency ($0 \le \hat F \le F_s = 1/T$), giving $X_{\hat n}(e^{j2\pi\hat f})$ or $X_{\hat n}(e^{j2\pi\hat F T})$

Signal Recovery from STFT
- since for a given value of $\hat n$, $X_{\hat n}(e^{j\hat\omega})$ has the same properties as a normal Fourier transform, we can recover the input sequence exactly
- since $X_{\hat n}(e^{j\hat\omega})$ is the normal Fourier transform of the windowed sequence $w(\hat n - m)\,x(m)$, then
$$w(\hat n - m)\,x(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\,e^{j\hat\omega m}\,d\hat\omega$$
- assuming the window satisfies the property that $w(0) \ne 0$ (a trivial requirement), then by evaluating the inverse Fourier transform when $m = \hat n$, we obtain
$$x(\hat n) = \frac{1}{2\pi\,w(0)}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\,e^{j\hat\omega\hat n}\,d\hat\omega$$

Signal Recovery from STFT
$$x(\hat n) = \frac{1}{2\pi\,w(0)}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\,e^{j\hat\omega\hat n}\,d\hat\omega$$
- with the requirement that $w(0) \ne 0$, the sequence $x(\hat n)$ can be recovered exactly from $X_{\hat n}(e^{j\hat\omega})$, if $X_{\hat n}(e^{j\hat\omega})$ is known for all values of $\hat\omega$ over one complete period
  – sample-by-sample recovery process
  – $X_{\hat n}(e^{j\hat\omega})$ must be known for every value of $\hat n$ and for all $\hat\omega$
- can also recover the sequence $w(\hat n - m)\,x(m)$, but can't guarantee that $x(m)$ can be recovered since $w(\hat n - m)$ can equal 0
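The recovery integral can be checked numerically. The sketch below assumes a finite causal window of length `L`; in that case the integrand is a trigonometric polynomial, so replacing the integral by an average over `num_freqs >= L` uniformly spaced frequencies is exact up to rounding. The helper names and the test signal are illustrative, not taken from the lecture.

```python
import cmath
import math

def stft_point(x, w, n_hat, omega):
    """X_n(e^{j omega}) = sum_m x[m] w[n_hat - m] e^{-j omega m} (causal, finite w)."""
    total = 0j
    for m, xm in enumerate(x):
        k = n_hat - m
        if 0 <= k < len(w):
            total += xm * w[k] * cmath.exp(-1j * omega * m)
    return total

def recover_sample(x, w, n_hat, num_freqs):
    """Approximate (1 / (2 pi w(0))) * integral of X_n(e^{jw}) e^{j w n} dw.

    The integral over one period is replaced by an average over num_freqs
    uniformly spaced frequencies; requires w[0] != 0.
    """
    acc = 0j
    for k in range(num_freqs):
        omega = 2.0 * math.pi * k / num_freqs
        acc += stft_point(x, w, n_hat, omega) * cmath.exp(1j * omega * n_hat)
    return (acc / num_freqs).real / w[0]

x = [0.5, -1.0, 2.0, 3.0, -0.25, 1.5]
w = [0.08, 0.77, 0.77, 0.08]               # 4-point Hamming values, w[0] != 0
recovered = [recover_sample(x, w, n, num_freqs=len(w)) for n in range(len(x))]
```

Each recovered sample needs its own STFT frame, which is the sample-by-sample character of this recovery method.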

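Properties of STFT
$$X_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\big[w(\hat n - m)\,x(m)\big], \quad \hat n \text{ fixed},\ \hat\omega \text{ variable}$$
- relation to the short-time power density function
$$S_{\hat n}(e^{j\hat\omega}) = \left|X_{\hat n}(e^{j\hat\omega})\right|^2 = X_{\hat n}(e^{j\hat\omega})\,X^{*}_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\big[R_{\hat n}(k)\big], \quad \hat n \text{ fixed}$$
$$R_{\hat n}(k) = \sum_{m=-\infty}^{\infty} w(\hat n - m)\,x(m)\,w(\hat n - m - k)\,x(m + k) \longleftrightarrow S_{\hat n}(e^{j\hat\omega})$$
- relation to the regular $X(e^{j\hat\omega})$ (assuming it exists)
$$X(e^{j\hat\omega}) = \mathrm{DTFT}[x(m)] = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\hat\omega m}$$
$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\,e^{-j\theta\hat n}\,X(e^{j(\hat\omega - \theta)})\,d\theta$$
$$w(\hat n - m)\,x(m) \longleftrightarrow W(e^{-j\theta})\,e^{-j\theta\hat n} * X(e^{j\theta})$$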

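Properties of STFT
- assume $X(e^{j\hat\omega})$ exists
$$X(e^{j\hat\omega}) = \mathrm{DTFT}[x(m)] = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\hat\omega m}$$
$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\,e^{-j\theta\hat n}\,X(e^{j(\hat\omega - \theta)})\,d\theta$$
- limiting case
$$w(\hat n) = 1 \ \ \forall\,\hat n \longleftrightarrow W(e^{j\hat\omega}) = 2\pi\,\delta(\hat\omega)$$
$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} 2\pi\,\delta(-\theta)\,X(e^{j(\hat\omega - \theta)})\,e^{-j\theta\hat n}\,d\theta = X(e^{j\hat\omega})$$
i.e., we get the same thing no matter where the window is shifted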

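Alternative Forms of STFT
Alternative forms of $X_{\hat n}(e^{j\hat\omega})$
1. real and imaginary parts
$$X_{\hat n}(e^{j\hat\omega}) = \mathrm{Re}\big[X_{\hat n}(e^{j\hat\omega})\big] + j\,\mathrm{Im}\big[X_{\hat n}(e^{j\hat\omega})\big] = a_{\hat n}(\hat\omega) - j\,b_{\hat n}(\hat\omega)$$
$$a_{\hat n}(\hat\omega) = \mathrm{Re}\big[X_{\hat n}(e^{j\hat\omega})\big], \qquad b_{\hat n}(\hat\omega) = -\mathrm{Im}\big[X_{\hat n}(e^{j\hat\omega})\big]$$
- when $x(m)$ and $w(\hat n - m)$ are both real (usually the case), can show that $a_{\hat n}(\hat\omega)$ is symmetric in $\hat\omega$, and $b_{\hat n}(\hat\omega)$ is anti-symmetric in $\hat\omega$
2. magnitude and phase
$$X_{\hat n}(e^{j\hat\omega}) = \left|X_{\hat n}(e^{j\hat\omega})\right|\,e^{j\theta_{\hat n}(\hat\omega)}$$
- can relate $\left|X_{\hat n}(e^{j\hat\omega})\right|$ and $\theta_{\hat n}(\hat\omega)$ to $a_{\hat n}(\hat\omega)$ and $b_{\hat n}(\hat\omega)$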

Role of Window in STFT
The window $w(\hat n - m)$ does the following:
1. chooses the portion of $x(m)$ to be analyzed
2. window shape determines the nature of $X_{\hat n}(e^{j\hat\omega})$
Since $X_{\hat n}(e^{j\hat\omega})$ (for fixed $\hat n$) is the normal FT of $w(\hat n - m)\,x(m)$, then if we consider the normal FTs of both $x(n)$ and $w(n)$ individually, we get
$$X(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\hat\omega m}$$
$$W(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} w(m)\,e^{-j\hat\omega m}$$

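Role of Window in STFT
- then for fixed $\hat n$, the normal Fourier transform of the product $w(\hat n - m)\,x(m)$ is the convolution of the transforms of $w(\hat n - m)$ and $x(m)$
- for fixed $\hat n$, the FT of $w(\hat n - m)$ is $W(e^{-j\hat\omega})\,e^{-j\hat\omega\hat n}$, thus
$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\,e^{-j\theta\hat n}\,X(e^{j(\hat\omega - \theta)})\,d\theta$$
- and replacing $\theta$ by $-\theta$ gives
$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\,e^{j\theta\hat n}\,X(e^{j(\hat\omega + \theta)})\,d\theta$$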

Interpretation of Role of Window
- $X_{\hat n}(e^{j\hat\omega})$ is the convolution of $X(e^{j\hat\omega})$ with the FT of the shifted window sequence, $W(e^{-j\hat\omega})\,e^{-j\hat\omega\hat n}$
- $X(e^{j\hat\omega})$ really doesn't have meaning since $x(\hat n)$ varies with time; consider $x(\hat n)$ defined for the window duration and extended for all time to have the same properties; then $X(e^{j\hat\omega})$ does exist with properties that reflect the sound within the window (can also consider $x(\hat n) = 0$ outside the window and define $X(e^{j\hat\omega})$ appropriately, but this is another case)
Bottom Line: $X_{\hat n}(e^{j\hat\omega})$ is a smoothed version of the FT of the part of $x(\hat n)$ that is within the window $w$.

Windows in STFT
- for $X_{\hat n}(e^{j\hat\omega})$ to represent the short-time spectral properties of $x(\hat n)$ inside the window, $W(e^{j\theta})$ should be much narrower in frequency than significant spectral regions of $X(e^{j\hat\omega})$, i.e., almost an impulse in frequency
- consider rectangular and Hamming windows, where the width of the main spectral lobe is inversely proportional to window length, and sidelobe levels are essentially independent of window length
  – Rectangular Window: flat window of length $L$ samples; first zero in frequency response occurs at $F_S/L$, with sidelobe levels of about -13 dB or lower
  – Hamming Window: raised cosine window of length $L$ samples; first zero in frequency response occurs at $2F_S/L$, with sidelobe levels of -40 dB or lower
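The main-lobe and sidelobe behaviour of the rectangular window can be checked by evaluating its DTFT on a frequency grid. The following is a rough sketch: the length `L = 64`, the grid density, and the helper names are assumptions chosen for illustration, and the Hamming window uses the $0.54 - 0.46\cos(2\pi n/(L-1))$ form.

```python
import cmath
import math

def dtft(w, omega):
    """W(e^{j omega}) = sum_n w[n] e^{-j omega n} for a finite window w."""
    return sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w))

def rectangular(L):
    return [1.0] * L

def hamming(L):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (L - 1)) for n in range(L)]

def peak_sidelobe_db(w, first_zero, num_points=2000):
    """Largest |W| between first_zero and pi, in dB relative to |W(e^{j0})|."""
    peak = abs(dtft(w, 0.0))
    step = (math.pi - first_zero) / num_points
    levels = (abs(dtft(w, first_zero + i * step)) for i in range(num_points + 1))
    return 20.0 * math.log10(max(levels) / peak)

L = 64
# The first zero of the rectangular window is at omega = 2*pi/L, i.e. F = Fs/L.
rect_at_first_zero = abs(dtft(rectangular(L), 2.0 * math.pi / L))
# At the same frequency the Hamming window is still inside its wider main lobe.
hamming_at_rect_zero = abs(dtft(hamming(L), 2.0 * math.pi / L))
rect_sidelobe_db = peak_sidelobe_db(rectangular(L), 2.0 * math.pi / L)
```

Repeating the sidelobe measurement for other values of `L` is one way to see that the level is essentially independent of window length while the main-lobe width scales as $1/L$.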

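Windows
[Figure: $L = 2M + 1$-point Hamming window and its corresponding DTFT.]

[Figure: Frequency Responses of Windows.]

[Figure: Effect of Window Length, Hamming window (HW).]

[Figure: Effect of Window Length, Hamming window (HW).]

[Figure: Effect of Window Length, rectangular window (RW).]

[Figure: Effect of Window Length, Hamming window (HW).]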

Relation to Short-Time Autocorrelation
$X_{\hat n}(e^{j\hat\omega})$ is the discrete-time Fourier transform of $w[\hat n - m]\,x[m]$ for each value of $\hat n$; then it is seen that
$$S_{\hat n}(e^{j\hat\omega}) = \left|X_{\hat n}(e^{j\hat\omega})\right|^2 = X_{\hat n}(e^{j\hat\omega})\,X^{*}_{\hat n}(e^{j\hat\omega})$$
is the Fourier transform of
$$R_{\hat n}(l) = \sum_{m=-\infty}^{\infty} w[\hat n - m]\,x[m]\,w[\hat n - l - m]\,x[m + l]$$
which is the short-time autocorrelation function of the previous chapter. Thus the above equations relate the short-time spectrum to the short-time autocorrelation.
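This transform pair can be verified numerically for a short signal. The sketch below is illustrative only: it assumes a causal finite window and uses hypothetical helper names; it compares $|X_{\hat n}(e^{j\hat\omega})|^2$ computed directly with the DTFT of $R_{\hat n}(l)$.

```python
import cmath

def windowed_segment(x, w, n_hat):
    """v[m] = w[n_hat - m] x[m], with w zero outside 0 <= k < len(w)."""
    return [x[m] * w[n_hat - m] if 0 <= n_hat - m < len(w) else 0.0
            for m in range(len(x))]

def short_time_autocorr(v, lag):
    """R(l) = sum_m v[m] v[m + l] for the windowed segment v."""
    return sum(v[m] * v[m + lag] for m in range(len(v)) if 0 <= m + lag < len(v))

def power_spectrum_direct(v, omega):
    """|X|^2 with X the DTFT of the windowed segment."""
    X = sum(vm * cmath.exp(-1j * omega * m) for m, vm in enumerate(v))
    return abs(X) ** 2

def power_spectrum_from_autocorr(v, omega):
    """DTFT of R(l) over all lags where R(l) can be nonzero."""
    n = len(v)
    total = sum(short_time_autocorr(v, lag) * cmath.exp(-1j * omega * lag)
                for lag in range(-(n - 1), n))
    return total.real

x = [1.0, -2.0, 0.5, 3.0, -1.5, 2.0, 0.25]
w = [0.08, 0.77, 0.77, 0.08]
v = windowed_segment(x, w, n_hat=4)
omegas = [0.0, 0.3, 1.1, 2.5]
direct = [power_spectrum_direct(v, om) for om in omegas]
via_autocorr = [power_spectrum_from_autocorr(v, om) for om in omegas]
```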

[Figure: Short-Time Autocorrelation and STFT.]

Summary of FT view of STFT
- interpret $X_{\hat n}(e^{j\hat\omega})$ as the normal Fourier transform of the sequence $w(\hat n - m)\,x(m)$, $-\infty < m < \infty$
- properties of this Fourier transform depend on the window
  – frequency resolution of $X_{\hat n}(e^{j\hat\omega})$ varies inversely with the length of the window, so we want long windows for high resolution
  – want $x(n)$ to be relatively stationary (non-time-varying) during the duration of the window for the most stable spectrum, so we want short windows
- as usual in speech processing, there needs to be a compromise between good temporal resolution (short windows) and good frequency resolution (long windows)

Linear Filtering Interpretation of STFT

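Linear Filtering Interpretation
1. modulation-lowpass filter form ($n$ rather than $\hat n$)
$$X_n(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} \big(x(m)\,e^{-j\hat\omega m}\big)\,w(n - m) = w(n) * \big(x(n)\,e^{-j\hat\omega n}\big), \quad n \text{ variable},\ \hat\omega \text{ fixed}$$
$$X_n(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\,X(e^{j(\theta + \hat\omega)})\,e^{j\theta n}\,d\theta$$
2. bandpass filter-demodulation form
$$X_n(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} w(m)\,x(n - m)\,e^{-j\hat\omega(n - m)} = e^{-j\hat\omega n}\sum_{m=-\infty}^{\infty} \big(w(m)\,e^{j\hat\omega m}\big)\,x(n - m) = e^{-j\hat\omega n}\big[\big(w(n)\,e^{j\hat\omega n}\big) * x(n)\big], \quad n \text{ variable},\ \hat\omega \text{ fixed}$$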

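Linear Filtering Interpretation
1. modulation-lowpass filter form:
$$X_n(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} \big(x(m)\,e^{-j\hat\omega m}\big)\,w(n - m), \quad n \text{ variable},\ \hat\omega \text{ fixed}$$
$$X_n(e^{j\hat\omega}) = \big(x(n)\,e^{-j\hat\omega n}\big) * w(n) = \big(x(n)\cos(\hat\omega n)\big) * w(n) - j\,\big(x(n)\sin(\hat\omega n)\big) * w(n) = a_n(\hat\omega) - j\,b_n(\hat\omega)$$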

Linear Filtering Interpretation
2. bandpass filter-demodulation form
$$X_n(e^{j\hat\omega}) = e^{-j\hat\omega n}\big[\big(w(n)\,e^{j\hat\omega n}\big) * x(n)\big], \quad n \text{ variable},\ \hat\omega \text{ fixed}$$
- complex bandpass filter output modulated by the signal $e^{-j\hat\omega n}$
- if $W(e^{j\theta})$ is lowpass, then the filter is bandpass around $\theta = \hat\omega$
- all real computation for the lower half structure
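Both linear filtering forms can be compared against the defining sum for a single fixed frequency. The sketch below is a minimal check, assuming a causal finite window and a signal that is zero outside its list; `convolve`, `stft_modulate_lowpass`, and `stft_bandpass_demodulate` are hypothetical helper names.

```python
import cmath

def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def stft_direct(x, w, n, omega):
    """Defining sum: sum_m x[m] w[n - m] e^{-j omega m}."""
    return sum(x[m] * w[n - m] * cmath.exp(-1j * omega * m)
               for m in range(len(x)) if 0 <= n - m < len(w))

def stft_modulate_lowpass(x, w, omega):
    """Form 1: modulate x down by e^{-j omega n}, then lowpass filter with w."""
    modulated = [xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x)]
    return convolve(modulated, w)[:len(x)]

def stft_bandpass_demodulate(x, w, omega):
    """Form 2: filter x with h[n] = w[n] e^{j omega n}, then demodulate."""
    h = [wn * cmath.exp(1j * omega * n) for n, wn in enumerate(w)]
    y = convolve(x, h)[:len(x)]
    return [yn * cmath.exp(-1j * omega * n) for n, yn in enumerate(y)]

x = [1.0, -0.5, 2.0, 0.75, -1.25, 3.0]
w = [0.08, 0.77, 0.77, 0.08]
omega = 0.9
direct = [stft_direct(x, w, n, omega) for n in range(len(x))]
form1 = stft_modulate_lowpass(x, w, omega)
form2 = stft_bandpass_demodulate(x, w, omega)
```

Since $|e^{-j\hat\omega n}| = 1$, the magnitude of the bandpass filter output before demodulation already equals $|X_n(e^{j\hat\omega})|$.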

Linear Filtering Interpretation
[Figure: lowpass filter frequency response and bandpass filter frequency response.]

Linear Filtering Interpretation
- assume the normal FT of $x(n)$ exists (recall that $\hat\omega$ is a particular frequency)
$$x(n) \longleftrightarrow X(e^{j\theta})$$
$$x(n)\,e^{-j\hat\omega n} \longleftrightarrow X(e^{j(\theta + \hat\omega)})$$
- the spectrum of $x(n)$ at frequency $\hat\omega$ is shifted to zero frequency
- since the STFT is a convolution, the FT of the STFT is the product of the individual FTs, i.e.,
$$X(e^{j(\theta + \hat\omega)})\,W(e^{j\theta})$$
- if $W(e^{j\theta})$ resembles a narrow band lowpass filter, i.e., $W(e^{j\theta}) \approx 1$ for small $\theta$ and is 0 otherwise, then
$$X(e^{j(\theta + \hat\omega)})\,W(e^{j\theta}) \approx X(e^{j\hat\omega})$$

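Summary: STFT
Short-Time Fourier Transform (STFT)
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat n - m]\,e^{-j\hat\omega m}, \quad -\infty < \hat n < \infty,\ 0 \le \hat\omega \le 2\pi$$
- Fixed value of $\hat n$, varying $\hat\omega$: DFT Interpretation
- Fixed value of $\hat\omega$, varying $\hat n$: Filter Bank Interpretation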

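Summary
Short-Time Fourier Transform (STFT)
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat n - m]\,e^{-j\hat\omega m}, \quad -\infty < \hat n < \infty,\ 0 \le \hat\omega \le 2\pi$$
[Figure: $X_{\hat n}(e^{j\hat\omega})$ in the $(n, \hat\omega)$ plane, with $\hat\omega$ from 0 to $2\pi$ and time samples at $n = 0, R, 2R, 3R$.]
DFT:
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=\hat n - L + 1}^{\hat n} x[m]\,w[\hat n - m]\,e^{-j\hat\omega m}$$
$$X_{\hat n}(e^{j\hat\omega}) = \mathrm{DFT}\big(x[m]\,w[\hat n - m]\big), \quad 0 \le \hat\omega \le 2\pi,\ \hat n = 0, R, 2R, \ldots$$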

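Summary – Modulation/Lowpass Filter
Short-Time Fourier Transform (STFT)
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat n - m]\,e^{-j\hat\omega m}, \quad -\infty < \hat n < \infty,\ 0 \le \hat\omega \le 2\pi$$
[Figure: $X_{\hat n}(e^{j\hat\omega})$ in the $(n, \hat\omega)$ plane, with frequency channels at $\hat\omega_0, \hat\omega_1, \hat\omega_2$.]
Filter Bank:
$$X_n(e^{j\hat\omega}) = \sum_{m=n-L+1}^{n} \big(x[m]\,e^{-j\hat\omega m}\big)\,w[n - m]$$
$$X_n(e^{j\hat\omega}) = \big(x[n]\,e^{-j\hat\omega n}\big) * w[n]$$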

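Summary – Bandpass Filter/Demodulation
Short-Time Fourier Transform (STFT)
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat n - m]\,e^{-j\hat\omega m}, \quad -\infty < \hat n < \infty,\ 0 \le \hat\omega \le 2\pi$$
[Figure: $X_{\hat n}(e^{j\hat\omega})$ in the $(n, \hat\omega)$ plane, with frequency channels at $\hat\omega_0, \hat\omega_1, \hat\omega_2$.]
Filter Bank:
$$X_n(e^{j\hat\omega}) = \sum_{m} \big(x[n - m]\,e^{-j\hat\omega(n - m)}\big)\,w[m]$$
$$X_n(e^{j\hat\omega}) = e^{-j\hat\omega n}\big[\big(w[n]\,e^{j\hat\omega n}\big) * x[n]\big]$$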

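Summary – Modulation
Modulation (with the $2\pi$ normalization factors omitted):
$$x[n]\,e^{-j\hat\omega n} \longleftrightarrow X(e^{j\omega}) * \mathrm{FT}\big(e^{-j\hat\omega n}\big) = X(e^{j\omega}) * \delta(\omega + \hat\omega) = X(e^{j(\omega + \hat\omega)})$$
[Figure: the spectrum $X(e^{j\omega})$ occupying $-W \le \omega \le W$, and the shifted spectra $X(e^{j(\omega + \hat\omega)})$ and $X(e^{j(\omega - \hat\omega)})$, the latter occupying $\hat\omega - W$ to $\hat\omega + W$.]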

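STFT Magnitude Only
- for many applications you only need the magnitude of the STFT (not the phase)
- in such cases, the bandpass filter implementation is less complex, since
$$\left|X_n(e^{j\hat\omega})\right| = \left[a_n^2(\hat\omega) + b_n^2(\hat\omega)\right]^{1/2}$$
$$\left|\tilde X_n(e^{j\hat\omega})\right| = \left[\tilde a_n^2(\hat\omega) + \tilde b_n^2(\hat\omega)\right]^{1/2}$$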

Sampling Rates of STFT

Sampling Rates of STFT
- need to sample the STFT in both time and frequency to produce an unaliased representation from which $x(n)$ can be exactly recovered
- sampling rates lower than the theoretical minimum rate can be used, in either time or frequency, and $x(n)$ can still be exactly recovered from the aliased (under-sampled) short-time transform
  – this is useful for spectral estimation, pitch estimation, formant estimation, speech spectrograms, vocoders
  – for applications where the signal is modified, e.g., speech enhancement, cannot undersample the STFT and still recover the modified signal exactly

Sampling Rate in Time
- to determine the sampling rate in time, we take a linear filtering view
1. $X_n(e^{j\hat\omega})$ is the output of a filter with impulse response $\tilde w(n)$
2. $W(e^{j\hat\omega})$ is a lowpass response with effective bandwidth of $B$ Hertz
- thus the effective bandwidth of $X_n(e^{j\hat\omega})$ is $B$ Hertz, so $X_n(e^{j\hat\omega})$ has to be sampled at a rate of $2B$ samples/second to avoid aliasing
Example: Hamming Window
$$w(n) = \begin{cases} 0.54 - 0.46\cos\big(2\pi n/(L-1)\big), & 0 \le n \le L-1 \\ 0, & \text{otherwise} \end{cases}$$
$$B \approx \frac{2F_s}{L}\ \text{(Hz)}$$
For $L = 400$ and $F_s = 10{,}000$ Hz, $B = 50$ Hz, so we need a rate of 100/sec (every 100 samples) for the sampling rate in time.
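The arithmetic in this example can be written out as a small calculation. This is only a sketch of the rule of thumb $B \approx 2F_s/L$ for the Hamming window; the function names are hypothetical.

```python
def hamming_bandwidth_hz(fs_hz, window_length):
    """Effective bandwidth of an L-point Hamming window, B ~ 2 Fs / L (Hz)."""
    return 2.0 * fs_hz / window_length

def time_sampling(fs_hz, window_length):
    """Return (B in Hz, STFT frame rate 2B in frames/sec, hop in samples)."""
    bandwidth = hamming_bandwidth_hz(fs_hz, window_length)
    frame_rate = 2.0 * bandwidth           # Nyquist rate for a bandwidth-B signal
    hop_samples = fs_hz / frame_rate       # input samples between STFT frames
    return bandwidth, frame_rate, hop_samples

bandwidth, frame_rate, hop_samples = time_sampling(fs_hz=10000.0, window_length=400)
```

The resulting hop equals $L/4$, which matches the 4:1 overlap commonly used with Hamming windows.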

Sampling Rate in Frequency
- since $X_n(e^{j\hat\omega})$ is periodic in $\hat\omega$ with period $2\pi$, it is only necessary to sample over an interval of length $2\pi$
- need to determine an appropriate finite set of frequencies, $\hat\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$, at which $X_n(e^{j\hat\omega})$ must be specified to exactly recover $x(n)$
- use the Fourier transform interpretation of $X_n(e^{j\hat\omega})$
1. if the window $w(n)$ is time-limited, then the inverse transform of $X_n(e^{j\hat\omega})$ is time-limited
2. the sampling theorem requires that we sample $X_n(e^{j\hat\omega})$ in the frequency dimension at a rate of at least twice its ('symmetric') "time width"
3. since the inverse Fourier transform of $X_n(e^{j\hat\omega})$ is the signal $x(m)\,w(n - m)$ and this signal is of duration $L$ samples (the duration of $w(n)$), then according to the sampling theorem $X_n(e^{j\hat\omega})$ must be sampled (in frequency) at the set of frequencies
$$\hat\omega_k = \frac{2\pi k}{L}, \quad k = 0, 1, \ldots, L-1 \quad \text{(where } L/2 \text{ is the effective width of the window)}$$
in order to exactly recover $x(n)$ from $X_n(e^{j\hat\omega_k})$
- thus for a Hamming window of duration $L = 400$ samples, we require that the STFT be evaluated at no fewer than 400 uniformly spaced frequencies around the unit circle

“Total” Sampling Rate of STFT
- the “total” sampling rate for the STFT is the product of the sampling rates in time and frequency, i.e.,
$$SR = SR(\text{time}) \times SR(\text{frequency}) = 2B \times L \ \text{samples/sec}$$
  – $B$ = frequency bandwidth of window (Hz)
  – $L$ = time width of window (samples)
- for most windows of interest, $B$ is a multiple of $F_S/L$, i.e.,
$$B = C\,F_S/L \ \text{(Hz)}, \quad C = 1 \text{ for Rectangular Window},\ C = 2 \text{ for Hamming Window}$$
$$SR = 2C\,F_S \ \text{samples/second}$$
- can define an ‘oversampling rate’ of $SR/F_S = 2C$, the oversampling rate of the STFT as compared to the conventional sampling representation of $x(n)$
  – for RW, $2C = 2$; for HW, $2C = 4$, so the range of oversampling is 2-4
  – this oversampling gives a very flexible representation of the speech signal
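As a worked example, assume the same illustrative values as the Hamming window example for the sampling rate in time, $F_S = 10{,}000$ Hz and $L = 400$ samples:

$$\text{Hamming } (C = 2):\quad B = \frac{2 \cdot 10{,}000}{400} = 50\ \text{Hz}, \qquad SR = 2B \times L = 100 \times 400 = 40{,}000\ \text{samples/sec} = 4F_S$$

$$\text{Rectangular } (C = 1):\quad B = \frac{10{,}000}{400} = 25\ \text{Hz}, \qquad SR = 2B \times L = 50 \times 400 = 20{,}000\ \text{samples/sec} = 2F_S$$

In both cases the window length cancels, which is why the oversampling rate depends only on $C$.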

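Mathematical Basis for Sampling the STFT
- assume we sample in time at $\hat n = n_r = rR$, $-\infty < r < \infty$, and in frequency at $\hat\omega = \hat\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$; this gives the sample values
$$X_{rR}(e^{j\frac{2\pi}{N}k}) = \sum_{m=-\infty}^{\infty} w[rR - m]\,x[m]\,e^{-j\frac{2\pi}{N}km} = e^{-j\frac{2\pi}{N}krR}\,\tilde X_{rR}(e^{j\frac{2\pi}{N}k})$$
$$\tilde X_{rR}(e^{j\frac{2\pi}{N}k}) = \sum_{m=-\infty}^{\infty} x[rR + m]\,w[-m]\,e^{-j\frac{2\pi}{N}km} \quad (\text{set } m - rR = m';\ m' \to m)$$
- define DFT-type notation
$$X_r(k) = X_{rR}(e^{j\frac{2\pi}{N}k}) = e^{-j\frac{2\pi}{N}krR}\,\tilde X_r(k)$$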

[Figure: Sampling the STFT.]

Sampling the STFT
- DFT notation
$$X_r[k] = X_{rR}(e^{j\frac{2\pi}{N}k}) = e^{-j\frac{2\pi}{N}krR}\,\tilde X_r[k]$$
- let $w[-m] \ne 0$ for $0 \le m \le L-1$ (finite duration window with no zero-valued samples)
$$\tilde X_r[k] = \sum_{m=0}^{L-1} x[rR + m]\,w[-m]\,e^{-j\frac{2\pi}{N}km} \quad (r \text{ fixed},\ 0 \le k \le N-1)$$
- if $L \le N$ then (DFT defined with no aliasing, so can recover the sequence exactly using the inverse DFT)
$$x[rR + m]\,w[-m] = \frac{1}{N}\sum_{k=0}^{N-1} \tilde X_r[k]\,e^{j\frac{2\pi}{N}km} \quad (r \text{ fixed},\ 0 \le m \le N-1)$$
- if $R \le L$ (IDFT defined with no aliasing), then all samples can be recovered from $X_r[k]$ ($R > L$ leaves gaps in the sequence)
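The recovery conditions $L \le N$ and $R \le L$ can be illustrated with a direct DFT/IDFT round trip. The sketch below is a slow, didactic version under stated assumptions: `w_flip[m]` stands for $w[-m]$ and has no zero-valued samples, frames start at $rR$, and the function names `analyze` and `synthesize` are hypothetical.

```python
import cmath
import math

def analyze(x, w_flip, R, N):
    """Compute Xtilde_r[k] = sum_{m=0}^{L-1} x[rR + m] w[-m] e^{-j 2 pi k m / N}."""
    L = len(w_flip)
    frames = []
    for start in range(0, len(x), R):
        seg = [x[start + m] * w_flip[m] if start + m < len(x) else 0.0
               for m in range(L)]
        frames.append([sum(seg[m] * cmath.exp(-2j * math.pi * k * m / N)
                           for m in range(L)) for k in range(N)])
    return frames

def synthesize(frames, w_flip, R, N, length):
    """Invert each frame with an N-point IDFT and divide out the window.

    Requires N >= L (no time aliasing) and R <= L (no gaps between frames).
    """
    L = len(w_flip)
    out = [0.0] * length
    for r, Xk in enumerate(frames):
        for m in range(min(L, R)):         # frame r supplies samples rR .. rR + R - 1
            n = r * R + m
            if n < length:
                val = sum(Xk[k] * cmath.exp(2j * math.pi * k * m / N)
                          for k in range(N)) / N
                out[n] = val.real / w_flip[m]
    return out

x = [0.3, -1.2, 2.5, 0.7, -0.4, 1.9, -2.2, 0.1, 1.0, -0.6]
w_flip = [0.08, 0.77, 0.77, 0.08]          # w[-m] for m = 0..3, all nonzero
frames = analyze(x, w_flip, R=3, N=4)
y = synthesize(frames, w_flip, R=3, N=4, length=len(x))
```

Dividing by small window values such as 0.08 amplifies any error in the transform, which is one reason this window-division approach is fragile when the STFT is modified.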

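What We Have Learned So Far
1. Definition
$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,w(\hat n - m)\,e^{-j\hat\omega m}$$
  – function of $\hat n = n$ for sampled $\hat\omega$ (looks like a time sequence)
  – function of $\hat\omega = \omega$ for sampled $\hat n$ (looks like a transform)
  – $X_{\hat n}(e^{j\hat\omega})$ (no sampling rate reduction) defined for $\hat n = 1, 2, 3, \ldots$; $0 \le \hat\omega \le \pi$
2. Time origins
$$X_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\big[x(m)\,w(\hat n - m)\big], \quad \hat n \text{ fixed},\ \hat\omega \text{ variable, with time origin tied to } x(\hat n)$$
$$X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega\hat n}\,\mathrm{DTFT}\big[x(\hat n + m)\,w(-m)\big], \quad \hat n \text{ fixed},\ \hat\omega \text{ variable, with time origin tied to } w(-m)$$
3. Interpretations of $X_{\hat n}(e^{j\hat\omega})$
  1. $\hat n$ fixed, $\hat\omega = \omega$ variable; $X_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\big[x(m)\,w(\hat n - m)\big]$: DFT view
  2. $\hat n = n$ variable, $\hat\omega$ fixed; $X_{\hat n}(e^{j\hat\omega}) = \big(x(n)\,e^{-j\hat\omega n}\big) * w(n)$: linear filtering view, filter bank implementation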

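What We Have Learned So Far
4. Signal Recovery from STFT
$$x(m)\,w(\hat n - m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\,e^{j\hat\omega m}\,d\hat\omega$$
$$x(\hat n) = \frac{1}{2\pi\,w(0)}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\,e^{j\hat\omega\hat n}\,d\hat\omega$$
5. Linear Filtering Interpretation
  1. modulation-lowpass filter, $\hat n = n$ variable, $\hat\omega$ fixed
$$X_n(e^{j\hat\omega}) = w(n) * \big(x(n)\,e^{-j\hat\omega n}\big)$$
$$X_n(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\,X(e^{j(\theta + \hat\omega)})\,e^{j\theta n}\,d\theta$$
  2. bandpass filter-demodulation, $\hat n = n$ variable, $\hat\omega$ fixed
$$X_n(e^{j\hat\omega}) = e^{-j\hat\omega n}\big[\big(w(n)\,e^{j\hat\omega n}\big) * x(n)\big]$$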

What We Have Learned So Far
6. Sampling Rates in Time and Frequency
  1. time: $W(e^{j\omega})$ has a bandwidth of $B$ Hertz, giving a rate of $2B$ samples/sec; for the Hamming Window, $B = 2F_S/L$ (Hz)
  2. frequency: $\tilde w(n)$ is time limited to $L$ samples, so the inverse of $X_n(e^{j\omega})$ is also time limited; need to sample in frequency at twice the (effective) time width of the time-limited sequence, giving $L$ frequency samples
  3. total Sampling Rate: $2B \cdot L$ samples/sec
    – $B$ = frequency bandwidth of the window (Hz)
    – $L$ = effective time width of the window (samples)
    – $B = C\,F_S/L$ (Hz), so Sampling Rate $= 2B \cdot L = 2C\,F_S$ samples/second
    – for Rectangular Window, $C = 1$
    – for Hamming Window, $C = 2$

Spectrographic Displays

Spectrographic Displays
- Sound Spectrograph: one of the earliest embodiments of the time-dependent spectrum analysis techniques
  – a 2-second utterance repeatedly modulates a variable frequency oscillator, is then bandpass filtered, and the average energy at a given time and frequency is measured and used as a crude measure of the STFT
  – thus energy is recorded by an ingenious electro-mechanical system on special electrostatic paper called teledeltos paper
  – the result is a two-dimensional representation of the time-dependent spectrum, with vertical intensity being spectrum level at a given frequency, and horizontal intensity being spectral level at a given time, with spectrum magnitude being represented by the darkness of the marking
  – wide bandpass filters (300 Hz bandwidth) provide good temporal resolution and poor frequency resolution (resolve pitch pulses in time but not in frequency), called a wideband spectrogram
  – narrow bandpass filters (45 Hz bandwidth) provide good frequency resolution and poor time resolution (resolve pitch pulses in frequency, but not in time), called a narrowband spectrogram

[Figure: conventional spectrogram of the utterance "Every salt breeze comes from the sea".]

Digital Speech Spectrograms
- wideband spectrogram
  – follows broad spectral peaks (formants) over time
  – resolves most individual pitch periods as vertical striations, since the IR of the analyzing filter is comparable in duration to a pitch period
  – what happens for low pitch males and high pitch females?
  – for unvoiced speech there are no vertical pitch striations
- narrowband spectrogram
  – individual harmonics are resolved in voiced regions
  – formant frequencies are still in evidence
  – usually can see fundamental frequency
  – unvoiced regions show no strong structure

Digital Speech Spectrograms
- Speech Parameters (“This is a test”):
  – sampling rate: 16 kHz
  – speech duration: 1.406 seconds
  – speaker: male
- Wideband Spectrogram Parameters:
  – analysis window: Hamming window
  – analysis window duration: 6 msec (96 samples)
  – analysis window shift: 0.625 msec (10 samples)
  – FFT size: 512
  – dynamic range of spectral log magnitudes: 40 dB
- Narrowband Spectrogram Parameters:
  – analysis window: Hamming window
  – analysis window duration: 60 msec (960 samples)
  – analysis window shift: 6 msec (96 samples)
  – FFT size: 1024
  – dynamic range of spectral log magnitudes: 40 dB
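The sample counts quoted for these parameters follow from the 16 kHz sampling rate. The short sketch below reproduces them and also computes the FFT bin spacing; the helper names are hypothetical, and the bin spacing is a derived quantity that the slide does not state.

```python
def ms_to_samples(duration_ms, fs_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * fs_hz / 1000.0)

def fft_bin_spacing_hz(fs_hz, fft_size):
    """Frequency spacing between adjacent FFT bins."""
    return fs_hz / fft_size

FS_HZ = 16000
wideband = {
    "window": ms_to_samples(6.0, FS_HZ),
    "shift": ms_to_samples(0.625, FS_HZ),
    "bin_hz": fft_bin_spacing_hz(FS_HZ, 512),
}
narrowband = {
    "window": ms_to_samples(60.0, FS_HZ),
    "shift": ms_to_samples(6.0, FS_HZ),
    "bin_hz": fft_bin_spacing_hz(FS_HZ, 1024),
}
```

In both cases the FFT size exceeds the window length, so each frame is zero-padded before the transform.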

Digital Speech Spectrograms
[Figure: four spectrogram panels.]
- Top Panel: 3 msec (48 samples) window
- Second Panel: 6 msec (96 samples) window
- Third Panel: 9 msec (144 samples) window
- Fourth Panel: 30 msec (480 samples) window

[Figure: Spectrogram Comparisons.]

[Figure: Spectrogram, male speaker; nfft = 1024, L = 80, overlap = 75.]

[Figure: Spectrogram, female speaker; nfft = 1024, L = 80, overlap = 75.]

Overlap Addition (OLA) Method

Overlap Addition (OLA) Method
- based on the normal FT interpretation of the short-time spectrum
$$y_{\hat n}(m) = x(m)\,w(\hat n - m) \ \overset{\text{DFT/IDFT}}{\longleftrightarrow}\ X_{\hat n}(e^{j\omega_k})$$
- can reconstruct $x(m)$ by computing the IDFT of $X_{\hat n}(e^{j\omega_k})$ and dividing out the window (assumed non-zero for all samples)
- this process gives $L$ signal values of $x(m)$ for each window, so the window can be moved by $L$ samples and the process repeated
- since $X_{\hat n}(e^{j\omega_k})$ is "undersampled" in time, it is highly susceptible to aliasing errors, so we need a more robust synthesis procedure

Overlap Addition (OLA) Method
$$y(n) = \sum_{m}\left[\sum_{k} X_m(e^{j\omega_k})\,e^{j\omega_k n}\right]$$
- the summation is for overlapping analysis sections
- for each value of $m$ where $X_m(e^{j\omega_k})$ is measured, do an inverse FT to give
$$y_m(n) = L\,x(n)\,w(m - n) \quad (\text{where } L \text{ is the size of the FT})$$
$$y(n) = \sum_{m} y_m(n) = L\,x(n)\sum_{m} w(m - n)$$
- a basic property of the window is
$$W(e^{j0}) = W(e^{j\omega_k})\Big|_{\omega_k = 0} = \sum_{n=0}^{N-1} w(n)$$
- since any set of samples of the window are equivalent (by sampling arguments), then if $w(n)$ is sampled often enough we get (independent of $n$)
$$\sum_{m} w(m - n) = W(e^{j0})$$
$$y(n) = L\,x(n)\,W(e^{j0})$$
using overlap-added sections

[Figure: Overlap Addition of Bartlett and Hann Windows.]

[Figure: Overlap Addition of Hamming Window, L = 128.]

[Figure: Window Spectra. DTFT of Bartlett (triangular), Hann and Hamming windows.]

[Figure: Hamming Window Spectra. DTFTs of even-length, odd-length and modified odd-to-even length Hamming windows; zeros spaced at $2\pi/R$ give perfect reconstruction using OLA (even-length window).]

[Figure: Overlap Addition (OLA) Method.]

Overlap Addition (OLA) Method
- $w(n)$ is an $L$-point Hamming window with $R = L/4$
- assume $x(n) = 0$ for $n < 0$
- time overlap of 4:1 for HW
- first analysis section begins at $n = L/4$

Overlap Addition (OLA) Method
- 4 overlapping sections contribute to each interval
- $N$-point FFTs are done using $L$ speech samples, with $N - L$ zeros padded at the end to allow modifications without significant aliasing effects
- for a given value of $n$
$$y(n) = x(n)\,w(R - n) + x(n)\,w(2R - n) + x(n)\,w(3R - n) + x(n)\,w(4R - n)$$
$$y(n) = x(n)\,\big[w(R - n) + w(2R - n) + w(3R - n) + w(4R - n)\big] = x(n)\,W(e^{j0})/R$$
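The constant-overlap-sum property behind this result can be checked numerically. The sketch below assumes a periodic (even-length style) Hamming window, $w(n) = 0.54 - 0.46\cos(2\pi n/L)$, because the symmetric form with $L-1$ in the denominator does not sum to an exact constant; the window length $L = 16$ and the helper names are illustrative choices.

```python
import math

def periodic_hamming(L):
    """Hamming window with period L (assumed form, not the symmetric L-1 form)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / L) for n in range(L)]

def overlap_sum(w, R, n):
    """Sum over r >= 0 of w(rR - n), with w zero outside 0 <= index < len(w)."""
    L = len(w)
    total = 0.0
    r = 0
    while r * R - n <= L - 1:
        idx = r * R - n
        if idx >= 0:
            total += w[idx]
        r += 1
    return total

L = 16
R = L // 4                                 # 4:1 overlap
w = periodic_hamming(L)
expected_gain = sum(w) / R                 # W(e^{j0}) / R
sums = [overlap_sum(w, R, n) for n in range(40)]
```

With these values every entry of `sums` equals $W(e^{j0})/R = 0.54 \cdot 16 / 4 = 2.16$, so the overlap-added output is a scaled copy of $x(n)$.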

Filter Bank Summation (FBS)

Filter Bank Summation
- the filter bank interpretation of the STFT shows that for any frequency $\omega_k$, $X_n(e^{j\omega_k})$ is a lowpass representation of the signal in a band centered at $\omega_k$ ($n = \hat n$ for FBS)
$$X_n(e^{j\omega_k}) = e^{-j\omega_k n}\sum_{m=-\infty}^{\infty} x(n - m)\,w_k(m)\,e^{j\omega_k m}$$
where $w_k(m)$ is the lowpass window used at frequency $\omega_k$ (we have generalized the structure to allow a different lowpass window at each frequency $\omega_k$).

Filter Bank Summation
- define a bandpass filter and substitute it in the equation to give
$$h_k(n) = w_k(n)\,e^{j\omega_k n}$$
$$X_n(e^{j\omega_k}) = e^{-j\omega_k n}\sum_{m=-\infty}^{\infty} x(n - m)\,h_k(m)$$

[Figure: Filter Bank Summation.]

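Filter Bank Interpretation of STFT (case: $w_k[n] = w[n]$)
$$w[n] \longleftrightarrow W(e^{j\omega}) \quad \text{(lowpass, occupying } -\omega_0 \le \omega \le \omega_0\text{)}$$
$$h[n] = w[n]\,e^{j\omega_k n} \longleftrightarrow H(e^{j\omega}) = W(e^{j\omega}) * \mathrm{FT}\big(e^{j\omega_k n}\big)$$
$$\mathrm{FT}\big(e^{j\omega_k n}\big) = \sum_{n=-\infty}^{\infty} e^{j\omega_k n}\,e^{-j\omega n} = \sum_{n=-\infty}^{\infty} e^{-j(\omega - \omega_k)n} = \delta(\omega - \omega_k)$$
$$H(e^{j\omega}) = W(e^{j\omega}) * \delta(\omega - \omega_k) = W(e^{j(\omega - \omega_k)}) \quad \text{(“single-sided, bandpass”, occupying } \omega_k - \omega_0 \le \omega \le \omega_k + \omega_0\text{)}$$
$$v_k[n] = X_n(e^{j\omega_k}) = e^{-j\omega_k n}\,\big[x[n] * h[n]\big]$$
$$V_k(e^{j\omega}) = \big[H(e^{j\omega})\,X(e^{j\omega})\big] * \mathrm{FT}\big(e^{-j\omega_k n}\big) = \big[H(e^{j\omega})\,X(e^{j\omega})\big] * \delta(\omega + \omega_k)$$
$$V_k(e^{j\omega}) = \big[X(e^{j\omega})\,W(e^{j(\omega - \omega_k)})\big] * \delta(\omega + \omega_k) = X(e^{j(\omega + \omega_k)})\,W(e^{j\omega}) \quad \text{(“lowpass”)}$$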

Filter Bank Summation
- thus $X_n(e^{j\omega_k})$ is obtained by bandpass filtering $x(n)$ followed by modulation with the complex exponential $e^{-j\omega_k n}$. We can express this in the form
$$y_k(n) = X_n(e^{j\omega_k})\,e^{j\omega_k n} = \sum_{m=-\infty}^{\infty} x(n - m)\,h_k(m)$$
- thus $y_k(n)$ is the output of a bandpass filter with impulse response $h_k(n)$

[Figure: Filter Bank Summation.]

Filter Bank Summation
- a practical method for reconstructing $x(n)$ from the STFT is as follows
1. assume we know $X_n(e^{j\omega_k})$ for a set of $N$ frequencies $\{\omega_k\}$, $k = 0, 1, \ldots, N-1$
2. assume we have a set of $N$ bandpass filters with impulse responses
$$h_k(n) = w_k(n)\,e^{j\omega_k n}, \quad k = 0, 1, \ldots, N-1$$
3. assume $w_k(n)$ is an ideal lowpass filter with cutoff frequency $\omega_{pk}$
  – the frequency response of the bandpass filter is
$$H_k(e^{j\omega}) = W_k(e^{j(\omega - \omega_k)})$$

Filter Bank Summation
- consider a set of $N$ bandpass filters, uniformly spaced, so that the entire frequency band is covered
$$\omega_k = \frac{2\pi k}{N}, \quad k = 0, 1, \ldots, N-1$$
- also assume the window is the same for all channels, i.e.,
$$w_k(n) = w(n), \quad k = 0, 1, \ldots, N-1$$
- if we add together all the bandpass outputs, the composite response is
$$\tilde H(e^{j\omega}) = \sum_{k=0}^{N-1} H_k(e^{j\omega}) = \sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)})$$
- if $W(e^{j\omega_k})$ is properly sampled in frequency ($N \ge L$), where $L$ is the window duration, then it can be shown that
$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)}) = w(0) \quad \forall\,\omega \qquad \text{(FBS Formula)}$$

[Figure: Filter Bank Summation, with a 1/N scaling label.]

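Filter Bank Summation
- derivation of the FBS formula
$$w(n) \ \overset{\text{FT/IFT}}{\longleftrightarrow}\ W(e^{j\omega})$$
- if $W(e^{j\omega})$ is sampled in frequency at $N$ uniformly spaced points, the inverse discrete Fourier transform of the sampled version of $W(e^{j\omega_k})$ is (recall that sampling corresponds to multiplication, which corresponds to convolution, which gives aliasing)
$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j\omega_k})\,e^{j\omega_k n} = \sum_{r=-\infty}^{\infty} w(n + rN)$$
- an aliased version of $w(n)$ is obtained.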

Filter Bank Summation
- if $w(n)$ is of duration $L$ samples, then
$$w(n) = 0, \quad n < 0,\ n \ge L$$
and no aliasing occurs due to sampling in frequency of $W(e^{j\omega})$. In this case if we evaluate the aliased formula for $n = 0$, we get
$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j\omega_k}) = w(0)$$
- the FBS formula is seen to be equivalent to the formula above, since (according to the sampling theorem) any set of $N$ uniformly spaced samples of $W(e^{j\omega})$ is adequate
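The time-aliasing relation used in this derivation can be checked for a short window. The sketch below is illustrative and assumes a causal 6-point window with arbitrary values; it samples $W(e^{j\omega})$ at $N$ frequencies, takes the inverse DFT, and compares the result with $\sum_r w(n + rN)$ for both $N \ge L$ and $N < L$. The helper names are hypothetical.

```python
import cmath
import math

def dtft(w, omega):
    """W(e^{j omega}) = sum_n w[n] e^{-j omega n} for a finite causal window."""
    return sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w))

def idft_of_sampled_spectrum(w, N):
    """(1/N) sum_k W(e^{j 2 pi k / N}) e^{j 2 pi k n / N} for n = 0..N-1."""
    samples = [dtft(w, 2.0 * math.pi * k / N) for k in range(N)]
    return [(sum(samples[k] * cmath.exp(2j * math.pi * k * n / N)
                 for k in range(N)) / N).real for n in range(N)]

def time_aliased(w, N):
    """sum_r w(n + rN) for n = 0..N-1, folding the window modulo N."""
    out = [0.0] * N
    for n, wn in enumerate(w):
        out[n % N] += wn
    return out

w = [0.2, 0.5, 0.9, 1.0, 0.6, 0.3]         # L = 6, arbitrary illustrative values
no_alias = idft_of_sampled_spectrum(w, N=8)    # N >= L
aliased = idft_of_sampled_spectrum(w, N=4)     # N < L
```

For $N = 8$ the first entry is exactly $w(0)$, as in the FBS formula; for $N = 4$ it becomes $w(0) + w(4)$, which shows how under-sampling in frequency folds the window in time.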

Filter Bank Summation
- the impulse response of the composite filter bank system is
$$\tilde h(n) = \sum_{k=0}^{N-1} h_k(n) = \sum_{k=0}^{N-1} w(n)\,e^{j\omega_k n} = N\,w(0)\,\delta(n)$$
- thus the composite output is
$$y(n) = x(n) * \tilde h(n) = N\,w(0)\,x(n)$$
- thus for the FBS method, the reconstructed signal is
$$y(n) = \sum_{k=0}^{N-1} y_k(n) = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\,e^{j\omega_k n} = N\,w(0)\,x(n)$$
if $X_n(e^{j\omega_k})$ is sampled properly in frequency, and this result is independent of the shape of $w(n)$

[Figure: Filter Bank Summation, with a 1/N scaling label.]

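Filter Bank Summation
$$y(n) = \sum_{k=0}^{N-1} y_k(n) = \sum_{k=0}^{N-1} X_n(e^{j\frac{2\pi}{N}k})\,e^{j\frac{2\pi}{N}kn}$$
$$y(n) = \sum_{k=0}^{N-1}\left[\sum_{m=-\infty}^{\infty} x(m)\,w(n - m)\,e^{-j\frac{2\pi}{N}km}\right] e^{j\frac{2\pi}{N}kn} = \sum_{m=-\infty}^{\infty} x(m)\,w(n - m)\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}k(n - m)}$$
$$y(n) = \sum_{m=-\infty}^{\infty} x(m)\,w(n - m)\,N\sum_{r=-\infty}^{\infty} \delta(n - m - rN)$$
$$y(n) = N\sum_{r=-\infty}^{\infty} w(rN)\,x(n - rN)$$
- $w(n) \ne 0$ for $0 \le n \le L-1$; if $N \ge L$ then only the $r = 0$ term is needed
$$y(n) = N\,w(0)\,x(n)$$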
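The final result, including the aliased case $N < L$, can be reproduced with a direct summation over channels. The sketch below is a slow reference computation under the stated assumptions (causal window of length $L$, signal zero outside its list); `fbs_output` is a hypothetical helper name.

```python
import cmath
import math

def stft_point(x, w, n, omega):
    """X_n(e^{j omega}) = sum_m x[m] w[n - m] e^{-j omega m} (causal, finite w)."""
    return sum(x[m] * w[n - m] * cmath.exp(-1j * omega * m)
               for m in range(len(x)) if 0 <= n - m < len(w))

def fbs_output(x, w, n, N):
    """y(n) = sum_{k=0}^{N-1} X_n(e^{j 2 pi k / N}) e^{j 2 pi k n / N}."""
    total = 0j
    for k in range(N):
        omega_k = 2.0 * math.pi * k / N
        total += stft_point(x, w, n, omega_k) * cmath.exp(1j * omega_k * n)
    return total.real

x = [1.0, -0.5, 2.0, 0.75, -1.25, 3.0, 0.4]
w = [0.08, 0.77, 0.77, 0.08]               # L = 4

# Properly sampled in frequency (N >= L): y(n) = N w(0) x(n).
y_ok = [fbs_output(x, w, n, N=4) for n in range(len(x))]

# Under-sampled in frequency (N = 2 < L): y(n) = N [w(0) x(n) + w(2) x(n - 2)].
y_aliased = [fbs_output(x, w, n, N=2) for n in range(len(x))]
```

In the under-sampled case the output contains a delayed, scaled copy of the input, which is the echo predicted by the $r \ne 0$ terms.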
