Multi-Resolution Speech Spectrogram - IJCA

International Journal of Computer Applications (0975 – 8887)
Volume 15 – No. 4, February 2011

Multi-Resolution Speech Spectrogram

Rohini R. Mergu
Lecturer
WIT, Solapur

Dr. Shantanu K. Dixit
Professor & Head
WIT, Solapur

ABSTRACT
An important aid in the analysis and display of speech is the sound spectrogram. It represents a time-frequency-intensity display of the short-time spectrum. The quality of speech can be studied by visual inspection of the spectrogram. This is one of the important applications of the spectrogram in speech processing, especially in speech enhancement. Another application of the spectrogram is in isolating voiced and unvoiced regions. To draw conclusions from visual inspection, however, the clarity of the spectrogram is also important. Before plotting the spectrogram, the time domain speech signal is converted to the frequency domain. The transform used plays a vital role in the resolution of the spectrogram. Generally the Fast Fourier Transform is used to convert the time domain signal into a frequency domain signal. This paper discusses the effect of using different transforms for converting the time domain speech signal into the frequency domain before plotting the spectrogram. It is observed that the resolution of the speech spectrogram is transform dependent.

Keywords
Spectrogram, Speech Enhancement, Speech Processing, Speech & Noise, Speech Quality, SNR, Resolution.

1. INTRODUCTION
In many practical situations, speech has to be recorded in the presence of undesirable background noise. As noise often degrades the quality/intelligibility of recorded speech, it is beneficial to carry out noise suppression. In the literature, a variety of speech enhancement methods capable of suppressing noise have been proposed. In speech enhancement, the graphical representation of speech, the spectrogram, plays a vital role in examining speech quality. The quality of speech can be observed quickly using a spectrogram. This is one of the important applications of the spectrogram in speech enhancement. Another application of the spectrogram is in isolating voiced and unvoiced regions. To draw conclusions from visual inspection, however, the clarity of the spectrogram is also important. Before plotting the spectrogram, the time domain speech signal is converted to the frequency domain. The transform used plays a vital role in the resolution of the spectrogram. Generally the Fast Fourier Transform is used to convert the time domain signal into a frequency domain signal. This paper discusses the effect of using different transforms for converting the speech signal into the frequency domain before plotting the spectrogram.

Zenton Goh, Kah-Chye Tan, and B.T.G. Tan [1] examined the spectrograms of typical clean speech, noisy speech, and enhanced speech. The horizontal axis of the spectrogram denotes time, the vertical axis denotes frequency, and the spectral magnitude is shown in gray shade (a darker shade indicates a larger value). It is observed that a large portion of the spectrogram is practically blank (i.e., unshaded) and the speech energy is concentrated in a few isolated regions. The voiced portion of speech is characterized by dark parallel "stripes", whereas the unvoiced portion is characterized by gray patches. Some parallel stripes are horizontal while some slant up or down, indicating a change in the pitch of the speech signal. When white Gaussian noise is added to the clean speech, the blank regions of the spectrogram become shaded, and some of the stripes corresponding to voiced speech disappear. With an appropriate spectral subtraction, they obtained enhanced speech and, from its spectrogram, observed a significant reduction of the unwanted short stripes. From observation of the spectrograms, [1] drew conclusions about speech quality.

S. Gannot, D. Burshtein, and Ehud Weinstein [6] presented a class of Kalman filter-based algorithms with some extensions, modifications, and improvements of previous work. The first algorithm employs the estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and the noise parameters. The enhanced speech signal is obtained as a byproduct of the parameter estimation algorithm. They used the sound spectrogram to compare the speech quality obtained with the Kalman-EM Iterative (KEMI) algorithm and the log spectral amplitude estimator (LSAE) algorithm.

R.C. Hendriks, R. Heusdens, and J. Jensen [2] used a deterministic model in combination with the well-known stochastic models for speech enhancement. They derived a minimum mean-square error (MMSE) estimator under a combined stochastic–deterministic speech model with speech presence uncertainty, showed that for different distributions of the DFT coefficients the combined model leads to improved performance, and used the speech spectrogram to classify speech components as deterministic or stochastic.

Nicholas W.D. Evans, John S. Mason, and Matt J. Roach [5] described the application of morphological filtering to speech spectrograms for noise-robust automatic speech recognition. Speech regions of the spectrogram are identified based on the proximity of high energy regions to neighboring high energy regions in the three-dimensional space.

H. Ding, I.Y. Soon, S.N. Koh, and C.K. Yeo [4] proposed a hybrid Wiener spectrogram filter (HWSF) for effective noise reduction, followed by a multi-blade post-processor which exploits the 2D features of the spectrogram to preserve the speech quality and to further reduce the residual noise. Spectrogram comparisons show that in the proposed scheme, musical noise is significantly reduced.

Cyril Plapous, Claude Marro, and Pascal Scalart [8] proposed a two-step noise reduction (TSNR) technique which solves the reverberation problem while maintaining the benefits of the decision-directed approach. However, classic short-time noise reduction techniques, including TSNR, introduce harmonic distortion in enhanced speech because of the unreliability of estimators for small signal-to-noise ratios. To overcome this problem, they proposed a method called harmonic regeneration noise reduction (HRNR). A nonlinearity is used to regenerate the degraded harmonics of the distorted signal in an efficient way. The spectrograms of clean speech and of noisy speech enhanced by the TSNR and HRNR techniques are compared [5].

2. SPECTRAL ANALYSIS OF SPEECH: SPECTROGRAM
A spectrogram is a time-varying spectral representation that shows how the spectral density of a signal varies with time. In the field of time-frequency signal processing, it is one of the most popular quadratic time-frequency distributions, representing a signal in a joint time-frequency domain. Also known as spectral waterfalls, sonograms, voiceprints, or voicegrams, spectrograms are used to identify phonetic sounds and to analyze the cries of animals; they are also used in many other fields including music, sonar/radar, speech processing, seismology, etc.
The instrument that generates a spectrogram is called a spectrograph. The most common format is a graph with two geometric dimensions: the horizontal axis represents time and the vertical axis represents frequency; a third dimension, indicating the amplitude of a particular frequency at a particular time, is represented by the intensity or colour of each point in the image.

Spectrograms are usually created in one of two ways: approximated as a filter bank that results from a series of band pass filters (this was the only way before the advent of modern digital signal processing), or calculated from the time signal using the short-time Fourier transform (STFT) [1,2,4,7]. These two methods actually form two different quadratic time-frequency distributions, but are equivalent under some conditions. Creating a spectrogram using the STFT is usually a digital process. Digitally sampled data, in the time domain, is broken up into chunks, which usually overlap, and Fourier transformed to calculate the magnitude of the frequency spectrum for each chunk. Each chunk then corresponds to a vertical line in the image: a measurement of magnitude versus frequency for a specific moment in time. The spectra or time plots are then "laid side by side" to form the image or a three-dimensional surface [5].

The spectrogram shown in Figure 1 is created from the speech waveform. The spectra computed by the Fourier transform are displayed parallel to the vertical axis (y-axis). The horizontal axis represents time. As we move right along the x-axis we shift forward in time, traversing one spectrum after another. Spectrograms are normally computed and kept in computer memory as a two-dimensional array of acoustic energy values. For a given spectrogram S, the strength of a given frequency component f at a given time t in the speech signal is represented by the darkness or colour of the corresponding point S(t, f).

Figure 1: Speech Spectrogram

The use of colour highlights the important features of a spectrogram.
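The STFT procedure described above (overlapping chunks, one magnitude spectrum per chunk, spectra laid side by side) can be sketched as follows. This is a minimal illustration and not the implementation used in this paper: the function name `stft_magnitude`, the chunk length of 64, the 50% overlap and the two-tone test signal are all assumptions chosen to keep the example small, and a direct O(N^2) DFT is used instead of an FFT so that only the Python standard library is needed.

```python
import cmath
import math


def stft_magnitude(signal, chunk_len, hop):
    """Return S[t][f]: magnitude spectrum of each overlapping chunk.

    Row t is one "vertical line" of the image; only bins 0..chunk_len/2
    are kept because the rest mirror them for real-valued input.
    """
    spectrogram = []
    for start in range(0, len(signal) - chunk_len + 1, hop):
        chunk = signal[start:start + chunk_len]
        column = []
        for f in range(chunk_len // 2 + 1):
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * f * n / chunk_len)
                        for n in range(chunk_len))
            column.append(abs(coeff))
        spectrogram.append(column)
    return spectrogram


# Hypothetical test signal: the tone jumps from bin 4 to bin 12 halfway through.
CHUNK = 64
low = [math.sin(2 * math.pi * 4 * n / CHUNK) for n in range(512)]
high = [math.sin(2 * math.pi * 12 * n / CHUNK) for n in range(512)]
S = stft_magnitude(low + high, chunk_len=CHUNK, hop=CHUNK // 2)

# Strongest frequency bin in the first and in the last chunk.
first, last = S[0], S[-1]
print(first.index(max(first)), last.index(max(last)))
```

The array `S` plays the role of S(t, f) above: plotting it with time on the horizontal axis and mapping each value to a gray level or colour gives the spectrogram image.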
In the spectrogram shown in Figure 1, shades of red indicate increasing energy along the frequency axis, blue indicates decreasing energy, and yellow and green indicate an energy maximum. Areas which are white do not have enough energy to be of interest to us.

3. COMPUTATION OF SPECTROGRAM
The use of the spectrogram in speech enhancement is discussed in this paper. The additive noise model, equation (1), expresses the observed noisy speech as the sum of the clean speech and the additive background noise.

The observed speech is then divided into overlapping frames of 256 samples each. The amount of overlap is normally either 50% or 75%. In this paper, 75% overlap is used throughout. The nth frame can be represented by a column vector of its 256 samples. All indices used in this paper start from zero. A speech block can be obtained by arranging a number of frames together to form a matrix. Suitable numbers of frames are found experimentally to be 8, 16 and 32. In this paper, the number of frames used is 16 throughout. Similarly, each block overlaps its neighboring block by 75%. The speech block can then be represented mathematically as a matrix of size 256 by 16, with one frame in each column.
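The framing and blocking steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for the results in this paper: the helper names (`add_noise`, `split_into_frames`, `group_into_blocks`), the 440 Hz test tone, the 8 kHz sampling rate and the noise level are hypothetical. Only the frame length of 256, the 75% overlap and the 16 frames per block come from the text.

```python
import math
import random

FRAME_LEN = 256                      # samples per frame (from the text)
FRAME_HOP = FRAME_LEN // 4           # 75% overlap: advance by 25% of a frame
FRAMES_PER_BLOCK = 16                # frames per block (from the text)
BLOCK_HOP = FRAMES_PER_BLOCK // 4    # 75% block overlap: advance by 4 frames


def add_noise(clean, noise):
    """Additive noise model: observed sample = clean sample + noise sample."""
    return [s + d for s, d in zip(clean, noise)]


def split_into_frames(signal, frame_len=FRAME_LEN, hop=FRAME_HOP):
    """Return overlapping frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def group_into_blocks(frames, per_block=FRAMES_PER_BLOCK, hop=BLOCK_HOP):
    """Return blocks as matrices with one frame per column."""
    blocks = []
    for start in range(0, len(frames) - per_block + 1, hop):
        chosen = frames[start:start + per_block]
        # Transpose so that row = sample index and column = frame index.
        blocks.append([list(row) for row in zip(*chosen)])
    return blocks


# Synthetic stand-in for speech: a 440 Hz tone at 8 kHz (hypothetical values).
fs = 8000
clean = [math.sin(2 * math.pi * 440 * n / fs) for n in range(4096)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.1) for _ in clean]
noisy = add_noise(clean, noise)

frames = split_into_frames(noisy)
blocks = group_into_blocks(frames)
# Number of frames, number of blocks, then rows and columns of one block.
print(len(frames), len(blocks), len(blocks[0]), len(blocks[0][0]))
```

With 4096 input samples this yields 61 frames and 12 blocks, each block being a 256 by 16 matrix. Dropping the trailing partial frame is a design choice of this sketch; zero-padding the last frame would be an equally reasonable alternative.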

This signal is windowed using a Hamming window. The transform can then be applied to the speech block.

3.1 Using DFT
The discrete Fourier transform (DFT) is a specific kind of Fourier transform used in Fourier analysis. It transforms the time domain function into a frequency domain representation. The DFT can be computed efficiently using a fast Fourier transform (FFT) algorithm. FFT algorithms are so commonly employed to compute DFTs that the term FFT is often used to mean DFT in colloquial settings.

The DFT can be defined as follows. For a length-N input vector x, the DFT is a length-N vector X with elements

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \qquad k = 0, 1, \ldots, N-1,$$

where $W_N = e^{-j 2\pi / N}$.

3.2 Using DCT
A Discrete Cosine Transform (DCT) expresses a sequence of finitely many data points in terms of a sum of cosine functions oscillating at different frequencies. It turns out that cosine functions are much more efficient, as fewer terms are needed to approximate a typical signal. In particular, a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry.

4. RESULTS & DISCUSSIONS
Here the spectrogram is plotted for different utterances of male and female speech, and for different noise conditions with different SNRs (signal-to-noise ratios). The speech utterances are obtained from the NOIZEUS database. The speech utterances used in this paper are shown in Table 1. The spectrograms plotted using the 256-point DFT and the 256-point DCT are shown in figures 2 to 6.

Table 1: Details of speech utterances

File name   Gender   Sentence text
Sp01        Male     The birch canoe slid on the smooth planks.
Sp05        Male     Wipe the grease off his dirty face.
Sp06        Male     Men strive but seldom get rich.
Sp19        Female   We talked of the sideshow in the circus.

Figure 2: Upper plot: clean speech sp01; Middle plot: spectrogram plotted using DFT; Lower plot: spectrogram plotted using DCT
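The two transforms of Sections 3.1 and 3.2 can be compared on a single Hamming-windowed frame with the sketch below. It is an illustration under stated assumptions rather than the code behind the figures: the function names, the unnormalised DCT-II form, and the 8-cycles-per-frame test tone are assumptions, and direct O(N^2) sums are used instead of fast algorithms to stay within the Python standard library.

```python
import cmath
import math


def hamming(n_points):
    """Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def dft_magnitude(x):
    """|X(k)| with X(k) = sum_n x(n) * exp(-2j*pi*k*n/N), direct form."""
    n_points = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n in range(n_points)))
            for k in range(n_points)]


def dct2_magnitude(x):
    """|C(k)| with C(k) = sum_n x(n) * cos(pi*(n + 0.5)*k/N) (unnormalised DCT-II)."""
    n_points = len(x)
    return [abs(sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_points)
                    for n in range(n_points)))
            for k in range(n_points)]


N = 256
# Test tone with 8 cycles per frame (hypothetical stand-in for a speech frame).
tone = [math.cos(2 * math.pi * 8 * (n + 0.5) / N) for n in range(N)]
frame = [w * s for w, s in zip(hamming(N), tone)]

dft_mag = dft_magnitude(frame)
dct_mag = dct2_magnitude(frame)

# Only bins 0..N/2 of the DFT are distinct for real input; all N DCT bins are.
dft_peak = max(range(N // 2 + 1), key=lambda k: dft_mag[k])
dct_peak = max(range(N), key=lambda k: dct_mag[k])
print(dft_peak, dct_peak)
```

In this sketch the same tone peaks at DFT bin 8 and at DCT bin 16. DFT bin k corresponds to k/N cycles per sample while DCT-II bin k corresponds to k/(2N) cycles per sample, so a 256-point DCT places 256 real-valued bins between zero and half the sampling rate where a 256-point DFT of real data has 129 distinct bins. This finer bin spacing is one possible reading of the resolution difference discussed in this paper.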

Figure 3: Upper plot: speech sp05 corrupted by train noise, SNR 5 dB; Middle plot: spectrogram plotted using DFT; Lower plot: spectrogram plotted using DCT

Figure 4: Upper plot: speech signal sp06 corrupted by car noise, SNR 10 dB; Middle plot: spectrogram plotted using DFT; Lower plot: spectrogram plotted using DCT

Figure 5: Upper plot: speech signal sp19 corrupted by airport noise, SNR 15 dB; Middle plot: spectrogram plotted using DFT; Lower plot: spectrogram plotted using DCT

5. CONCLUSION
From the results shown above we can conclude that the spectrograms plotted using DCT are clearer than the spectrograms plotted using the DFT with the same number of points. The spectrogram plotted using DCT has higher resolution than that plotted using DFT. From visual inspection we can see the amount of noise present in the speech signal. Thus the quality of the input signal can be inspected from the spectrogram. From visual inspection of the spectrograms plotted using DCT we can say that the noise content is greater in the signal shown in figure 3 than in those shown in figures 4 and 5. Also, the spectrogram in figure 1 shows that the signal is free of noise. The voiced and unvoiced regions are very well differentiated, and the energy at different time instants in a particular frequency bin can be observed very clearly in the spectrogram plotted using DCT, due to its higher resolution. In the spectrograms plotted using DFT, by contrast, the energy content, the amount of noise and the voiced/unvoiced regions are much more difficult to detect. Thus plotting the spectrogram using DCT provides a higher resolution plot than plotting by the usual method using DFT.

6. REFERENCES
[1] Zenton Goh, Kah-Chye Tan, and B.T.G. Tan, "Postprocessing Method for Suppressing Musical Noise Generated by Spectral Subtraction", IEEE Trans. on Speech and Audio Processing, Vol. 6, No. 3, pp. 287-292, May 1998.
[2] Richard C. Hendriks, Richard Heusdens, and Jesper Jensen, "An MMSE Estimator for Speech Enhancement Under a Combined Stochastic–Deterministic Speech Model", IEEE Trans. on Speech & Audio Processing, Vol. 15, No. 2, Feb 2007.
[3] Jesper Jensen and John H.L. Hansen, "Speech Enhancement Using a Constrained Iterative Sinusoidal Model", IEEE Trans. on Speech and Audio Processing, Vol. 9, No. 7, pp. 731-740, Oct 2001.
[4] H. Ding, I.Y. Soon, S.N. Koh, and C.K. Yeo, "A spectral filtering method based on hybrid Wiener filters for speech enhancement", Science Direct, Speech Communication 51 (2009), pp. 259-267.
[5] Nicholas W.D. Evans, John S. Mason, and Matt J. Roach, "Noise Compensation using Spectrogram Morphological Filtering", Speech and Image Research Group, Department of Electrical and Electronic Engineering, University of Wales Swansea, UK.
[6] Sharon Gannot, David Burshtein, and Ehud Weinstein, "Iterative and Sequential Kalman Filter-Based Speech Enhancement Algorithms", IEEE Trans. on Speech & Audio Processing, Vol. 6, No. 4, July 1998.
[7] I.Y. Soon and S.N. Koh, "Speech Enhancement Using 2-D Fourier Transform", IEEE Trans. on Speech and Audio Processing, Vol. 11, No. 6, pp. 717-724, Nov 2003.
[8] Cyril Plapous, Claude Marro, and Pascal Scalart, "Improved Signal-to-Noise Ratio Estimation for Speech Enhancement", IEEE Trans. on Audio, Speech & Language Processing, Vol. 14, No. 6, Nov 2006.
