Multi-Resolution Speech Spectrogram - IJCA


International Journal of Computer Applications (0975 – 8887)
Volume 15 – No. 4, February 2011

Multi-Resolution Speech Spectrogram

Rohini R. Mergu
Lecturer, WIT, Solapur

Dr. Shantanu K. Dixit
Professor & Head, WIT, Solapur

ABSTRACT
An important aid in the analysis and display of speech is the sound spectrogram. It is a time-frequency-intensity display of the short-time spectrum. The quality of speech can be studied by visual inspection of the spectrogram, which is one of its important applications in speech processing, especially in speech enhancement. Another application of the spectrogram is in isolating voiced and unvoiced regions. To draw conclusions from visual inspection, however, the clarity of the spectrogram is also important. Before the spectrogram is plotted, the time-domain speech signal is converted to the frequency domain, and the transform used plays a vital role in the resolution of the spectrogram. Generally, the Fast Fourier Transform is used to convert the time-domain signal into a frequency-domain signal. This paper discusses the effect of using different transforms to convert the time-domain speech signal into the frequency domain before plotting the spectrogram. It is observed that the resolution of the speech spectrogram is transform dependent.

Keywords
Spectrogram, Speech Enhancement, Speech Processing, Speech & Noise, Speech Quality, SNR, Resolution.

1. INTRODUCTION
In many practical situations, speech has to be recorded in the presence of undesirable background noise. As noise often degrades the quality/intelligibility of recorded speech, it is beneficial to carry out noise suppression. In the literature, a variety of speech enhancement methods capable of suppressing noise have been proposed. In speech enhancement, the spectrogram, a graphical representation of speech, plays a vital role in examining speech quality. The quality of speech can be observed quickly using a spectrogram. This is one of the important applications of the spectrogram in speech enhancement. Another application of the spectrogram is in isolating voiced and unvoiced regions. To draw conclusions from visual inspection, however, the clarity of the spectrogram is also important. Before the spectrogram is plotted, the time-domain speech signal is converted to the frequency domain, and the transform used plays a vital role in the resolution of the spectrogram. Generally, the Fast Fourier Transform is used to convert the time-domain signal into a frequency-domain signal. This paper discusses the effect of using different transforms to convert the speech signal into the frequency domain before plotting the spectrogram.

Zenton Goh, Kah-Chye Tan, and B.T.G. Tan [1] examined the spectrograms of typical clean speech, noisy speech, and enhanced speech. The horizontal axis of the spectrogram denotes time, the vertical axis denotes frequency, and the spectral magnitude is shown in gray shade (a darker shade indicates a larger value). It is observed that a large portion of the spectrogram is practically blank (i.e., unshaded) and the speech energy is concentrated in a few isolated regions. The voiced portion of speech is characterized by dark parallel "stripes", whereas the unvoiced portion is characterized by gray patches. Some parallel stripes are horizontal while some slant up or down, indicating a change in the pitch of the speech signal. When white Gaussian noise is added to the clean speech, the blank regions of the spectrogram become shaded, and some of the stripes corresponding to voiced speech disappear. With an appropriate spectral subtraction, they obtained enhanced speech whose spectrogram showed a significant reduction of the unwanted short stripes. From observation of the spectrogram, [1] drew conclusions about speech quality.

S. Gannot, D. Burshtein, and Ehud Weinstein [6] presented a class of Kalman filter-based algorithms with some extensions, modifications, and improvements of previous work. The first algorithm employs the estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and the noise parameters. The enhanced speech signal is obtained as a byproduct of the parameter estimation algorithm. They used the sound spectrogram to compare the speech quality obtained with the Kalman-EM Iterative (KEMI) algorithm and the log spectral amplitude estimator (LSAE) algorithm.

R.C. Hendriks, R. Heusdens, and J. Jensen [2] used a deterministic model in combination with the well-known stochastic models for speech enhancement. They derived a minimum mean-square error (MMSE) estimator under a combined stochastic-deterministic speech model with speech presence uncertainty, and showed that, for different distributions of the DFT coefficients, the combined stochastic-deterministic speech model leads to improved performance. They used the speech spectrogram to classify speech components as deterministic or stochastic.

Nicholas W.D. Evans, John S. Mason, and Matt J. Roach [5] described the application of morphological filtering to speech spectrograms for noise-robust automatic speech recognition. Speech regions of the spectrogram are identified based on the proximity of high-energy regions to neighboring high-energy regions in the three-dimensional space.

H. Ding, I.Y. Soon, S.N. Koh, and C.K. Yeo [4] proposed a hybrid Wiener spectrogram filter (HWSF) for effective noise reduction, followed by a multi-blade post-processor which exploits the 2D features of the spectrogram to preserve the speech quality and to further reduce the residual noise. Spectrogram comparisons show that musical noise is significantly reduced in the proposed scheme.

Cyril Plapous, Claude Marro, and Pascal Scalart [8] proposed a two-step noise reduction (TSNR) technique which solves the reverberation problem while maintaining the benefits of the decision-directed approach. However, classic short-time noise reduction techniques, including TSNR, introduce harmonic distortion in the enhanced speech because of the unreliability of estimators for small signal-to-noise ratios. To overcome this problem, they proposed a method called harmonic regeneration noise reduction (HRNR), in which a nonlinearity is used to regenerate the degraded harmonics of the distorted signal in an efficient way. Spectrograms of noisy speech enhanced by the TSNR technique and by the HRNR technique are presented, and the spectrograms of clean speech and of speech enhanced by the two techniques are compared [5].

2. SPECTRAL ANALYSIS OF SPEECH: SPECTROGRAM
A spectrogram is a time-varying spectral representation that shows how the spectral density of a signal varies with time. In the field of time-frequency signal processing, it is one of the most popular quadratic time-frequency distributions, representing a signal in a joint time-frequency domain. Also known as spectral waterfalls, sonograms, voiceprints, or voicegrams, spectrograms are used to identify phonetic sounds and to analyze the cries of animals; they are also used in many other fields, including music, sonar/radar, speech processing, and seismology.
The instrument that generates a spectrogram is called a spectrograph. The most common format is a graph with two geometric dimensions: the horizontal axis represents time and the vertical axis represents frequency; a third dimension, indicating the amplitude of a particular frequency at a particular time, is represented by the intensity or colour of each point in the image.

Spectrograms are usually created in one of two ways: approximated as a filter bank that results from a series of band-pass filters (this was the only way before the advent of modern digital signal processing), or calculated from the time signal using the short-time Fourier transform (STFT) [1,2,4,7]. These two methods actually form two different quadratic time-frequency distributions, but are equivalent under some conditions. Creating a spectrogram using the STFT is usually a digital process. Digitally sampled data in the time domain is broken up into chunks, which usually overlap, and each chunk is Fourier transformed to calculate the magnitude of its frequency spectrum. Each chunk then corresponds to a vertical line in the image: a measurement of magnitude versus frequency for a specific moment in time. The spectra or time plots are then "laid side by side" to form the image or a three-dimensional surface [5].

The spectrogram shown in Figure 1 is created from the speech waveform. The spectra computed by the Fourier transform are displayed parallel to the vertical (y) axis. The horizontal axis represents time. As we move right along the x-axis we shift forward in time, traversing one spectrum after another. Spectrograms are normally computed and kept in computer memory as a two-dimensional array of acoustic energy values. For a given spectrogram S, the strength of a given frequency component f at a given time t in the speech signal is represented by the darkness or colour of the corresponding point S(t, f).

Figure 1: Speech Spectrogram

Colour is used to highlight the important features of a spectrogram. In the spectrogram shown in Figure 1, shades of red indicate increasing energy along the frequency axis, blue indicates decreasing energy, and yellow and green indicate an energy maximum. Areas which are white do not have enough energy to be of interest to us.

3. COMPUTATION OF SPECTROGRAM
The use of the spectrogram in speech enhancement is discussed in this paper. The additive noise model expresses the observed noisy speech as the sum of the clean speech and the additive background noise (equation 1).

The observed speech is then divided into overlapping frames of 256 samples each. The amount of overlap is normally either 50% or 75%; in this paper, 75% overlap is used throughout. The nth frame can be represented by a column vector of its 256 samples. All indices used in this paper start from zero. A speech block is obtained by arranging a number of frames together to form a matrix. Suitable numbers of frames are found experimentally to be 8, 16 and 32; in this paper, 16 frames are used throughout. Similarly, each block overlaps its neighboring block by 75%. The speech block can then be represented mathematically as a matrix of size 256 by 16, with one frame per column.
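The additive noise model and the framing and blocking steps described above can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' implementation: the function names `add_noise`, `split_into_frames` and `group_into_blocks` are hypothetical, trailing samples that do not fill a whole frame are simply dropped (an assumption, since the paper does not say how the end of the signal is handled), and each block is stored as a list of 16 frames, i.e. the transpose of the 256 by 16 matrix described in the text.

```python
def add_noise(clean, noise):
    """Additive noise model: noisy sample = clean sample + noise sample."""
    return [s + d for s, d in zip(clean, noise)]


def split_into_frames(signal, frame_len=256, overlap=0.75):
    """Split a 1-D sequence into overlapping frames (each frame is a list)."""
    hop = int(frame_len * (1 - overlap))  # 64 samples for 256 samples at 75 %
    if hop < 1:
        raise ValueError("overlap too large for this frame length")
    frames = []
    start = 0
    # Assumption: an incomplete final frame is discarded rather than padded.
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += hop
    return frames


def group_into_blocks(frames, frames_per_block=16, overlap=0.75):
    """Group consecutive frames into overlapping blocks.

    Each block is a list of `frames_per_block` frames, so block[i] is the
    i-th column of the 256 x 16 block matrix described in the text.
    """
    hop = int(frames_per_block * (1 - overlap))  # 4 frames for 16 at 75 %
    if hop < 1:
        raise ValueError("overlap too large for this block size")
    blocks = []
    start = 0
    while start + frames_per_block <= len(frames):
        blocks.append(frames[start:start + frames_per_block])
        start += hop
    return blocks
```

With these defaults, a 2048-sample signal yields 29 frames (a hop of 64 samples) and 4 blocks (a hop of 4 frames).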

Each speech block is windowed using a Hamming window. The transform can then be applied to the speech block.

3.1 Using DFT
The discrete Fourier transform (DFT) is a specific kind of Fourier transform used in Fourier analysis. It transforms a time-domain function into its frequency-domain representation. The DFT can be computed efficiently using a fast Fourier transform (FFT) algorithm; FFT algorithms are so commonly employed to compute DFTs that the term FFT is often used to mean DFT in colloquial settings.

For a length-N input vector x, the DFT is a length-N vector X defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}$$

where k = 0, 1, ..., N-1.

3.2 Using DCT
A discrete cosine transform (DCT) expresses a sequence of finitely many data points as a sum of cosine functions oscillating at different frequencies. Cosine functions turn out to be much more efficient, as fewer terms are needed to approximate a typical signal. In particular, a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry.

4. RESULTS & DISCUSSIONS
Spectrograms are plotted for different utterances of male and female human speech, and for different noise conditions with different SNRs (signal-to-noise ratios). The speech utterances are obtained from the NOIZEUS database. The utterances used in this paper are listed in Table 1. The spectrograms plotted using the 256-point DFT and the 256-point DCT are shown in Figures 2 to 5.

Table 1: Details of speech utterances

File Name   Gender   Sentence Text
Sp01        Male     The birch canoe slid on the smooth planks.
Sp05        Male     Wipe the grease off his dirty face.
Sp06        Male     Men strive but seldom get rich.
Sp19        Female   We talked of the sideshow in the circus.

Figure 2: Upper plot – clean speech sp01; middle plot – spectrogram plotted using DFT; lower plot – spectrogram plotted using DCT
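As a rough illustration of Sections 3.1 and 3.2, the sketch below computes one spectrogram column per frame with either transform. It is not the authors' code: the function names are hypothetical, the DFT is evaluated directly from its definition rather than with an FFT, and the DCT is assumed to be the unnormalised DCT-II, since the paper does not say which DCT variant or scaling was used.

```python
import cmath
import math


def hamming(n_points):
    """Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def dft_magnitude(frame):
    """|X(k)| from the DFT definition (O(N^2); an FFT gives the same values)."""
    n_points = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * n / n_points)
                    for n, x in enumerate(frame)))
            for k in range(n_points)]


def dct_magnitude(frame):
    """|C(k)| for the unnormalised DCT-II (assumed variant):
    C(k) = sum_n x(n) * cos(pi * (2n + 1) * k / (2N))."""
    n_points = len(frame)
    return [abs(sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
                    for n, x in enumerate(frame)))
            for k in range(n_points)]


def spectrogram_columns(frames, magnitude_fn):
    """Window every frame, then transform it; one output column per frame."""
    window = hamming(len(frames[0]))
    return [magnitude_fn([w * x for w, x in zip(window, frame)])
            for frame in frames]
```

For a real frame of length N, only DFT bins 0 to N/2 are distinct and they are spaced 1/N cycles per sample apart, whereas the N DCT-II bins are spaced 1/(2N) cycles per sample apart. A tone at 1/8 cycle per sample in a 64-sample frame therefore peaks at DFT bin 8 and at or near DCT bin 16.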

Figure 3: Upper plot – speech sp05 corrupted by train noise, SNR 5 dB; middle plot – spectrogram plotted using DFT; lower plot – spectrogram plotted using DCT

Figure 4: Upper plot – speech signal sp06 corrupted by car noise, SNR 10 dB; middle plot – spectrogram plotted using DFT; lower plot – spectrogram plotted using DCT

Figure 5: Upper plot – speech signal sp19 corrupted by airport noise, SNR 15 dB; middle plot – spectrogram plotted using DFT; lower plot – spectrogram plotted using DCT

5. CONCLUSION
From the results shown above, we can conclude that the spectrograms plotted using the DCT are clearer than the spectrograms plotted using the DFT with the same number of points. The spectrogram plotted using the DCT has higher resolution than that plotted using the DFT. From visual inspection we can see the amount of noise present in the speech signal, so the quality of the input signal can be inspected from the spectrogram. From visual inspection of the spectrograms plotted using the DCT, we can say that the noise content is higher in the signal shown in Figure 3 than in those shown in Figures 4 and 5. The spectrogram in Figure 1 also shows that the signal is free of noise. The voiced and unvoiced regions are very well differentiated, and the energy at different time instants in a particular frequency bin can be observed very clearly in the spectrogram plotted using the DCT, owing to its higher resolution. In the spectrograms plotted using the DFT, by contrast, the energy content, the amount of noise, and the voiced/unvoiced regions are much more difficult to detect. Thus plotting the spectrogram using the DCT provides a higher-resolution plot than the usual method of plotting using the DFT.

6. REFERENCES
[1] Zenton Goh, Kah-Chye Tan, and B.T.G. Tan, "Postprocessing Method for Suppressing Musical Noise Generated by Spectral Subtraction", IEEE Trans. on Speech and Audio Processing, Vol. 6, No. 3, pp. 287-292, May 1998.
[2] Richard C. Hendriks, Richard Heusdens, and Jesper Jensen, "An MMSE Estimator for Speech Enhancement Under a Combined Stochastic-Deterministic Speech Model", IEEE Trans. on Speech & Audio Processing, Vol. 15, No. 2, Feb 2007.
[3] Jesper Jensen and John H.L. Hansen, "Speech Enhancement Using a Constrained Iterative Sinusoidal Model", IEEE Trans. on Speech and Audio Processing, Vol. 9, No. 7, pp. 731-740, Oct 2001.
[4] H. Ding, I.Y. Soon, S.N. Koh, and C.K. Yeo, "A spectral filtering method based on hybrid Wiener filters for speech enhancement", Science Direct, Speech Communication 51 (2009), pp. 259-267.
[5] Nicholas W.D. Evans, John S. Mason, and Matt J. Roach, "Noise Compensation using Spectrogram Morphological Filtering", Speech and Image Research Group, Department of Electrical and Electronic Engineering, University of Wales Swansea, UK.
[6] Sharon Gannot, David Burshtein, and Ehud Weinstein, "Iterative and Sequential Kalman Filter-Based Speech Enhancement Algorithms", IEEE Trans. on Speech & Audio Processing, Vol. 6, No. 4, July 1998.
[7] I.Y. Soon and S.N. Koh, "Speech Enhancement Using 2-D Fourier Transform", IEEE Trans. on Speech and Audio Processing, Vol. 11, No. 6, pp. 717-724, Nov 2003.
[8] Cyril Plapous, Claude Marro, and Pascal Scalart, "Improved Signal-to-Noise Ratio Estimation for Speech Enhancement", IEEE Trans. on Audio, Speech & Language Processing, Vol. 14, No. 6, Nov 2006.
