Subvocal Speech Recognition System Based On EMG Signals


International Journal of Computer Applications (0975 – 8887)
International Conference on Computer Technology (ICCT 2015)

Subvocal Speech Recognition System based on EMG Signals

Yukti Bandi
Assistant Professor, DJ Sanghvi College of Engineering, Mumbai

Riddhi Sangani, Aayush Shah, Amit Pandey, Arun Varia
UG Students, DJ Sanghvi College of Engineering, Mumbai

ABSTRACT
This paper presents results of electromyography (EMG) speech recognition, which captures the electric potentials generated by the human articulatory muscles. EMG speech recognition holds promise for mitigating the effects of high acoustic noise on speech intelligibility in communication systems. A few words were collected via EMG from a male subject speaking normally and sub-vocally. The collected signals are then filtered and transformed into features using wavelet packet and statistical windowing techniques. Finally, a neural network trained by back-propagation is used to classify the data. Using the windowed signals and the trained neural network, an Arduino-operated robot was controlled as an application to demonstrate the future scope of this work. The success rate was 73%.

Keywords
EMG, sub-vocal speech, neural network, electromyography

1. INTRODUCTION
Human speech communication typically takes place against complex acoustic backgrounds with environmental sound sources, competing voices, and ambient noise. The presence of such noise makes it difficult for human speech to remain robust. In many applications, minimum noise with optimum output is essential, and for better communication over a range, transmitters and receivers with high noise-filtering capabilities have been developed [1][4]. Researchers are working extensively on improving bioelectric signalling, the goal being to completely eliminate the noise generated during transmission and reception of signals and to further the advancement of human-computer interaction.

Sub-vocal speech is silent, or sub-auditory, speech, such as when a person silently reads or talks to himself [13]. The electromyograms of different speakers differ from one another, so the signals produced vary for every individual; recognition accuracy can be improved by enhancing the system's pattern-recognition characteristics. Sub-vocal signals are gathered non-invasively by attaching a pair of electrodes to the throat, and, without the subject opening the mouth or uttering a word, words are recognized by a computer. The signals obtained from electromyography are used as the bioelectric signal. EMG evaluates and records the electrical activity produced by muscles that are electrically or neurologically activated; the signals can be analysed to detect medical abnormalities, activation level, or recruitment order, or to analyze the biomechanics of human movement [13]. Surface EMG (sEMG) assesses muscle function by recording muscle activity from the surface of the skin above the muscle. For that purpose, more than one electrode is needed, because EMG recordings display the potential difference (voltage difference) between two separate electrodes.

Once the signal is acquired from the sEMG, feature extraction is carried out on time-frequency representations using the Discrete Wavelet Transform. This paper illustrates the techniques and steps employed in the acquisition, analysis, processing, and classification of sub-vocal speech.

2. METHODOLOGY
2.1 Data Acquisition
The EMG signals of sub-vocal speech were acquired using silver chloride (AgCl) surface electrodes in bipolar configuration [3]. The electrodes were placed under the throat, on the upper part of the right side and the lower part of the left side (see Fig 1). Using 70% isopropyl alcohol, a reduction in skin impedance was achieved.
To compensate for the low amplitude of sub-vocal signals, the EMG signals were amplified by a factor of 1000.

2.1.1 Noise Reduction
In the acquisition system, to reduce noise and attenuate frequencies that are not part of the EMG signal, a fourth-order band-pass Chebyshev filter is used, formed by a 25 Hz high-pass filter and a 450 Hz low-pass filter. A preamplifier with a gain of 100 has been implemented, with a post-amplifier alongside the filter to obtain a total gain of about 1000.
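The conditioning chain above can be sketched as follows. This is a minimal Python/SciPy equivalent of the fourth-order 25–450 Hz Chebyshev band-pass plus the ~1000x gain stage; the paper's own processing was done in hardware and MATLAB 7.11, and the 8 kHz sampling rate and 1 dB passband ripple used here are assumptions, not values from the paper.

```python
import numpy as np
from scipy import signal

FS = 8000  # assumed sampling rate in Hz (not stated in the paper)

# Fourth-order Chebyshev type-I band-pass: 25 Hz high-pass edge,
# 450 Hz low-pass edge, 1 dB passband ripple (ripple is an assumption).
sos = signal.cheby1(4, 1, [25, 450], btype="bandpass", fs=FS, output="sos")

def condition_emg(x):
    """Band-pass filter a raw EMG record and apply the ~1000x gain stage."""
    filtered = signal.sosfiltfilt(sos, x)  # zero-phase filtering
    return 1000.0 * filtered               # pre-amp (100x) + post-amp stage

# Quick check: a 50 Hz component (inside the band) survives, while a
# 5 Hz drift (below the 25 Hz high-pass edge) is strongly attenuated.
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 5 * t)
y = condition_emg(x)
```

A software sketch like this is useful for replaying recorded sessions; in the actual system the filter and both amplifier stages are analog hardware ahead of the digitizer.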

Fig 1: Locations of electrodes on the subject

2.1.2 Transfer of Data to PC
After the filtering process is completed, the filtered signal must be transferred to the computer for further analysis. Several methods of transferring the data to the PC were tried:

- Using the A/D converter of the PSoC 4 at 8-bit resolution, in differential mode, with a sampling rate of 166666 sps; when tested with the filtered EMG signal, it did not give satisfactory results.
- Serially acquiring the signal through the DSO into MATLAB 7.11; on analysis, this also did not give satisfactory results.
- The method that finally gave satisfactory results used a mono jack (male on one side, stripped on the other to expose the signal), connected to the microphone jack of the PC. The PC's sound recorder was used to record sessions of a few seconds. The recorded signal, in .wma format, was converted to a .wav file for use with MATLAB 7.11.

A timer was set for 10, 15, or 20 seconds for the different sessions. Figures 2, 3, and 4 show recorded signals read and plotted in MATLAB 7.11; in these figures the X axis represents the number of samples and the Y axis represents amplitude.

Fig 3: Recorded signal for 20 seconds, when the subject said "forward" at the 5th and 15th second

Fig 4: Recorded signal for 20 seconds, when the subject said "forward" at the 5th second and "reverse" at the 15th second

2.2 Signal Conditioning
As soon as the sEMG signal is acquired, conditioning or processing of the signal is required before moving further. Here, processing refers to activity detection on the recorded EMG signal: activity detection is used to isolate the spoken word from the continuous EMG stream [7].
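The load step of the sound-card path above can be sketched in Python with scipy.io.wavfile, standing in for the authors' MATLAB workflow. The 8 kHz rate, the file name, and the synthetic stand-in "session" are placeholders so the snippet is self-contained and runnable.

```python
import os
import tempfile
import numpy as np
from scipy.io import wavfile

# The paper records sessions with the PC's sound recorder and converts them
# to .wav before analysis. Here we synthesize a stand-in 20-second recording
# (rate and file name are assumptions) and read it back the same way.
RATE = 8000
t = np.arange(20 * RATE) / RATE
session = (3000 * np.sin(2 * np.pi * 60 * t)).astype(np.int16)

path = os.path.join(tempfile.gettempdir(), "emg_session.wav")
wavfile.write(path, RATE, session)

rate, samples = wavfile.read(path)     # -> sample rate, int16 sample array
duration = len(samples) / rate         # seconds per recorded session
# X axis: sample index; Y axis: amplitude, as plotted in Figures 2-4.
```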
There are various ways of doing this, such as:

- Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold [10].
- Voice activity detection using higher-order statistics [11].
- Statistical voice activity detection using a multiple-observation likelihood ratio test [12].
- Detection of sudden changes in energy using energy-sensitive windows.

Research carried out by NASA reveals that they used the last of these methods in their basic model; the other methods were described as highly sophisticated and reserved for future work.

Fig 2: Recorded signal for 10 seconds, when the subject was completely silent
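The last method in the list (the one NASA used in its basic model) can be sketched as follows, following the recipe detailed later in this section: 100 ms segment energies, a 10-segment sliding average, and a one-second active window centred on the energy peak. The 8 kHz sampling rate and the synthetic test signal are assumptions for illustration.

```python
import numpy as np

FS = 8000          # assumed sampling rate
SEG = FS // 10     # 100 ms segments, as described in Section 2.2

def detect_active_window(x):
    """Locate the 1-second active region via a sudden rise in energy.

    Segment energies are root-sum-of-squares over 100 ms blocks, smoothed
    with a 10-segment sliding average; the active window is centred on the
    maximum of the averaged energy and spread in both directions.
    """
    n_seg = len(x) // SEG
    seg_energy = np.array([
        np.sqrt(np.sum(x[i * SEG:(i + 1) * SEG] ** 2)) for i in range(n_seg)
    ])
    smooth = np.convolve(seg_energy, np.ones(10) / 10, mode="same")
    center = int(np.argmax(smooth))           # segment with peak energy
    start = max(0, center - 5) * SEG          # spread ~0.5 s both ways
    return x[start:start + FS]                # one-second active window

# Silent background noise with a burst (the "word") around t = 5 s:
rng = np.random.default_rng(0)
t = np.arange(10 * FS) / FS
x = 0.01 * rng.standard_normal(len(t))
x[5 * FS:int(5.3 * FS)] += np.sin(2 * np.pi * 120 * t[:int(0.3 * FS)])
active = detect_active_window(x)
```

Because background noise stays low and roughly constant in amplitude, a simple energy peak like this is enough to isolate the word from the continuous stream.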

The basic idea behind the window is that it is defined so as to detect a sudden change in energy within its domain. A sudden change in energy indicates the presence of a word, since noise is of lower amplitude and of roughly constant amplitude. As soon as a word is uttered, the energy for that duration rises, as shown in Fig 4. The job of the window is to detect such energy changes, so that noise can be eliminated and only the word extracted (see Fig 5).

A small window size was selected by extracting energy and locating the maximum energy to find the active region; the window should be small, but not so small that the active region is divided in two. The signal energy for windowing is defined as [6]:

    E_n = Σ_{i = n·W}^{(n+1)·W − 1} x²(i)                        (1)

where W is the window length in samples. Energy extraction was performed as the root sum of squares of samples in segments equivalent to 100 ms, which were then averaged using a sliding window taking 10 energy samples at a time. This gave an active window of one second, detected by taking the maximum value of the averaged signal and then spreading the window in the forward and backward directions. The energy of the signal is given by [6]:

    E = (1/N) Σ_{i=0}^{N−1} x²(i)                                (2)

Fig 5: Active zone of the sub-vocal signal for the word "Forward"

After locating the active region, hard thresholding was used to remove low-energy signal content and hence reduce the effect of noise. The threshold was found using [6]:

    U = 0.15 · E_max                                             (3)

where U is the threshold value and E_max is the peak energy. After locating the active area, the windowed signal is filtered using the discrete wavelet transform (DWT), whose basis functions are given by [6]:

    ψ_{j,k}[n] = 2^{j/2} · ψ(2^j·n − k),   j, k ∈ Z              (4)

where j is the scaling factor, k is the translation parameter, and ψ is the wavelet function. In this process the signal is divided into four parts using the Daubechies mother wavelet [3]. The threshold for the filter can be defined as [6]:

    U_f = √(2 · log n)                                           (5)

where n is the number of samples and U_f is the threshold value. Once the threshold is calculated, hard thresholding is applied to the windowed signal. The hard thresholding method is defined as [6]:

    f(x) = { x,  |x| ≥ U_f
           { 0,  |x| < U_f                                       (6)

Thus, after filtering, the signal is reconstructed using the inverse discrete wavelet transform, given by [6]:

    x[n] = Σ_{j∈Z} Σ_{k∈Z} C_{j,k} · ψ_{j,k}[n]                  (7)

where x[n] is the reconstructed filtered signal, C_{j,k} are the thresholded coefficients, and ψ is the wavelet basis.

2.3 Feature Extraction
Feature extraction is the process of reducing the size of the data to facilitate the classification process [7]. This can be carried out in two ways.

In the first method, feature extraction is carried out using the discrete wavelet packet transform (DWPT) [5][6]. The DWPT decomposes both the averaging coefficients and the detail coefficients, forming a tree structure [1].

Fig 6: Selecting the best DWPT basis [6]

To select the optimum basis functions, i.e. the coefficients that best represent the sub-vocal signal, a cost function that measures the level of information is used. In this case, Shannon's entropy was used as the cost function [3][6]:

    H = − Σ_k q(k) · log q(k)                                    (8)

where H is the Shannon entropy and q(k) is the normalized energy of the wavelet coefficients. Once the cost function is computed, the optimum basis is chosen by means of the parameters suggested in [6]. Using these DWPT results, patterns were extracted by means of statistical methods such as the root mean square (RMS).

The second method uses the discrete wavelet transform, which approximates the values of the signal and reduces its size: each application of the DWT halves the signal length, so repeated application reduces the size to an adequate level. Using statistical methods on the DWT output, principal component analysis (PCA) was applied to reduce the size of the data; it is found that most of the information is contained in the first few coefficients, hence only these coefficients are used for classification [3].
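The second method's size reduction can be sketched with a hand-rolled Haar ('db1') step, since each DWT level halves the signal, together with the hard-thresholding rule of Eq. (6). The one-second 8 kHz window here is an assumption; the paper's own window length yielded 361 samples after seven levels, a different count than this sketch produces.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar ('db1') DWT: averages give the approximation
    half, differences give the detail half, each of half the input length."""
    x = x[: len(x) // 2 * 2]                   # truncate to even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def hard_threshold(c, u):
    """Hard thresholding as in Eq. (6): keep x where |x| >= u, else 0."""
    return np.where(np.abs(c) >= u, c, 0.0)

def reduce_features(x, levels=7):
    """Apply the DWT repeatedly, keeping the approximation half each time,
    so the feature vector shrinks by a factor of two per level."""
    for _ in range(levels):
        x, _ = haar_step(x)
    return x

rng = np.random.default_rng(1)
window = rng.standard_normal(8000)     # a one-second active window at 8 kHz
features = reduce_features(window)     # 8000 samples shrink 7 times
```

Dedicated libraries (e.g. PyWavelets) provide the same decomposition for arbitrary wavelets; the hand-rolled step is shown only to make the halving explicit.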

2.4 Classification
Classification of the data is done using a neural network: a multilayer perceptron with supervised back-propagation. The network was first trained using test data, with the number of input nodes equal to the number of selected coefficients, one hidden layer, and one output node. Using this trained network, the signals were tested and the efficiency for each word was calculated.

3. EXPERIMENTAL RESULTS
In this section, results of the recognition system, the acquired digitized outputs, and the feature-extracted signals are shown.

3.1 Classification Effectiveness
Classification of the signals was done by a feed-forward back-propagation network having 4 hidden layers, 361 input neurons, and 1 output neuron. The training set consisted of 100 samples, with 50 samples of each word. The database was constructed by capturing signals from a 22-year-old male subject in various recording sessions, under conventional noise conditions.

Table 1 shows the results of the classification phase. The effectiveness of the algorithm varies depending on the sub-vocal signal being classified, with an average accuracy of about 73%.

Table 1: Effectiveness of the classification process

    WORD        EFFICIENCY (%)
    Forward     74.4
    Reverse     72.5

3.2 Data Acquisition
The signal was acquired using the designed circuit and then digitized using the sound card of the computer. The signals were recorded over spans of 10 seconds and observed in MATLAB. Samples were taken both for vocal speech and for silent lipsing (see Figures 7 and 8).

Fig 7: Sampled vocal "forward" signal

Fig 8: Sampled lipsing "reverse" signal

3.3 Signal Conditioning
The recorded signal was then windowed to find the active region. Windowing was done using energy extraction, and it was observed that signal amplitude above 0.6 fell in the active region. A window of 1 second was selected as the active region.

Fig 9: Windowed signal for the "forward" signal

3.4 Feature Extraction
The windowed signal was passed through the Discrete Wavelet Transform using the 'db1' Haar wavelet to remove excess noise and reduce the number of samples. Fig 10 shows the result after the 7th transform. The number of reduced samples was 361 per signal.

Fig 10: Signal after the 7th transform

3.5 Robot Control
Using these signals and the trained neural network, an Arduino-operated robot was controlled as an application to demonstrate the future scope of this topic. The signals were also converted back to speech using prerecorded commands and the word classification from the neural network, which shows the future scope of the topic in the biomedical field.

4. CONCLUSION
The subvocal speech recognition system uses EMG technology to sense the vocal speech signal and control a device in real time. Precise control and accuracy can be achieved through accurate design of the pre-amplifier, post-amplifier, and band-pass filter, and through proper training of the neural network.

5. FUTURE SCOPE
The sub-vocal speech recognition system is still under research at various research institutes. It has so far been developed for small words (or sets of words), not for continuous communication; it can be extended to continuous communication by training the neural network precisely. The sub-vocal signals could be transmitted wirelessly to a real-time recognition system for confidential military communication in noisy acoustic environments. The feature-extraction (and classification) algorithm will be improved to obtain higher accuracy and more precise control.

6. REFERENCES
[1] C. Jorgensen, D. D. Lee, and S. Agabon, "Sub Auditory Speech Recognition Based on EMG Signals," Proceedings of the International Joint Conference on Neural Networks (IJCNN), IEEE, vol. 4, 2003, pp. 3128–3133.
[2] C. Jorgensen and K. Binsted, "Web Browser Control Using EMG Based Sub vocal Speech Recognition," Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS), IEEE, 2005, pp. 294c.1–294c.8.
[5] L. E. Mendoza, J. Peña Rodríguez, and J. L. Ramón Valencia, "Electro-myographic patterns of sub-vocal speech: Records and classification," GIBUP Research Group, University of Pamplona, Colombia, November 29, 2013.
[6] R. Madan, S. K. Singh, and N. Jain, "Signal Filtering Using Discrete Wavelet Transform," International Journal of Recent Trends in Engineering, vol. 2, no. 3, November 2009.
[7] M. C. Goñi and A. P. de la Hoz, "Analysis of Biomedical Signals Using Wavelet Transform," EST Student Works Contest, National University of San Martín, Argentina, 2005.
[8] D. M. Ballesteros Larrotta, "Application of the Discrete Wavelet Transform in Filtering Bioelectric Signals," Umbral Científico, Manuela Beltrán University Foundation, Bogotá, Colombia, pp. 92–98, Dec. 2004.
[9] B. J. Betts and C. Jorgensen, "Small Vocabulary Recognition Using Surface Electromyography in an Acoustically Harsh Environment," NASA Ames Research Center, Moffett Field, California 94035-1000, November 2005.
[10] F. A. Sepúlveda, "Extraction of Speech Signal Parameters Using Time-Frequency Analysis Techniques," National University of Colombia, Manizales, Colombia, 2004.
[11] M. Z. Jamal, "Signal Acquisition Using Surface EMG and Circuit Design Considerations for Robotic Prosthesis," InTech, 2012.
[12] A. Davis, S. Nordholm, and R. Togneri, "Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold," IEEE Transactions on Speech and Audio Processing, to appear, pp. 1–13.
[13] K. Li, M. N. S. Swamy, and M. O. Ahmad, "An improved voice activity detection using higher order statistics," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, 2005, pp. 965–974.
[14] J. Ramírez et al., "Statistical voice activity detection using a multiple observation likelihood ratio test," IEEE Signal Processing Letters, vol. 12, no. 10, 2005, pp. 689–692.

IJCATM : www.ijcaonline.org
