
CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

A HARDWARE IMPLEMENTATION OF AN ARTIFICIAL NEURAL NETWORK

A graduate project submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering

By Justin Thomas Wodarck

December 2009

The graduate project of Justin Thomas Wodarck is approved:

Nagwa Bekir, Ph.D.                          Date

Xiyi Hang, Ph.D.                            Date

Deborah van Alphen, Ph.D., Chair            Date

California State University, Northridge

DEDICATION

I would like to dedicate this report to my mom. I have only been able to accomplish what I have thanks to her inspiration and sacrifice. She always said, "people don't care how much you know, until they know how much you care." She lived that out each and every day of her life.

Roxanne Wodarck (1954 – 2007)

ACKNOWLEDGEMENT

I would like to thank the LORD Jesus, my King and Savior, to whom I owe every neuron in my brain and every ounce of my being. He has motivated me to always strive for excellence and use my talents for love. One of the verses that has helped me during my studies at CSUN:

"The fear of the LORD is the beginning of wisdom;
all who follow His precepts have good understanding.
To Him belongs eternal praise." –Psalm 111:10 (NIV)

I would like to thank my Dad, whose constant tinkering, obsessive organization, and ingenuity gave me the mind and attitude of a true engineer.

Special thanks to Dr. Deborah van Alphen, for mentoring me during this project.

And most importantly, I would like to thank my beautiful wife Jen, the best wife a guy could ask for and the wonderful mother to our children: Gavin and Makayla. You guys make me excited to come home at the end of every workday. You make life fun!

TABLE OF CONTENTS

Signature Page
Dedication
Acknowledgement
Table of Contents
List of Figures
List of Tables and Equations
Abstract
Introduction
Neural Network Background
    Single Element
    Neural Network Architectures for Speech Recognition
    Multilayer Perceptron (MLP)
    Time-Delay Neural Network (TDNN)
    Recurrent Neural Network (RNN)
    Learning Rules
Speech Recognition Background
Circuit Construction
Operation of the Circuit
Architecture Testing & Determination
Performance Testing
Conclusion
References
Appendix A
Appendix B

LIST OF FIGURES

Figure 1: Single-Input Neuron
Figure 2: Three Layer Neural Network
Figure 3: Time-Delay Neural Network Architecture
Figure 4: Example of a TDNN for Recognizing /b/, /d/, and /g/
Figure 5: Recurrent Neural Network
Figure 6: Spectrogram of the Spoken Word "MATLAB"
Figure 7: Plots for the Spoken Word "Neural Network"
Figure 8: Circuit Schematic
Figure 9: Picture of Completed Circuit with Microphone Headset
Figure 10: Close-up of the Keypad
Figure 11: Close-up of the 7-Segment Displays
Figure 12: HM2007 Pin Layout
Figure 13: Frequency Domain for X1 and X2
Figure 14: Spectrogram for Signal X1
Figure 15: Spectrogram for Signal X2
Figure 16: Spectrogram of y3
Figure 17: Spectrogram of y4
Figure 18: TDNN Voice Print of "Neural Network"
Figure 19: Preliminary Recognition Accuracy
Figure 20: Spectrogram of "Two"
Figure 21: Spectrogram of "B2 Spirit"
Figure 22: Recognition Accuracy of Circuit
Figure 23: Percentage of Error Codes
Figure 24: Percentage of Misclassifications
Figure 25: Recognition Accuracy for Varying Word Sets
Figure 26: Recognition Accuracy of Set #1 Words

LIST OF TABLES AND EQUATIONS

Table 1: Common Transfer Functions
Table 2: Examples of Homophones
Table 3: Circuit Parts List
Table 4: Error Codes
Table 5: New Class Phrases
Table 6: Word Lists

Equation 1: Single Element Equation
Equation 2: N-input 3-layer Neural Network Equation
Equation 3: N-point STFT Calculation

ABSTRACT

A HARDWARE IMPLEMENTATION OF AN ARTIFICIAL NEURAL NETWORK

By Justin Thomas Wodarck
Master of Science in Electrical Engineering

This graduate project explores speech recognition utilizing an artificial neural network circuit. A stand-alone hardware implementation of a neural network of unknown architecture was constructed around the HM2007 Integrated Circuit (IC) manufactured by Hualon Microelectronics Corporation. A series of tests was conducted utilizing custom coding in MATLAB to reverse-engineer the architecture of the IC and measure its parameters. The recognition accuracy of the completed circuit was tested while changing variables such as the total number of classes and word choice. A comparison of performance between multisyllabic words and homophones was also conducted.

Introduction

Many people believe that neural network research and applications died with the published work of Minsky and Papert in 1969 [1]. Their research showed that, despite all the initial hype surrounding neural networks, this new mathematical model could not even solve the basic exclusive-or (XOR) logic gate. What fewer people know is that, with the addition of multiple layers and more complex architectures, these limitations could not only be overcome, but neural networks could flourish in a variety of applications. Recently, neural networks have found success in a diverse range of uses over numerous fields including stock market analysis, high-performance aircraft autopilots, weapons target tracking, telecommunication image and data compression, and speech recognition, to name a few [2]. The goal of this graduate project was to explore speech recognition with neural networks. The objectives were to: construct a stand-alone hardware implementation of an artificial neural network around the HM2007 Integrated Circuit (IC), determine the architecture and learning style used by this IC via experimentation, and test the completed circuit to characterize performance.

Neural Network Background

Neural networks are based on the classification ability and learning processes of the human brain. By starting with simple elements and highly interconnecting them, neural networks are able to perform extremely complex pattern classification and function approximation. This section describes the starting point for understanding neural networks and the architectures investigated in this project. A more exhaustive background can be found in Neural Network Design [3] and Handbook of Neural Networks for Speech Processing [4].

Single Element

The simplest element of a neural network is the single-input neuron. This is the basic building block for neural network design and is shown in Figure 1. The single-input neuron has five scalar values (p, w, b, n, a) and two functions (Σ, f).

Figure 1: Single-Input Neuron [3]

The input value, p, is weighted by the value, w. The product of these two values is then summed together (Σ) with a bias value, b. The result of the summation is labeled n. The value n then goes through a transfer function (f). This transfer function can be any linear or non-linear operation that meets the needs of the system; however, a few commonly used functions are shown in Table 1 below.

The result from the transfer function, a, then becomes the single output of the neuron. The resulting equation for a single element is

a = f(wp + b)     (Equation 1)

Table 1: Common Transfer Functions [3]
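To make Equation 1 concrete, the following minimal MATLAB sketch evaluates a single-input neuron using the log-sigmoid transfer function, one of the common choices listed in Table 1 (the input, weight, and bias values here are arbitrary illustrations, not parameters taken from this project):

    % Single-input neuron of Figure 1: a = f(w*p + b)
    p = 2.0;                           % scalar input
    w = 0.5;                           % scalar weight
    b = -1.0;                          % scalar bias
    n = w*p + b;                       % net input: weighted input plus bias
    logsig = @(n) 1 ./ (1 + exp(-n));  % log-sigmoid transfer function
    a = logsig(n)                      % neuron output: logsig(0) = 0.5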

Neural Network Architectures for Speech Recognition

One goal of this project was to determine the neural network architecture of the HM2007 IC used in the circuit construction. Nothing was known a priori about the architecture, except that it utilized neural networks to perform speech recognition. The IC was compared against the three most commonly used neural network architectures for speech recognition: the Multilayer Perceptron (MLP), the Time-Delay Neural Network (TDNN), and the Recurrent Neural Network (RNN) [4]. Each of these architectures is described in detail below.

Multilayer Perceptron (MLP)

The Multilayer Perceptron is currently the most widely used neural network [3]. By taking the theory for a single-input neuron and extrapolating it to multiple inputs and multiple layers, the basic single-element equation can be expanded from individual scalars to matrix notation. Figure 2 shows the general form of a multiple-input, three-layer neural network.

Figure 2: Three Layer Neural Network [3]

When expanding to R individual inputs, the input, p, becomes an R×1 column vector designated as p (all matrices and vectors will be denoted using bold typeface). This input vector, p, connects to s^1 first-layer neurons. The weight matrix and bias vector for the first layer are denoted by W^1 (of size s^1 × R) and b^1 (of size s^1 × 1), respectively. In general, each bias vector is an s × 1 column vector, and each weight matrix has columns equal to the number of inputs into that neuron layer and rows equal to the number of neurons in that layer.

These matrices are labeled with superscripts to denote the layer (e.g., W^3 indicates the weight matrix for the 3rd layer, which should not be confused with raising the weight matrix to the third power). Functions are also labeled using superscripts in the same fashion. In a single layer, different functions can be used for each neuron, so f^1 becomes a column vector of the functions used by each neuron in layer 1. The resulting equation for an N-input, 3-layer generic neural network is:

a^3 = f^3( W^3 f^2( W^2 f^1( W^1 p + b^1 ) + b^2 ) + b^3 )     (Equation 2)

Studies show that a three-layer network is able to solve almost any complex task, including linearly inseparable problems, and reasonably approximate any function [4]. When used in a speech recognition application, the MLP performs feature extraction on the signal structure and creates a static vector, using the signal as a whole as the MLP input.
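As an illustration of Equation 2, the MATLAB sketch below computes the output of a three-layer network for a single input vector. The layer sizes, random weights, and transfer functions are placeholders chosen for the example, not the HM2007's actual parameters:

    % Three-layer MLP forward pass, Equation 2
    R  = 4;                           % number of inputs (example value)
    s1 = 3; s2 = 3; s3 = 2;           % neurons per layer (example values)
    rng(0);                           % repeatable random weights for the demo
    W1 = randn(s1, R);  b1 = randn(s1, 1);
    W2 = randn(s2, s1); b2 = randn(s2, 1);
    W3 = randn(s3, s2); b3 = randn(s3, 1);
    logsig  = @(n) 1 ./ (1 + exp(-n));    % hidden-layer transfer function
    purelin = @(n) n;                     % linear output layer
    p  = randn(R, 1);                     % example input vector
    a1 = logsig(W1*p + b1);               % layer 1 output (s1 x 1)
    a2 = logsig(W2*a1 + b2);              % layer 2 output (s2 x 1)
    a3 = purelin(W3*a2 + b3)              % network output (s3 x 1)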

Time-Delay Neural Network (TDNN)

The Time-Delay Neural Network is the simplest architecture that incorporates speech pattern dynamics. The TDNN is very similar to the tapped-delay-line concept, where the speech signal goes through N delay blocks, which divide the signal into N + 1 segments. These N + 1 segments are temporal slices, spaced a fixed interval apart, over which a short-time Fourier transform (STFT) is taken. This gives N + 1 spectral vectors which characterize the frequency content of each segment. These spectral vectors are then weighted and biased, similar to the MLP, to determine the network output. An example of the TDNN architecture is shown in Figure 3, with more detail in Figure 4 [5]. In Figure 3, the notation is defined as follows:

Di: ith delay block
Wi: ith weight term
F: transfer function

Figure 3: Time-Delay Neural Network Architecture [5]
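A minimal MATLAB sketch of this tapped-delay front end follows; the sample rate, segment length, delay spacing, and FFT size are illustrative assumptions rather than values measured from the HM2007. The signal is cut into N + 1 delayed segments, and an STFT-style magnitude spectrum is computed for each, yielding the spectral vectors that feed the weighted layers:

    % TDNN front end: N delay blocks split the signal into N+1 segments,
    % and a short-time spectrum is taken over each segment.
    fs = 8000;                            % sample rate in Hz (assumed)
    x  = randn(1, fs);                    % stand-in for 1 s of recorded speech
    N  = 9;                               % number of delay blocks
    L  = 256;                             % samples per segment (assumed)
    D  = 128;                             % delay between segments in samples
    ham = 0.54 - 0.46*cos(2*pi*(0:L-1)/(L-1));  % Hamming window, 1 x L
    spectra = zeros(L/2 + 1, N + 1);      % one spectral vector per segment
    for k = 0:N
        seg = x(k*D + 1 : k*D + L) .* ham;      % k-th delayed, windowed slice
        X   = fft(seg);                         % L-point FFT of the slice
        spectra(:, k+1) = abs(X(1:L/2 + 1)).';  % keep magnitudes up to fs/2
    end
    % Each column of 'spectra' is then weighted and biased, as in the MLP.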

Figure 4: Example of a TDNN for Recognizing /b/, /d/, and /g/ [5]

Recurrent Neural Network (RNN)

The Recurrent Neural Network is similar to the TDNN in that it allows temporal classification of a signal, but with the addition that individual neuron layers can continually fold back on themselves, creating an architecture with nearly infinite memory. The basic architecture consists of M neurons in the input layer, N neurons in the hidden layer, and P neurons in the output layer. Each time the hidden layer is called, the outputs of the N hidden-layer neurons at time (t-1) fold back into the inputs of the hidden layer at time t. Instead of a fixed number of input vectors like the MLP and TDNN architectures, the RNN has the ability to use all previous input information up to the current slice in time. This neural network architecture is often used in conjunction with statistical analysis of speech to predict and classify continuous patterns.

Figure 5: Recurrent Neural Network
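The fold-back in Figure 5 can be written as a simple recurrence: the hidden state at time t depends on the current input and on the hidden state at time t-1. The MATLAB sketch below uses an Elman-style update with arbitrary sizes and random weights; it is offered only to illustrate the idea, not as the HM2007's mechanism:

    % Elman-style recurrent layer: h(t) = f( Wx*x(t) + Wh*h(t-1) + bh )
    M = 4; N = 3; P = 2;              % input, hidden, and output layer sizes
    T = 10;                           % number of time slices
    rng(1);                           % repeatable random parameters for the demo
    Wx = randn(N, M); Wh = randn(N, N); bh = randn(N, 1);   % hidden layer
    Wy = randn(P, N); by = randn(P, 1);                     % output layer
    h  = zeros(N, 1);                 % hidden state starts empty
    X  = randn(M, T);                 % stand-in input sequence, one column per slice
    Y  = zeros(P, T);
    for t = 1:T
        h = tanh(Wx*X(:,t) + Wh*h + bh);   % outputs at t-1 fold back into the layer
        Y(:,t) = Wy*h + by;                % output at time t
    end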

Learning Rules

Each neural network must be trained with data, which then creates the basis for classifying future data. A learning rule is the procedure used to modify the weights, w, and biases, b, in order to successfully classify future information that falls outside the initial training data. Training is performed once, and then the weights and biases are fixed. There are two broad categories of learning rules: supervised, in which the user gives a desired target output for each element of training data, and unsupervised, in which the user gives no target output for the training data and the network classifies itself. Learning rules will be discussed further in this report after the HM2007 IC architecture has been determined.
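As one concrete example of a supervised learning rule, the classic perceptron rule from [3] adjusts the weights and bias by the error between the target and the network output for each training pair. This is shown purely for illustration; it is not claimed to be the rule the HM2007 uses:

    % Perceptron learning rule (supervised): W <- W + e*p', b <- b + e
    P = [0 0 1 1; 0 1 0 1];          % four 2-D training inputs (columns)
    T = [0 0 0 1];                   % targets: logical AND of the two inputs
    W = zeros(1, 2); b = 0;          % start with zero weights and bias
    hardlim = @(n) double(n >= 0);   % threshold transfer function
    for epoch = 1:20                 % repeated passes until the rule converges
        for q = 1:size(P, 2)
            a = hardlim(W*P(:,q) + b);   % network output for input q
            e = T(q) - a;                % error drives the update
            W = W + e * P(:,q)';         % adjust weights toward the target
            b = b + e;                   % adjust bias the same way
        end
    end
    disp(hardlim(W*P + b))           % reproduces the AND targets: 0 0 0 1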

Speech Recognition Background

Sound is simply pressure waves that are detected by our ears and analyzed and classified by our brains. The human voice is created by passing air from the lungs through the vocal cords; by moving the tongue, cheeks, and lips, words can be produced. The human voice has a majority of its energy content between tens of Hertz and 5 kHz, but can be approximated on the frequency spectrum from 300 Hz to 3400 Hz. Speech from an adult male usually has a fundamental frequency (defined as the lowest tone produced by the vocal cords) of around 85-155 Hz, and an adult female from 165 Hz to 255 Hz [6]. Although these fundamental frequencies fall below the 300 Hz lower bound, harmonics occur at integer multiples of the fundamental frequency, giving the impression of actually hearing the fundamental frequency even though it is below the lower bound of the approximated range [6]. Speech recognition is performed virtually seamlessly by the original neural network, our brain, which processes and classifies sounds and words in a variety of complex environments: with background noise, with words blended together in continuous speech, with accents, and without regard to who the speaker is.

Trying to perform the same tasks with neural networks based in software or hardware is a difficult undertaking, but once speech recognition is successfully implemented, it can be used for controlling applications, performing data entry, interfacing with computers, or a host of other tasks.

Speech is utilized in circuits in the following way: the acoustic pressure wave passes through a transducer inside a microphone or telephone, which converts it from a pressure wave to an electrical signal. "A speech-wave is a one-dimensional signal having temporal structure. The signal can be considered a combination of different frequency sine-waves, and its acoustical characteristics are determined by the frequency, energy (amplitude), and phase of each component sine-wave. However, for speech recognition, a speech signal is usually converted to a three-dimensional, time-frequency-energy feature pattern that is similar to a sound spectrogram." [4]

Figure 6 below shows the spectrogram for the spoken word "MATLAB". The horizontal axis represents time in units of seconds, and the vertical axis represents frequency in units of Hz. The amplitude at each frequency and instant in time is denoted by the coloring, which is in units of decibels (dB).

Figure 6: Spectrogram of the Spoken Word "MATLAB"

The spectrogram is created by dividing the input signal into a series of overlapping segments and applying a pre-selected window function to each of these segments. A Short-Time Fourier Transform (STFT) is then calculated for each individual segment to determine the frequency components at each time-slice. The results from the STFT are recombined with the time sampling information and displayed in the spectrogram [7,8]. The harmonics of the fundamental frequency can easily be seen by the lineated structure present in the spectrogram. For this signal, the fundamental frequency was approximately 231 Hz. Since this word was spoken by a female, it falls within the appropriate range.
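The procedure just described maps directly onto MATLAB's built-in spectrogram function (Signal Processing Toolbox); when the window argument is a scalar, the function applies a Hamming window of that length to each segment. The sketch below plots a time-frequency-energy pattern like Figure 6, with window length, overlap, and FFT size chosen as illustrative values rather than the exact parameters used in this project:

    % Spectrogram: overlapping segments -> window -> STFT -> time-frequency plot
    fs = 8000;                     % sample rate in Hz (assumed)
    x  = randn(1, fs);             % stand-in for one second of recorded speech
    wlen     = 256;                % segment length; a Hamming window is applied
    noverlap = 192;                % 75% overlap between adjacent segments
    nfft     = 512;                % points in each short-time Fourier transform
    spectrogram(x, wlen, noverlap, nfft, fs, 'yaxis')  % time (s) vs. Hz, color in dB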

Another way to look at a speech signal, besides the spectrogram, is via two plots: one showing the time-amplitude relationship, and the other showing the frequency-amplitude relationship. Although this method is commonly used, the dynamic frequency-versus-time variations which show up in the spectrogram cannot be seen. In Figure 7 below, the phrase "neural network" was spoken and the upper-bound envelope of the results is displayed.

Figure 7: Plots for the Spoken Word "Neural Network"

All speech recognition systems must be designed with certain universal issues in mind. Major issues in the speech recognition field include noise, disfluencies, continuous speech, speaker variability, and homophones [9].

Most simple speech recognition systems cannot make a distinction between the desired sound signal and undesired sound signals. Any background noise during training or operation can severely impact the performance of the speech recognition system.

Disfluencies are parts of human speech that often go unnoticed by people. They are slips of the tongue, hesitations in speech, and utterances such as "uhh" and "um". A speech recognition system will try to classify these parts of speech just like any other part of speech, which can often lead to errors or misclassifications. Natural human speech happens in a continuous manner, where words blend together and are not always separated by a distinguishable pause. This poses a problem for speech recognition systems in determining the boundaries of a word and matching them to the trained patterns. Speech recognition systems can be classified into three broad categories: isolated word speech recognition, connected word speech recognition, and continuous speech recognition. In isolated word speech recognition, each word must have distinct pauses before and after it. These systems are usually used for command-type applications, where relatively short words or "commands" cause some sort of action to happen. Connected word speech recognition is similar to isolated word speech recognition, but the "word" can be a single word or a phrase of words that fits within the allowable time window. In continuous speech recognition, the system recognizes words and phrases in ordinary spoken language without the user making any adjustments from normal conversation.

Speaker variability occurs in multiple ways. There is variability in the same word when spoken multiple times, even by the same person. Each distinct waveform will look slightly different in timing, amplitude, and frequency. These variations must be taken into account to ensure that the criteria for word matching are not so stringent that these variations cause the word to be unknown or misclassified. Speaking conditions also cause variability; for example, the spoken word "Eject" used to control a cockpit function will

sound much different (and the resulting waveform will look much different) when spoken in a non-stressed condition versus a condition of extreme excitement. Also, the most familiar form of speaker variability comes simply from different speakers. The signal features of a spoken word look much different depending on the gender of the speaker, the accent of the speaker, and the age and voice type of the speaker. All these categories cause extreme variability, even in the case of a single spoken word.

One other issue with speech recognition is homophones. Homophones are words that sound alike
