Digital Speech Processing: Lecture 1 - UC Santa Barbara

Uploaded by: Raelyn Goode

Transcription

Digital Speech Processing: Lecture 1
Introduction to Digital Speech Processing

Speech Processing
– Speech is the most natural form of human-human communication.
– Speech is related to language; linguistics is a branch of social science.
– Speech is related to human physiological capability; physiology is a branch of medical science.
– Speech is also related to sound and acoustics, a branch of physical science.
– Therefore, speech is one of the most intriguing signals that humans work with every day.
– Purpose of speech processing:
  – To understand speech as a means of communication;
  – To represent speech for transmission and reproduction;
  – To analyze speech for automatic recognition and extraction of information;
  – To discover some physiological characteristics of the talker.

Why Digital Processing of Speech?
– digital processing of speech signals (DPSS) enjoys an extensive theoretical and experimental base developed over the past 75 years
– much research has been done since 1965 on the use of digital signal processing in speech communication problems
– highly advanced implementation technology (VLSI) exists that is well matched to the computational demands of DPSS
– there are abundant applications that are in widespread commercial use

The Speech Stack
– Speech Applications: coding, synthesis, recognition, understanding, verification, language translation, speed-up/slow-down
– Speech Algorithms: speech-silence (background), voiced-unvoiced decision, pitch detection, formant estimation
– Speech Representations: temporal, spectral, homomorphic, LPC
– Fundamentals: acoustics, linguistics, pragmatics, speech perception

Speech Applications
– We look first at the top of the speech processing stack, namely applications:
  – speech coding
  – speech synthesis
  – speech recognition and understanding
  – other speech applications

Speech Coding: Encoding
[Block diagram, labels in order: speech xc(t) (continuous-time signal); A-to-D Converter; x[n] (sampled signal); Analysis/Coding; y[n] (transformed representation); Compression; ŷ[n] (data); Channel or Medium]

Speech Coding
– Speech Coding is the process of transforming a speech signal into a representation for efficient transmission and storage of speech:
  – narrowband and broadband wired telephony
  – cellular communications
  – Voice over IP (VoIP) to utilize the Internet as a real-time communications medium
  – secure voice for privacy and encryption for national security applications
  – extremely narrowband communications channels, e.g., battlefield applications using HF radio
  – storage of speech for telephone answering machines, IVR systems, prerecorded messages

Demo of Speech Coding
– Narrowband Speech Coding:
  – 64 kbps PCM
  – 32 kbps ADPCM
  – 16 kbps LDCELP
  – 8 kbps CELP
  – 4.8 kbps FS1016
  – 2.4 kbps LPC10E
– Wideband Speech Coding (male talker / female talker):
  – 3.2 kHz, uncoded
  – 7 kHz, uncoded
  – 7 kHz, 64 kbps
  – 7 kHz, 32 kbps
  – 7 kHz, 16 kbps

Demo of Audio Coding
– CD original (1.4 Mbps) versus MP3-coded at 128 kbps:
  – female vocal
  – trumpet selection
  – orchestra
  – baroque
  – guitar
– Can you determine which is the uncoded and which is the coded audio for each selection?

Audio Coding
– Female vocal: MP3-128 kbps coded, CD original
– Trumpet selection: CD original, MP3-128 kbps coded
– Orchestral selection: MP3-128 kbps coded
– Baroque: CD original, MP3-128 kbps coded
– Guitar: MP3-128 kbps coded, CD original
– Additional Audio Selections

Speech Synthesis
– Synthesis of Speech is the process of generating a speech signal using computational means for effective human-machine interactions:
  – machine reading of text or email messages
  – telematics feedback in automobiles
  – talking agents for automatic transactions
  – automatic agent in customer care call center
  – handheld devices such as foreign language phrasebooks, dictionaries, crossword puzzle helpers
  – announcement machines that provide information such as stock quotes, airline schedules, weather reports, etc.
[Block diagram; most labels lost in extraction, output labeled "speech"]
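The bit rates quoted in the speech and audio coding demos above can be compared with a little arithmetic. The following is a minimal sketch, not part of the lecture: the function names are hypothetical, and the CD parameters (44.1 kHz, 16 bits, 2 channels) are an assumption chosen because they reproduce the "1.4 Mbps" figure on the slide.

```python
# Bit rates (bits per second) as listed on the "Demo of Speech Coding" slide.
NARROWBAND_CODERS_BPS = {
    "PCM": 64_000,
    "ADPCM": 32_000,
    "LDCELP": 16_000,
    "CELP": 8_000,
    "FS1016": 4_800,
    "LPC10E": 2_400,
}


def compression_ratio(reference_bps: float, coded_bps: float) -> float:
    """How many times smaller the coded bit rate is than the reference."""
    return reference_bps / coded_bps


def cd_bit_rate(sample_rate_hz: int = 44_100,
                bits_per_sample: int = 16,
                channels: int = 2) -> int:
    """Uncompressed CD bit rate under the assumed standard CD parameters."""
    return sample_rate_hz * bits_per_sample * channels


def ratios_versus_pcm() -> dict:
    """Compression of each narrowband coder relative to 64 kbps PCM."""
    pcm = NARROWBAND_CODERS_BPS["PCM"]
    return {name: compression_ratio(pcm, bps)
            for name, bps in NARROWBAND_CODERS_BPS.items()}


if __name__ == "__main__":
    for name, ratio in ratios_versus_pcm().items():
        print(f"{name:7s} {ratio:5.1f} : 1 versus 64 kbps PCM")
    mp3_ratio = compression_ratio(cd_bit_rate(), 128_000)
    print(f"MP3 at 128 kbps: about {mp3_ratio:.1f} : 1 versus CD")
```

Under these assumptions the CD rate works out to about 1.41 Mbps, so 128 kbps MP3 is roughly an 11-to-1 reduction, while 2.4 kbps LPC10E is roughly a 27-to-1 reduction relative to 64 kbps PCM.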

Speech Synthesis Examples
– Soliloquy from [title lost in extraction]
– Gettysburg Address
– Third Grade Story
– Audio example labels: 1964-lrr, 2002-tts

Pattern Matching Problems
[Block diagram; surviving labels: Matching, symbols, Reference Patterns]
– speech recognition
– speaker recognition
– speaker verification
– word spotting
– automatic indexing of speech recordings

Speech Recognition and Understanding
– Recognition and Understanding of Speech is the process of extracting usable linguistic information from a speech signal in support of human-machine communication by voice:
  – command and control (C&C) applications, e.g., simple commands for spreadsheets, presentation graphics, appliances
  – voice dictation to create letters, memos, and other documents
  – natural language voice dialogues with machines to enable Help desks, Call Centers
  – voice dialing for cellphones and from PDAs and other small devices
  – agent services such as calendar entry and update, address list modification and entry, etc.

Speech Recognition Demos

Dictation Demo

Other Speech Applications
– Speaker Verification for secure access to premises, information, virtual spaces
– Speaker Recognition for legal and forensic purposes (national security); also for personalized services
– Speech Enhancement for use in noisy environments, to eliminate echo, to align voices with video segments, to change voice qualities, to speed up or slow down prerecorded speech (e.g., talking books, rapid review of material, careful scrutinizing of spoken material, etc.), and potentially to improve intelligibility and naturalness of speech
– Language Translation to convert spoken words in one language to another to facilitate natural language dialogues between people speaking different languages, e.g., tourists, business people

DSP/Speech Enabled Devices
– Internet Audio
– PDAs & Streaming Audio/Video
– Hearing Aids
– Cell Phones
– Digital Cameras

Apple iPod
– stores music in MP3, AAC, MP4, wma, wav, audio formats
– compression of 11-to-1 for 128 kbps MP3
– can store on the order of 20,000 songs with a 30 GB disk
– can use flash memory to eliminate all moving memory access
– can load songs from iTunes store (more than 1.5 billion downloads)
– tens of millions sold
[Block diagram, labels in order: Memory; x[n]; Computer; y[n]; D-to-A; yc(t)]

One of the Top DSP Applications
– Cellular Phone

Digital Speech Processing
– Need to understand the nature of the speech signal, and how DSP techniques, communication technologies, and information theory methods can be applied to help solve the various application scenarios described above
  – most of the course will concern itself with speech signal processing, i.e., converting one type of speech signal representation to another so as to uncover various mathematical or practical properties of the speech signal and do appropriate processing to aid in solving both fundamental and deep problems of interest

Speech Signal Production
[Block diagram; surviving labels in order: Message Source; M (idea encapsulated in a message, M); W (message, M, realized as a word sequence, W); S (words realized as a sequence of (phonemic) sounds, S); Acoustic Propagation; A (sounds received at the transducer through acoustic ambient, A); Electronic Transduction; X, Speech Waveform (signals converted from acoustic to electric, transmitted, distorted and received as X)]
– Conventional studies of speech science use speech signals recorded in a sound booth with little interference or distortion
– Practical applications require use of realistic or "real world" speech with noise and distortions

Speech Production/Generation Model
– Message Formulation → desire to communicate an idea, a wish, a request; express the message as a sequence of words
  [Diagram: Desire to Communicate; Message Formulation; Text String (Discrete Symbols). Example text strings: "I need some string", "Please get me some string", "Where can I buy some string"]
– Language Code → need to convert chosen text string to a sequence of sounds in the language that can be understood by others; need to give some form of emphasis, prosody (tune, melody) to the spoken sounds so as to impart non-speech information such as sense of urgency, importance, psychological state of talker, environmental factors (noise, echo)
  [Diagram: Text String; Language Code Generator, with Pronunciation Vocabulary (In The Brain); Phoneme string with prosody (Discrete Symbols)]
– Neuro-Muscular Controls → need to direct the neuro-muscular system to move the articulators (tongue, lips, teeth, jaws, velum) so as to produce the desired spoken message in the desired manner
  [Diagram: Phoneme String with Prosody; Neuro-Muscular Controls; Articulatory motions (Continuous control)]
– Vocal Tract System → need to shape the human vocal tract system and provide the appropriate sound sources to create an acoustic waveform (speech) that is understandable in the environment in which it is spoken
  [Diagram: Articulatory Motions; Vocal Tract System; Acoustic Waveform (Speech) (Continuous control); Source control (lungs, diaphragm, chest muscles)]

Speech Perception Model
– The acoustic waveform impinges on the ear (the basilar membrane) and is spectrally analyzed by an equivalent filter bank
– The signal from the basilar membrane is neurally transduced and coded into features that can be decoded by the brain
– The brain decodes the feature stream into sounds, words and sentences
– The brain determines the meaning of the words via a message understanding mechanism
[Diagram; surviving labels: Spectral Features (Continuous Control); Sound Features (Distinctive Features) (Continuous/Discrete Control); Phonemes, Words and Sentences; Message Understanding; Semantics; Basic Message (Discrete Message)]

The Speech Signal
[Waveform figure; surviving labels: Background Signal; Unvoiced Signal (noiselike sound); Pitch Period]

The Speech Chain
[Diagram; surviving labels: Message Formulation; Language Code; Vocal Tract System; Phonemes, Prosody; Articulatory Motions; Phonemes, Words, and Sentences; Language Translation; Discrete Input; Continuous Input; Continuous Output; Information Rate: 50 bps, 200, 2000 bps. Remaining labels lost in extraction]

Speech Sciences
– Linguistics: science of language, including phonetics, phonology, morphology, and syntax
– Phonemes: smallest set of units considered to be the basic set of distinctive sounds of a language (20-60 units for most languages)
– Phonemics: study of phonemes and phonemic systems
– Phonetics: study of speech sounds and their production, transmission, and reception, and their analysis, classification, and transcription
– Phonology: phonetics and phonemics together
– Syntax: grammatical structure of an utterance

The Speech Circle
[Diagram, labels in order around the circle: Customer voice request; ASR (Automatic Speech Recognition); Words spoken: "I dialed a wrong number"; SLU (Spoken Language Understanding); Meaning: "Billing credit"; DM & SLG (Dialog Management (Actions) and Spoken Language Generation (Words)); What's next? "Determine correct number"; TTS (Text-to-Speech Synthesis); Voice reply to customer: "What number did you want to call?"; Data]

Information Rate of Speech
– from a Shannon view of information:
  – message content/information: 2^6 = 64 symbols (phonemes) in the language, i.e., 6 bits per symbol; 10 symbols/sec for normal speaking rate → 6 x 10 = 60 bps is the equivalent information rate for speech (issues of phoneme probabilities, phoneme correlations)
– from a communications point of view:
  – speech bandwidth is between 4 kHz (telephone quality) and 8 kHz (wideband hi-fi speech); need to sample speech at between 8 and 16 kHz, and need about 8 (log encoded) bits per sample for high quality encoding → 8000 x 8 = 64,000 bps (telephone) to 16000 x 8 = 128,000 bps (wideband)
– 1000-2000 times change in information rate from discrete message symbols to waveform encoding → can we achieve this three orders of magnitude reduction in information rate on real speech waveforms?

Digital Speech Processing
[Block diagram, labels in order: Information Source (human speaker, lots of variability); Measurement or Observation (acoustic waveform/articulatory positions/neural control signals); Signal Representation; Signal Transformation; Extraction and Utilization of Information (human listeners, machines). The label "Purpose of Course" appears beside Signal Representation and Signal Transformation]

Digital Speech Processing
– DSP:
  – obtaining discrete representations of speech signal
  – theory, design and implementation of numerical procedures (algorithms) for processing the discrete representation in order to achieve a goal (recognizing the signal, modifying the time scale of the signal, removing background noise from the signal, etc.)
– Why DSP:
  – flexibility
  – accuracy
  – real-time implementations on inexpensive DSP chips
  – ability to integrate with multimedia and data
  – encryptability/security of the data and the data representations via suitable techniques

Hierarchy of Digital Speech Processing
[Tree diagram; surviving labels: Representation of Speech Signals; Waveform Representations (preserve wave shape through sampling ...); representations that represent the signal as output of a speech production model; ... Parameters (pitch, voiced/unvoiced, noise, transients); Vocal Tract Parameters (spectral, articulatory)]
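The arithmetic on the "Information Rate of Speech" slide above can be checked directly. This is a minimal sketch with hypothetical function names; it assumes, as the slide implicitly does, that all 64 phonemes are equally likely and independent, which is why the slide flags phoneme probabilities and correlations as open issues.

```python
import math


def symbol_information_rate(num_symbols: int, symbols_per_second: float) -> float:
    """Bits per second for equally likely, independent symbols (an assumption)."""
    return math.log2(num_symbols) * symbols_per_second


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second of a sampled waveform with fixed-size samples."""
    return sample_rate_hz * bits_per_sample


# Figures taken from the slide: 2^6 phonemes at 10 symbols per second,
# and 8 log-encoded bits per sample at 8 kHz or 16 kHz sampling.
message_bps = symbol_information_rate(2 ** 6, 10)
telephone_bps = pcm_bit_rate(8_000, 8)
wideband_bps = pcm_bit_rate(16_000, 8)

if __name__ == "__main__":
    print(f"message content: {message_bps:.0f} bps")
    print(f"telephone PCM:   {telephone_bps} bps "
          f"({telephone_bps / message_bps:.0f}x the message rate)")
    print(f"wideband PCM:    {wideband_bps} bps "
          f"({wideband_bps / message_bps:.0f}x the message rate)")
```

The two ratios come out near 1,067 and 2,133, which is where the slide's "1000-2000 times" figure comes from.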

Information Rate of Speech
[Figure: Data Rate (Bits Per Second) axis with ticks at 200,000, 60,000 and 20,000; surviving labels: LDM, PCM, DPCM, ...; Printed Text; (No Source Coding); (Source ...). Remaining labels lost in extraction]

Speech Processing Applications
[Figure; surviving labels, partly truncated: ...tion, secrecy, seamless voice and data; Messages, IVR, ...; ...ion, command and control, agents, NL voice dialogues, call centers, help desks; Readings for the blind, speed-up and slow-down of speech rates; Noise and echo removal, alignment of speech and text]

Intelligent Robot?
http://www.youtube.com/watch?v=uvcQCJpZJH8

Speak 4 It (AT&T Labs)
Courtesy: Mazin Rahim

The Speech Stack

What We Will Be Learning
– review some basic DSP concepts
– speech production model: acoustics, articulatory concepts, speech production models
– speech perception model: ear models, auditory signal processing, equivalent acoustic processing models
– time domain processing concepts: speech properties, pitch, voiced-unvoiced, energy, autocorrelation, zero-crossing rates
– short time Fourier analysis methods: digital filter banks, spectrograms, analysis-synthesis systems, vocoders
– homomorphic speech processing: cepstrum, pitch detection, formant estimation, homomorphic vocoder
– linear predictive coding methods: autocorrelation method, covariance method, lattice methods, relation to vocal tract models
– speech waveform coding and source models: delta modulation, PCM, mu-law, ADPCM, vector quantization, multipulse coding, CELP coding
– methods for speech synthesis and text-to-speech systems: physical models, formant models, articulatory models, concatenative models
– methods for speech recognition: the Hidden Markov Model (HMM)
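Two of the time-domain concepts in the list above, energy and zero-crossing rate, are simple enough to sketch in a few lines. The example below is an illustration only, not the course's implementation: the function names, the 8 kHz sampling rate, the frame sizes, and the synthetic test signals (a 100 Hz sinusoid standing in for voiced speech, low-level random noise standing in for unvoiced or background sound) are all assumptions.

```python
import math
import random


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


FS = 8_000  # assumed sampling rate in Hz
# 0.1 s of a 100 Hz sinusoid: high energy, few zero crossings.
tone = [math.sin(2 * math.pi * 100 * n / FS) for n in range(FS // 10)]
# 0.1 s of low-level uniform noise: low energy, many zero crossings.
rng = random.Random(0)
noise = [rng.uniform(-0.1, 0.1) for _ in range(FS // 10)]

if __name__ == "__main__":
    for name, sig in (("tone", tone), ("noise", noise)):
        for k, frame in enumerate(frames(sig, 200, 100)):
            print(f"{name} frame {k}: energy={short_time_energy(frame):.4f} "
                  f"zcr={zero_crossing_rate(frame):.3f}")
```

In this toy setting the sinusoid shows high energy with a low zero-crossing rate and the noise shows the opposite, which is the kind of contrast that speech-silence and voiced-unvoiced decisions rely on. Real speech needs thresholds tuned to the recording conditions.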
