Digital Speech Processing: Lecture 1 - UC Santa Barbara

Upload by : Raelyn Goode
Transcription

Digital Speech Processing, Lecture 1
Introduction to Digital Speech Processing

Speech Processing
- Speech is the most natural form of human-human communications.
- Speech is related to language; linguistics is a branch of social science.
- Speech is related to human physiological capability; physiology is a branch of medical science.
- Speech is also related to sound and acoustics, a branch of physical science.
- Therefore, speech is one of the most intriguing signals that humans work with every day.
- Purpose of speech processing:
  - To understand speech as a means of communication;
  - To represent speech for transmission and reproduction;
  - To analyze speech for automatic recognition and extraction of information;
  - To discover some physiological characteristics of the talker.

The Speech Stack
- Speech Applications: coding, synthesis, recognition, understanding, verification, language translation, speed-up/slow-down
- Speech Algorithms: speech-silence (background), voiced-unvoiced decision, pitch detection, formant estimation
- Speech Representations: temporal, spectral, homomorphic, LPC
- Fundamentals: acoustics, linguistics, pragmatics, speech perception

Why Digital Processing of Speech?
- digital processing of speech signals (DPSS) enjoys an extensive theoretical and experimental base developed over the past 75 years
- much research has been done since 1965 on the use of digital signal processing in speech communication problems
- highly advanced implementation technology (VLSI) exists that is well matched to the computational demands of DPSS
- there are abundant applications that are in widespread use commercially

Speech Applications
- We look first at the top of the speech processing stack, namely applications:
  - speech coding
  - speech synthesis
  - speech recognition and understanding
  - other speech applications

Speech Coding
[Figure: encoding block diagram. Speech x_c(t) (continuous-time signal) -> A-to-D converter -> x[n] (sampled signal) -> analysis/coding -> y[n] (transformed representation) -> compression -> ŷ[n] (data) -> channel or medium]

Speech Coding
- Speech Coding is the process of transforming a speech signal into a representation for efficient transmission and storage of speech
  - narrowband and broadband wired telephony
  - cellular communications
  - Voice over IP (VoIP) to utilize the Internet as a real-time communications medium
  - secure voice for privacy and encryption for national security applications
  - extremely narrowband communications channels, e.g., battlefield applications using HF radio
  - storage of speech for telephone answering machines, IVR systems, prerecorded messages

Demo of Speech Coding
- Narrowband Speech Coding:
  - 64 kbps PCM
  - 32 kbps ADPCM
  - 16 kbps LD-CELP
  - 8 kbps CELP
  - 4.8 kbps FS1016
  - 2.4 kbps LPC10E
- Wideband Speech Coding (male talker / female talker):
  - 3.2 kHz, uncoded
  - 7 kHz, uncoded
  - 7 kHz, 64 kbps
  - 7 kHz, 32 kbps
  - 7 kHz, 16 kbps

Demo of Audio Coding
- CD Original (1.4 Mbps) versus MP3-coded at 128 kbps
  - female vocal
  - trumpet selection
  - orchestra
  - baroque
  - guitar
- Can you determine which is the uncoded and which is the coded audio for each selection?
- Additional Audio Selections

Audio Coding
- Female vocal: MP3-128 kbps coded, CD original
- Trumpet selection: CD original, MP3-128 kbps coded
- Orchestral selection: MP3-128 kbps coded
- Baroque: CD original, MP3-128 kbps coded
- Guitar: MP3-128 kbps coded, CD original

Speech Synthesis
- Synthesis of Speech is the process of generating a speech signal using computational means for effective human-machine interactions
  - machine reading of text or email messages
  - telematics feedback in automobiles
  - talking agents for automatic transactions
  - automatic agent in customer care call center
  - handheld devices such as foreign language phrasebooks, dictionaries, crossword puzzle helpers
  - announcement machines that provide information such as stock quotes, airline schedules, weather reports, etc.
[Figure: speech synthesis block diagram (labels garbled in extraction)]
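
The bit rates quoted in the coding demos above can be turned into compression ratios with a few lines of arithmetic. The sketch below is only illustrative: the coder rates are the ones listed on the slides, while the CD parameters (44.1 kHz sampling, 16 bits per sample, 2 channels) are an assumption taken from the standard CD audio format, not from the slides, which only say "1.4 Mbps".

```python
# Bit rates quoted on the slides, in bits per second.
PCM_RATE = 64_000
SPEECH_CODERS = {
    "ADPCM": 32_000,
    "LD-CELP": 16_000,
    "CELP": 8_000,
    "FS1016": 4_800,
    "LPC10E": 2_400,
}


def compression_ratio(reference_bps, coded_bps):
    """How many times smaller the coded stream is than the reference."""
    return reference_bps / coded_bps


# Each narrowband coder compared with 64 kbps PCM.
speech_ratios = {
    name: compression_ratio(PCM_RATE, rate) for name, rate in SPEECH_CODERS.items()
}

# Assumption: standard CD audio (44.1 kHz, 16 bits, stereo), about 1.4 Mbps.
CD_RATE = 44_100 * 16 * 2
MP3_RATE = 128_000
cd_to_mp3 = compression_ratio(CD_RATE, MP3_RATE)

for name, ratio in speech_ratios.items():
    print(f"{name}: {ratio:.1f}:1 relative to 64 kbps PCM")
print(f"CD ({CD_RATE} bps) to 128 kbps MP3: {cd_to_mp3:.1f}:1")
```

Under these assumptions the CD-to-MP3 ratio comes out at roughly 11 to 1, and the lowest-rate speech coder listed (2.4 kbps) is more than 25 times smaller than 64 kbps PCM.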

Speech Synthesis Examples
- Soliloquy (source title missing in the transcription)
- Gettysburg Address
- Third Grade Story
- Recordings labeled 1964-lrr and 2002-tts

Pattern Matching Problems
[Figure: pattern matching block diagram. Speech is matched against reference patterns to produce symbols]
- speech recognition
- speaker recognition
- speaker verification
- word spotting
- automatic indexing of speech recordings

Speech Recognition and Understanding
- Recognition and Understanding of Speech is the process of extracting usable linguistic information from a speech signal in support of human-machine communication by voice
  - command and control (C&C) applications, e.g., simple commands for spreadsheets, presentation graphics, appliances
  - voice dictation to create letters, memos, and other documents
  - natural language voice dialogues with machines to enable Help desks, Call Centers
  - voice dialing for cellphones and from PDAs and other small devices
  - agent services such as calendar entry and update, address list modification and entry, etc.

Speech Recognition Demos

Dictation Demo

Speech Recognition Demos

Other Speech Applications
- Speaker Verification for secure access to premises, information, virtual spaces
- Speaker Recognition for legal and forensic purposes, national security; also for personalized services
- Speech Enhancement for use in noisy environments, to eliminate echo, to align voices with video segments, to change voice qualities, to speed-up or slow-down prerecorded speech (e.g., talking books, rapid review of material, careful scrutinizing of spoken material, etc.), potentially to improve intelligibility and naturalness of speech
- Language Translation to convert spoken words in one language to another to facilitate natural language dialogues between people speaking different languages, e.g., tourists, business people

DSP/Speech Enabled Devices
- Internet Audio
- PDAs & Streaming Audio/Video
- Hearing Aids
- Cell Phones
- Apple iPod
- Digital Cameras

One of the Top DSP Applications
- stores music in MP3, AAC, MP4, wma, wav, audio formats
- compression of 11-to-1 for 128 kbps MP3
- can store order of 20,000 songs with 30 GB disk
- can use flash memory to eliminate all moving memory access
- can load songs from iTunes store: more than 1.5 billion downloads
- tens of millions sold
[Figure: playback block diagram. Memory -> x[n] -> Computer -> y[n] -> D-to-A -> y_c(t)]

Cellular Phone

Digital Speech Processing
- Need to understand the nature of the speech signal, and how dsp techniques, communication technologies, and information theory methods can be applied to help solve the various application scenarios described above
  - most of the course will concern itself with speech signal processing, i.e., converting one type of speech signal representation to another so as to uncover various mathematical or practical properties of the speech signal and do appropriate processing to aid in solving both fundamental and deep problems of interest

Speech Signal Production
[Figure: speech signal production chain]
- Message source: idea encapsulated in a message, M
- Message, M, realized as a word sequence, W
- Words realized as a sequence of (phonemic) sounds, S
- Acoustic propagation: sounds received at the transducer through acoustic ambient, A
- Electronic transduction: signals converted from acoustic to electric, transmitted, distorted and received as X (speech waveform)
- Conventional studies of speech science use speech signals recorded in a sound booth with little interference or distortion
- Practical applications require use of realistic or "real world" speech with noise and distortions

Speech Production/Generation Model
- Message Formulation -> desire to communicate an idea, a wish, a request; express the message as a sequence of words
- Language Code -> need to convert chosen text string to a sequence of sounds in the language that can be understood by others; need to give some form of emphasis, prosody (tune, melody) to the spoken sounds so as to impart non-speech information such as sense of urgency, importance, psychological state of talker, environmental factors (noise, echo)
- Neuro-Muscular Controls -> need to direct the neuro-muscular system to move the articulators (tongue, lips, teeth, jaws, velum) so as to produce the desired spoken message in the desired manner
- Vocal Tract System -> need to shape the human vocal tract system and provide the appropriate sound sources to create an acoustic waveform (speech) that is understandable in the environment in which it is spoken

[Figure: speech production/generation model]
- Message Formulation (desire to communicate) produces a text string (discrete symbols), e.g., "I need some string", "Please get me some string", "Where can I buy some string"
- Language Code Generator (pronunciation (in the brain), vocabulary) converts the text string to a phoneme string with prosody (discrete symbols)
- Neuro-Muscular Controls convert the phoneme string with prosody to articulatory motions (continuous control)
- Vocal Tract System (source control: lungs, diaphragm, chest muscles) converts the articulatory motions to the acoustic waveform (speech)

Speech Perception Model
- The acoustic waveform impinges on the ear (the basilar membrane) and is spectrally analyzed by an equivalent filter bank
- The signal from the basilar membrane is neurally transduced and coded into features that can be decoded by the brain
- The brain decodes the feature stream into sounds, words and sentences
- The brain determines the meaning of the words via a message understanding mechanism
[Figure: speech perception model. Recoverable labels: spectral features; sound features (distinctive features); phonemes, words and sentences; message understanding; basic message; semantics; language translation]

The Speech Signal
[Figure: speech waveform with labeled regions: background signal, unvoiced signal (noiselike sound), pitch period]

The Speech Chain
[Figure: the speech chain. Recoverable labels: message formulation; language code; phonemes, prosody; articulatory motions; vocal tract system; discrete input; continuous input; (continuous/discrete control); information rate; discrete output; continuous output; rate labels 50 bps, 200, 2000 bps]

The Speech Circle
[Figure: the speech circle]
- Customer voice request
- ASR (Automatic Speech Recognition) -> words spoken: "I dialed a wrong number"
- SLU (Spoken Language Understanding) -> meaning: "Billing credit"
- DM & SLG (Dialog Management (Actions) and Spoken Language Generation (Words)), using data -> what's next? "Determine correct number"; "What number did you want to call?"
- TTS (Text-to-Speech Synthesis) -> voice reply to customer

Speech Sciences
- Linguistics: science of language, including phonetics, phonology, morphology, and syntax
- Phonemes: smallest set of units considered to be the basic set of distinctive sounds of a language (20-60 units for most languages)
- Phonemics: study of phonemes and phonemic systems
- Phonetics: study of speech sounds and their production, transmission, and reception, and their analysis, classification, and transcription
- Phonology: phonetics and phonemics together
- Syntax: meaning of an utterance

Information Rate of Speech
- from a Shannon view of information:
  - message content/information: 2**6 symbols (phonemes) in the language; 10 symbols/sec for normal speaking rate -> 60 bps is the equivalent information rate for speech (issues of phoneme probabilities, phoneme correlations)
- from a communications point of view:
  - speech bandwidth is between 4 kHz (telephone quality) and 8 kHz (wideband hi-fi speech); need to sample speech at between 8 and 16 kHz, and need about 8 (log encoded) bits per sample for high quality encoding -> 8000 x 8 = 64000 bps (telephone) to 16000 x 8 = 128000 bps (wideband)
- 1000-2000 times change in information rate from discrete message symbols to waveform encoding -> can we achieve this three orders of magnitude reduction in information rate on real speech waveforms?

Digital Speech Processing
[Figure: processing chain, with signal representation and signal transformation marked "Purpose of Course"]
- Information source: human speaker, lots of variability
- Measurement or observation: acoustic waveform/articulatory positions/neural control signals
- Signal representation
- Signal transformation
- Extraction and utilization of information: human listeners, machines

Digital Speech Processing
- DSP:
  - obtaining discrete representations of speech signal
  - theory, design and implementation of numerical procedures (algorithms) for processing the discrete representation in order to achieve a goal (recognizing the signal, modifying the time scale of the signal, removing background noise from the signal, etc.)
- Why DSP:
  - flexibility
  - accuracy
  - real-time implementations on inexpensive dsp chips
  - ability to integrate with multimedia and data
  - encryptability/security of the data and the data representations via suitable techniques

Hierarchy of Digital Speech Processing
[Figure: hierarchy of representations of speech signals]
- Waveform representations: preserve wave shape through sampling
- Representations that model the signal as the output of a speech production model:
  - parameters describing pitch, voiced/unvoiced, noise, transients
  - vocal tract parameters: spectral, articulatory
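
The arithmetic on the "Information Rate of Speech" slide above can be checked directly. The following is a minimal sketch, assuming (as the slide's 60 bps figure implicitly does) that all 2**6 phonemes are equally likely and independent, which makes 60 bps an upper-bound style estimate; the slide itself flags phoneme probabilities and correlations as open issues. The function names are illustrative only.

```python
import math

# Shannon view: 2**6 phoneme symbols at about 10 symbols per second.
NUM_PHONEMES = 2 ** 6
SYMBOLS_PER_SEC = 10

# Assumption: equiprobable, independent symbols, so each carries log2(64) bits.
bits_per_symbol = math.log2(NUM_PHONEMES)
message_rate_bps = bits_per_symbol * SYMBOLS_PER_SEC


def nyquist_rate_hz(bandwidth_hz):
    """Minimum sampling rate for a signal band-limited to bandwidth_hz."""
    return 2 * bandwidth_hz


def waveform_rate_bps(bandwidth_hz, bits_per_sample=8):
    """Bit rate of sample-by-sample encoding at the Nyquist rate."""
    return nyquist_rate_hz(bandwidth_hz) * bits_per_sample


telephone_bps = waveform_rate_bps(4_000)   # 8000 samples/s x 8 bits
wideband_bps = waveform_rate_bps(8_000)    # 16000 samples/s x 8 bits

print(f"message rate:   {message_rate_bps:.0f} bps")
print(f"telephone rate: {telephone_bps} bps "
      f"({telephone_bps / message_rate_bps:.0f}x the message rate)")
print(f"wideband rate:  {wideband_bps} bps "
      f"({wideband_bps / message_rate_bps:.0f}x the message rate)")
```

The two ratios land a little above 1000 and a little above 2000, which is where the slide's "1000-2000 times" range comes from.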

Information Rate of Speech
[Figure: data rate (bits per second) chart. Recoverable labels: axis values 200,000, 60,000 and 20,000; LDM, PCM, DPCM; (No Source Coding); (Source Coding); Printed Text]

Speech Processing Applications
[Figure: table of application areas; column headings lost in extraction. Recoverable entries:]
- secrecy, seamless voice and data
- messages, IVR
- command and control, agents, NL voice dialogues, call centers, help desks
- readings for the blind, speed-up and slow-down of speech rates
- noise and echo removal, alignment of speech and text

Intelligent Robot?
http://www.youtube.com/watch?v=uvcQCJpZJH8

The Speech Stack
- Speak 4 It (AT&T Labs)
- Courtesy: Mazin Rahim

What We Will Be Learning
- review some basic dsp concepts
- speech production model: acoustics, articulatory concepts, speech production models
- speech perception model: ear models, auditory signal processing, equivalent acoustic processing models
- time domain processing concepts: speech properties, pitch, voiced-unvoiced, energy, autocorrelation, zero-crossing rates
- short time Fourier analysis methods: digital filter banks, spectrograms, analysis-synthesis systems, vocoders
- homomorphic speech processing: cepstrum, pitch detection, formant estimation, homomorphic vocoder
- linear predictive coding methods: autocorrelation method, covariance method, lattice methods, relation to vocal tract models
- speech waveform coding and source models: delta modulation, PCM, mu-law, ADPCM, vector quantization, multipulse coding, CELP coding
- methods for speech synthesis and text-to-speech systems: physical models, formant models, articulatory models, concatenative models
- methods for speech recognition: the Hidden Markov Model (HMM)
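
Two of the time-domain measurements named in the course outline above, energy and zero-crossing rate, are simple enough to sketch in a few lines. This is a rough illustration, not the course's own method: the frame length, the test signals (a 100 Hz sinusoid standing in for a voiced-like sound and low-level random noise standing in for an unvoiced-like sound) and the function names are all assumptions chosen for the example.

```python
import math
import random


def short_time_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


SAMPLE_RATE = 8000   # Hz, the telephone-quality rate quoted in the lecture
FRAME_LEN = 240      # 30 ms at 8 kHz; an illustrative choice

# Hypothetical stand-ins for real speech frames.
voiced_like = [
    math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(FRAME_LEN)
]
rng = random.Random(0)  # fixed seed so the example is repeatable
unvoiced_like = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]

voiced_energy = short_time_energy(voiced_like)
unvoiced_energy = short_time_energy(unvoiced_like)
voiced_zcr = zero_crossing_rate(voiced_like)
unvoiced_zcr = zero_crossing_rate(unvoiced_like)

print(f"voiced-like:   energy={voiced_energy:.4f}, zcr={voiced_zcr:.3f}")
print(f"unvoiced-like: energy={unvoiced_energy:.4f}, zcr={unvoiced_zcr:.3f}")
```

On these synthetic frames the periodic signal shows high energy and few zero crossings, while the noise shows low energy and many crossings, which is the kind of contrast a voiced-unvoiced decision can exploit. Real speech is far less clean, so any thresholds would need to be tuned on actual recordings.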
