Introduction To Digital Speech Processing


Digital Speech Processing, Lecture 1: Introduction to Digital Speech Processing

Speech Processing
– Speech is the most natural form of human-human communication.
– Speech is related to language; linguistics is a branch of social science.
– Speech is related to human physiological capability; physiology is a branch of medical science.
– Speech is also related to sound and acoustics, a branch of physical science.
– Therefore, speech is one of the most intriguing signals that humans work with every day.
– Purpose of speech processing:
  – To understand speech as a means of communication;
  – To represent speech for transmission and reproduction;
  – To analyze speech for automatic recognition and extraction of information;
  – To discover some physiological characteristics of the talker.

Why Digital Processing of Speech?
– Digital processing of speech signals (DPSS) enjoys an extensive theoretical and experimental base developed over the past 75 years.
– Much research has been done since 1965 on the use of digital signal processing in speech communication problems.
– Highly advanced implementation technology (VLSI) exists that is well matched to the computational demands of DPSS.
– There are abundant applications that are in widespread commercial use.

The Speech Stack
– Speech Applications: coding, synthesis, recognition, understanding, verification, language translation, speed-up/slow-down
– Speech Algorithms: speech-silence (background), voiced-unvoiced decision, pitch detection, formant estimation
– Speech Representations: temporal, spectral, homomorphic, LPC
– Fundamentals: acoustics, linguistics, pragmatics, speech perception

Speech Applications
We look first at the top of the speech processing stack, namely applications:
– speech coding
– speech synthesis
– speech recognition and understanding
– other speech applications

Speech Coding: Encoding
[Figure: encoder block diagram. Speech x_c(t) (continuous-time signal) → A-to-D converter → x[n] (sampled signal) → analysis/coding → y[n] (transformed representation) → compression → ŷ[n] (data, bits) → channel or medium.]

Speech Coding
Speech coding is the process of transforming a speech signal into a representation for efficient transmission and storage of speech:
– narrowband and broadband wired telephony
– cellular communications
– Voice over IP (VoIP) to utilize the Internet as a real-time communications medium
– secure voice for privacy and encryption for national security applications
– extremely narrowband communications channels, e.g., battlefield applications using HF radio
– storage of speech for telephone answering machines, IVR systems, prerecorded messages

Demo of Speech Coding
– Narrowband speech coding:
  – 64 kbps PCM
  – 32 kbps ADPCM
  – 16 kbps LDCELP
  – 8 kbps CELP
  – 4.8 kbps FS1016
  – 2.4 kbps LPC10E
– Wideband speech coding (male talker / female talker):
  – 3.2 kHz, uncoded
  – 7 kHz, uncoded
  – 7 kHz, 64 kbps
  – 7 kHz, 32 kbps
  – 7 kHz, 16 kbps

Demo of Audio Coding
CD original (1.4 Mbps) versus MP3-coded at 128 kbps:
– female vocal
– trumpet selection
– orchestra
– baroque
– guitar
Can you determine which is the uncoded and which is the coded audio for each selection?

Audio Coding
– Female vocal: MP3-128 kbps coded, CD original
– Trumpet selection: CD original, MP3-128 kbps coded
– Orchestral selection: MP3-128 kbps coded
– Baroque: CD original, MP3-128 kbps coded
– Guitar: MP3-128 kbps coded, CD original

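[Figure: block diagram whose final stage is a converter producing speech; the remaining labels did not survive extraction.]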

Speech Synthesis
Synthesis of speech is the process of generating a speech signal using computational means for effective human-machine interactions:
– machine reading of text or email messages
– telematics feedback in automobiles
– talking agents for automatic transactions
– automatic agent in customer care call center
– handheld devices such as foreign language phrasebooks, dictionaries, crossword puzzle helpers
– announcement machines that provide information such as stock quotes, airline schedules, weather reports, etc.

Speech Synthesis Examples
– Soliloquy from Hamlet
– Gettysburg Address
– Third Grade Story: 1964-lrr, 2002-tts

Pattern Matching Problems
[Figure: block diagram; surviving labels: speech, A-to-D converter, reference patterns.]
– speech recognition
– speaker recognition
– speaker verification
– word spotting
– automatic indexing of speech recordings

Speech Recognition and Understanding
Recognition and understanding of speech is the process of extracting usable linguistic information from a speech signal in support of human-machine communication by voice:
– command and control (C&C) applications, e.g., simple commands for spreadsheets, presentation graphics, appliances
– voice dictation to create letters, memos, and other documents
– natural language voice dialogues with machines to enable help desks, call centers
– voice dialing for cellphones and from PDAs and other small devices
– agent services such as calendar entry and update, address list modification and entry, etc.

Speech Recognition Demos

Dictation Demo

Other Speech Applications
– Speaker Verification for secure access to premises, information, virtual spaces
– Speaker Recognition for legal and forensic purposes and national security; also for personalized services
– Speech Enhancement for use in noisy environments, to eliminate echo, to align voices with video segments, to change voice qualities, to speed up or slow down prerecorded speech (e.g., talking books, rapid review of material, careful scrutinizing of spoken material), and potentially to improve intelligibility and naturalness of speech
– Language Translation to convert spoken words in one language to another to facilitate natural language dialogues between people speaking different languages, e.g., tourists, business people

DSP/Speech Enabled Devices
– Internet Audio
– Digital Cameras
– PDAs & Streaming Audio/Video
– Hearing Aids
– Cell Phones

Apple iPod
– stores music in MP3, AAC, MP4, wma, wav audio formats
– compression of 11-to-1 for 128 kbps MP3
– can store on the order of 20,000 songs with a 30 GB disk
– can use flash memory to eliminate all moving memory access
– can load songs from the iTunes store (more than 1.5 billion downloads)
– tens of millions sold
[Figure: Memory → x[n] → Computer → y[n] → D-to-A → y_c(t)]
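The 11-to-1 figure can be checked against the 1.4 Mbps CD rate quoted in the audio coding demo. The sketch below assumes the standard CD parameters (44.1 kHz sampling, 16 bits per sample, two channels), which the slides do not state; the variable names are illustrative only.

```python
# Assumed CD audio parameters (standard values, not given in the slides).
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2

# MP3 bit rate taken from the slide.
MP3_RATE_BPS = 128_000

# Uncompressed CD bit rate: samples/sec x bits/sample x channels.
cd_rate_bps = CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS

# Ratio of uncompressed to compressed bit rate.
compression_ratio = cd_rate_bps / MP3_RATE_BPS

# Storage needed for one minute of 128 kbps MP3, in bytes.
mp3_bytes_per_minute = MP3_RATE_BPS / 8 * 60

print(f"CD rate: {cd_rate_bps / 1e6:.2f} Mbps")
print(f"Compression ratio: {compression_ratio:.1f} to 1")
print(f"MP3 storage: {mp3_bytes_per_minute / 1e6:.2f} MB per minute")
```

Under these assumptions the CD rate comes out at about 1.41 Mbps and the ratio at about 11, in line with the numbers on the slides.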

One of the Top DSP Applications: Cellular Phone

Digital Speech Processing
Need to understand the nature of the speech signal, and how DSP techniques, communication technologies, and information theory methods can be applied to help solve the various application scenarios described above:
– most of the course will concern itself with speech signal processing, i.e., converting one type of speech signal representation to another so as to uncover various mathematical or practical properties of the speech signal and do appropriate processing to aid in solving both fundamental and deep problems of interest

Speech Signal Production
[Figure: chain M → W → S → A → X]
– Message source: idea encapsulated in a message, M
– Message, M, realized as a word sequence, W
– Words realized as a sequence of (phonemic) sounds, S
– Acoustic propagation: sounds received at the transducer through acoustic ambient, A
– Electronic transduction: signals converted from acoustic to electric, transmitted, distorted and received as the speech waveform, X
– Conventional studies of speech science use speech signals recorded in a sound booth with little interference or distortion.
– Practical applications require use of realistic or "real world" speech with noise and distortions.

Speech Production/Generation Model
– Message Formulation: desire to communicate an idea, a wish, a request; express the message as a sequence of words
  [Figure: Desire to Communicate → Message Formulation → Text String (discrete symbols). Example text strings: "I need some string", "Please get me some string", "Where can I buy some string"]
– Language Code: need to convert the chosen text string to a sequence of sounds in the language that can be understood by others; need to give some form of emphasis, prosody (tune, melody) to the spoken sounds so as to impart non-speech information such as sense of urgency, importance, psychological state of talker, environmental factors (noise, echo)
  [Figure: Text String → Language Code Generator → Phoneme string with prosody (discrete symbols). Side labels: Pronunciation (in the brain), Vocabulary]

Speech Production/Generation Model (continued)
– Neuro-Muscular Controls: need to direct the neuro-muscular system to move the articulators (tongue, lips, teeth, jaws, velum) so as to produce the desired spoken message in the desired manner
  [Figure: input is the phoneme string with prosody; the remaining labels are garbled in extraction.]
– Vocal Tract System: need to shape the human vocal tract system and provide the appropriate sound sources to create an acoustic waveform (speech) that is understandable in the environment in which it is spoken
  [Figure: Articulatory Motions → Vocal Tract System → Acoustic Waveform (Speech), with source control (lungs, diaphragm, chest muscles); continuous control]

The Speech Signal
[Figure: speech waveform with annotations marking the background signal, the pitch period, and an unvoiced (noise-like) segment.]
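The pitch period marked in the figure is the repetition interval of a voiced segment. A minimal sketch of one common way to estimate it, by picking the lag that maximizes the short-time autocorrelation, is given below. The synthetic test frame (three harmonics of 100 Hz sampled at 8 kHz), the lag search range, and all function names are assumptions made for illustration; the slides do not prescribe a method here.

```python
import math

def autocorrelation(frame, lag):
    """Unnormalized short-time autocorrelation of one frame at a given lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(frame, lag))

# Synthetic "voiced" frame: three harmonics of an assumed 100 Hz fundamental.
FS = 8000          # sampling rate in Hz (assumed)
F0 = 100.0         # fundamental frequency in Hz (assumed)
FRAME_LEN = 480    # 60 ms frame

frame = [sum(math.sin(2 * math.pi * k * F0 * n / FS) / k for k in range(1, 4))
         for n in range(FRAME_LEN)]

# Search lags 20..160 samples, i.e. pitch between 400 Hz and 50 Hz at 8 kHz.
period_samples = estimate_pitch_period(frame, min_lag=20, max_lag=160)
pitch_hz = FS / period_samples
print(period_samples, pitch_hz)
```

On real speech the frame would first be windowed and the peak checked against a voicing threshold; this sketch omits both steps.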

Speech Perception Model
– The acoustic waveform impinges on the ear (the basilar membrane) and is spectrally analyzed by an equivalent filter bank of the ear.
– The signal from the basilar membrane is neurally transduced and coded into features that can be decoded by the brain.
– The brain decodes the feature stream into sounds, words and sentences.
– The brain determines the meaning of the words via a message understanding mechanism.
[Figure: Acoustic Waveform → spectral representation → neural transduction → sound features → phonemes, words and sentences (discrete message) → message understanding → basic message (discrete message)]

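The Speech Chain
[Figure: production side of the speech chain. Message Formulation → text → Language Code → phonemes, prosody → Neuro-Muscular Controls → articulatory motions → Vocal Tract System → Transmission Channel. Information rates shown along the chain, in order: 50 bps, 200 bps, 2000 bps, 30-50 kbps. Other labels: discrete input, continuous output. The perception-side labels are garbled in extraction.]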


Speech Sciences
– Linguistics: science of language, including phonetics, phonology, morphology, and syntax
– Phonemes: smallest set of units considered to be the basic set of distinctive sounds of a language (20-60 units for most languages)
– Phonemics: study of phonemes and phonemic systems
– Phonetics: study of speech sounds and their production, transmission, and reception, and their analysis, classification, and transcription
– Phonology: phonetics and phonemics together
– Syntax: structure of an utterance (its meaning is the domain of semantics)

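The Speech Circle
[Figure: dialogue loop between customer and machine]
– Customer voice request: "I dialed a wrong number"
– ASR (Automatic Speech Recognition) produces the words spoken: "I dialed a wrong number"
– SLU (Spoken Language Understanding) produces the meaning: "Billing credit"
– DM & SLG (Dialog Management (actions) and Spoken Language Generation (words)), using data, decides what's next: "Determine correct number"
– TTS (Text-to-Speech Synthesis) produces the voice reply to the customer: "What number did you want to call?"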

Information Rate of Speech
– From a Shannon view of information:
  – message content/information: 2^6 symbols (phonemes) in the language and 10 symbols/sec for a normal speaking rate, so 60 bps is the equivalent information rate for speech (issues: phoneme probabilities, phoneme correlations)
– From a communications point of view:
  – speech bandwidth is between 4 kHz (telephone quality) and 8 kHz (wideband hi-fi speech); need to sample speech at between 8 and 16 kHz, and need about 8 (log encoded) bits per sample for high quality encoding: 8000 x 8 = 64,000 bps (telephone) to 16000 x 8 = 128,000 bps (wideband)
– There is a 1000-2000 times change in information rate from discrete message symbols to waveform encoding. Can we achieve this three orders of magnitude reduction in information rate on real speech waveforms?
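The arithmetic on this slide can be reproduced in a few lines. The sketch below uses only the numbers given above; the function names are illustrative, and the message-rate formula treats phonemes as equally likely and independent, which is exactly the simplification the slide flags as an issue.

```python
import math

# Values taken from the slide.
NUM_PHONEMES = 2 ** 6        # distinct symbols in the language
SYMBOLS_PER_SEC = 10         # normal speaking rate
BITS_PER_SAMPLE = 8          # log-encoded bits per waveform sample

def message_rate_bps(num_symbols, symbols_per_sec):
    """Bits/sec if every symbol is equally likely and independent."""
    return math.log2(num_symbols) * symbols_per_sec

def waveform_rate_bps(sample_rate_hz, bits_per_sample):
    """Bits/sec for straightforward sample-by-sample waveform coding."""
    return sample_rate_hz * bits_per_sample

message_bps = message_rate_bps(NUM_PHONEMES, SYMBOLS_PER_SEC)
telephone_bps = waveform_rate_bps(8000, BITS_PER_SAMPLE)
wideband_bps = waveform_rate_bps(16000, BITS_PER_SAMPLE)

print(f"message rate:   {message_bps:.0f} bps")
print(f"telephone rate: {telephone_bps} bps ({telephone_bps / message_bps:.0f}x)")
print(f"wideband rate:  {wideband_bps} bps ({wideband_bps / message_bps:.0f}x)")
```

The two ratios come out slightly above 1000 and 2000, which is the "1000-2000 times" gap the slide refers to.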

[Figure: processing chain]
– Information Source: human speaker, lots of variability
– Measurement or Observation: acoustic waveform / articulatory positions / neural control signals
– Signal Representation and Signal Transformation: together these form Signal Processing, the purpose of the course
– Extraction and Utilization of Information: human listeners, machines

Digital Speech Processing
– DSP:
  – obtaining discrete representations of the speech signal
  – theory, design and implementation of numerical procedures (algorithms) for processing the discrete representation in order to achieve a goal (recognizing the signal, modifying the time scale of the signal, removing background noise from the signal, etc.)
– Why DSP? [...]
  – real-time implementations on inexpensive DSP chips
  – ability to integrate with multimedia and data
  – encryptability/security of the data and the data representations via suitable techniques

Hierarchy of Digital Speech Processing
[Figure: tree of representations of speech signals]
– Waveform representations: preserve wave shape through sampling [...]
– Representations of the signal as the output of a speech production model:
  – [...] parameters: pitch, voiced/unvoiced, noise, transients
  – vocal tract parameters: spectral, articulatory
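As an illustration of a waveform representation, the sketch below applies logarithmic (mu-law) companding followed by 8-bit uniform quantization, matching the "8 (log encoded) bits per sample" mentioned for telephone-quality speech. It uses the continuous mu-law formula with mu = 255 and a simple uniform quantizer; it is not the bit-exact segmented encoding of any telephony standard, and all function names are hypothetical.

```python
import math

MU = 255  # companding constant commonly used for 8-bit telephony

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1] with logarithmic compression."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode_sample(x, bits=8):
    """Compress, then quantize uniformly to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    step = 2.0 / levels
    return min(int((mu_law_compress(x) + 1.0) / step), levels - 1)

def decode_sample(code, bits=8):
    """Reconstruct at the midpoint of the quantization cell, then expand."""
    step = 2.0 / (2 ** bits)
    return mu_law_expand(-1.0 + (code + 0.5) * step)

# Small amplitudes keep good relative accuracy after the round trip.
for x in (-0.5, -0.01, 0.01, 0.5):
    code = encode_sample(x)
    print(f"x={x:+.3f} code={code:3d} decoded={decode_sample(code):+.5f}")
```

The design point is that the logarithmic curve spends more of the 256 codes on low-amplitude samples, where most speech energy lies, than a uniform quantizer would.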

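Information Rate of Speech
[Figure: data rate (bits per second) of speech representations, with axis ticks at 200,000, 60,000 and 20,000 bps. Waveform coders with no source coding (LDM, PCM, DPCM, ...) sit at the high-rate end; source-coded representations follow, down to printed text.]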

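Speech Processing [...]
[Table residue; the column headings did not survive extraction. Surviving entries, one column per line:]
– [...], encryption, secrecy, seamless voice and data
– Messages, IVR, [...]
– [...]ion, command and control, agents, NL voice dialogues, call centers, help desks
– Readings for the blind, speed-up and slow-down of speech rates
– Noise and echo removal, alignment of speech and text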


Intelligent Robot?
http://www.youtube.com/watch?v=uvcQCJpZJH8

Speak 4 It (AT&T Labs)
Courtesy: Mazin Rahim

What We Will Be Learning
– review some basic DSP concepts
– speech production model: acoustics, articulatory concepts, speech production models
– speech perception model: ear models, auditory signal processing, equivalent acoustic processing models
– time domain processing concepts: speech properties, pitch, voiced-unvoiced, energy, autocorrelation, zero-crossing rates
– short time Fourier analysis methods: digital filter banks, spectrograms, analysis-synthesis systems, vocoders
– homomorphic speech processing: cepstrum, pitch detection, formant estimation, homomorphic vocoder
– linear predictive coding methods: autocorrelation method, covariance method, lattice methods, relation to vocal tract models
– speech waveform coding and source models: delta modulation, PCM, mu-law, ADPCM, vector quantization, multipulse coding, CELP coding
– methods for speech synthesis and text-to-speech systems: physical models, formant models, articulatory models, concatenative models
– methods for speech recognition: the Hidden Markov Model (HMM)
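Two of the time-domain measures named in the outline, short-time energy and zero-crossing rate, are simple enough to sketch already. The example below computes both on two synthetic frames: a 100 Hz sinusoid standing in for a voiced sound and low-level random noise standing in for an unvoiced sound. The signals, frame length, and function names are assumptions for illustration, not material from the course.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

FS = 8000        # sampling rate in Hz (assumed)
FRAME_LEN = 240  # 30 ms frame

# Stand-in for a voiced frame: strong, low-frequency periodic signal.
voiced = [math.sin(2 * math.pi * 100 * n / FS) for n in range(FRAME_LEN)]

# Stand-in for an unvoiced frame: weak, noise-like signal (fixed seed).
rng = random.Random(0)
unvoiced = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]

for name, frame in (("voiced", voiced), ("unvoiced", unvoiced)):
    print(f"{name:9s} energy={short_time_energy(frame):.4f} "
          f"zcr={zero_crossing_rate(frame):.3f}")
```

The voiced stand-in shows high energy and few zero crossings, the unvoiced one the opposite, which is the contrast a simple voiced-unvoiced decision exploits.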
