Creating Deep Learning Based Speech Products In Record Time


Samer Hijazi, CTO, BabbleLabs Inc.

The New World of Speech Technology
Live 2-way audio/video comms
Audio/video sharing & distribution
8B speakers
Speaker identity
Consumer browsing, shopping & support
Pro transcription & forensics
Full language dialog and capture
Speech device control

The Opportunity
22B microphones by 2020
7B phones, radios, TVs delivering voice
YouTube uploads: 13B minutes per year
200T minutes per year of device interaction
1Q words per year in voice calls
Speech market growth: 38.3% (Statista 2018: speech recognition technology market 2016-2014)

AI meets speech: more sophisticated models, more data, more training
Massive data corpus: 10s of 1000 hr speech, 10s of 1000 hr noise, 10s of 1000 RIR
NEVER TRAIN ON THE SAME DATA TWICE
Massive compute: 88 TFLOPS per engineer
Speech Enhancement, Speech Recognition, Speech UI Dialog
Audio demo labels: 0dB, 21dB
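One way to read "never train on the same data twice" is that each training example is mixed on the fly from the speech, noise, and RIR (room impulse response) corpora, so no mixture repeats. The sketch below is only an illustration under that assumption, not the BabbleLabs pipeline: the function names, the choice of reverberant speech as the training target, and the -5 to 15 dB SNR range are all hypothetical choices made for the example.

```python
import numpy as np

def mix_example(speech, noise, rir, snr_db, rng):
    """Reverberate speech with an RIR, then add noise at the requested SNR."""
    # Reverberation: convolve with the room impulse response, keep the original length.
    reverberant = np.convolve(speech, rir)[: len(speech)]
    # Pick a random noise segment of matching length (noise must be at least as long).
    start = rng.integers(0, len(noise) - len(speech) + 1)
    segment = noise[start : start + len(speech)]
    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
    p_speech = np.mean(reverberant ** 2)
    p_noise = np.mean(segment ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return reverberant + gain * segment, reverberant

def training_stream(speech_set, noise_set, rir_set, snr_range=(-5.0, 15.0), seed=0):
    """Endless generator of freshly mixed (noisy, target, snr_db) examples."""
    rng = np.random.default_rng(seed)
    while True:
        s = speech_set[rng.integers(len(speech_set))]
        n = noise_set[rng.integers(len(noise_set))]
        h = rir_set[rng.integers(len(rir_set))]
        snr_db = rng.uniform(*snr_range)
        noisy, target = mix_example(s, n, h, snr_db, rng)
        yield noisy, target, snr_db
```

Because the speech item, noise segment, room response, and SNR are all drawn independently, the number of distinct mixtures grows multiplicatively with the corpus sizes.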

Technology, Product, Customers, End Users
Technology: Unique data-sets and training (Compute, Data, Algorithms)
Product: Deep learning speech software (Speech Enhancement)
Product: Platform-optimized solutions
Technology license
Consumer audio/video sharer: Recording in the real world
End user: their

Clear Speech Everywhere
In production for real-world video sharing, production, streaming, and audio
Common product delivered across platforms
Cloud API
On-Device License
Web UI
Android & iPhone Apps
Primary products
For visibility and demonstration
Foundation to product release in 28 weeks!

What is Speech Enhancement?

Human – Human Interface Challenges

Human – Machine Interface Challenges

BabbleLabs' Answer to These Challenges: Clear Cloud™
Audio demo labels: Noisy, Enhanced

Outline
A bit about noisy speech
Traditional speech enhancement
Deep neural network approaches
Closing thoughts

Acoustic Impairments Model
ambient / stationary noise
reverberation
non-stationary noise
competing talkers

Solutions to Acoustic Impairments
speech enhancement
source separation
beamforming
dereverberation

Classes of Speech Sounds and ARPABET Phonetic Symbols
Vowel: iy, ih, eh, ey, ae, aa, aw, ay, ah, ao, oy, ow, uh, uw, ux, er, ax, ix, axr, axh
Semivowel: l, r, w, y, hh, hv, el
Affricate: jh, ch
Stop: b, d, g, p, t, k, dx, q
Nasal: m, n, ng, em, en, eng, nx
Fricative: s, sh, z, zh, f, th, v, dh
Phone / phoneme example: "problems" is p r aa bcl b l ax m z

Speech Spectrum
Fundamental frequency F0 (e.g. 150 Hz)
Harmonics
Formant F1: associated w/ size of mouth opening; proportional to frequency, e.g. AA 580 Hz
Formant F2: associated w/ changes in oral cavity such as tongue position and lip activity
Formant F3: associated w/ front vs. back constriction in oral cavity

Spectral characteristics and frequency range [Hz]
Fundamental frequency, F0: females/children 200 to 400; males 60 to 150
Harmonics: up to 20K
Hearing range: 20 to 20K
Typical audio sampling rates in kHz: 8 (Telephony), 11.025, 22.05 (MP3s), 32 (Cassette), 44.1 (CD), 48 (DVD)

Sine-wave speech: formants are estimated and used to synthesize speech.
Examples generated using Dan Ellis SW /

Noise and Speech Levels
Level [dB]     Classroom, Hospital, Home, Store     Trains, Airplanes     Restaurants
Speech SPL     60 to 70                             60 to 70              60 to 70
Noise SPL      50 to 55                             70 to 75              59 to 80
SNR            5 to 20                              -15 to 0              -20 to 11
SPL: Sound Pressure Level relative to the threshold of human hearing (20 micro-Pascals (force per square meter), about a mosquito flying 3 m away)
Typical target range for speech enhancement: -5 to 15 dB
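The SNR row follows directly from the two SPL rows: since both levels are in dB against the same reference, SNR is speech SPL minus noise SPL, with the worst case pairing the quietest speech with the loudest noise. A small sketch of that arithmetic (the function and variable names are illustrative only):

```python
import math

P_REF = 20e-6  # reference pressure in pascals (threshold of human hearing)

def spl_db(pressure_pa):
    """Sound pressure level in dB relative to 20 micropascals."""
    return 20.0 * math.log10(pressure_pa / P_REF)

def snr_range(speech_spl, noise_spl):
    """(worst, best) SNR in dB from (low, high) speech and noise SPL ranges."""
    s_lo, s_hi = speech_spl
    n_lo, n_hi = noise_spl
    return (s_lo - n_hi, s_hi - n_lo)

# SPL ranges taken from the table above.
environments = {
    "classroom/hospital/home/store": ((60, 70), (50, 55)),
    "trains/airplanes": ((60, 70), (70, 75)),
    "restaurants": ((60, 70), (59, 80)),
}

for name, (speech, noise) in environments.items():
    print(name, snr_range(speech, noise))
```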

Sirens
Strong, structured frequency-modulated tones & overtones

Wind Noise
Strong low-frequency bursts, stationary broad spectrum

Crowd
Broad, non-stationary spectrum in speech range

Evaluating Performance of Speech Enhancers
Quality measures assess how a speaker produces an utterance. Is the utterance “natural”, “raspy”, “hoarse”, “scratchy”? Does it sound good or bad?
Intelligibility measures what a speaker said. What did you understand? What is the word error rate?

Subjective Measures of Quality
ITU-T P.835 Standard for Speech Enhancement Quality Assessment
Based on Mean Opinion Score (MOS) rating scale
Rating columns: Signal Distortion (SIG); Background Distortion (BAK); Overall Quality (OVL)
5: Very natural, no degradation; Not noticeable; Excellent: Imperceptible
4: Fairly natural, little degradation; Somewhat noticeable; Good: Just perceptible, but not annoying
3: Somewhat natural, somewhat degraded; Noticeable but not intrusive; Fair: Perceptible and slightly annoying
2: Fairly unnatural, fairly degraded; Fairly conspicuous, somewhat intrusive; Poor: Annoying, but not objectionable
1: Very unnatural, very degraded; Very conspicuous, very intrusive; Bad: Very annoying and objectionable

Objective Measures of Quality and Intelligibility

Quality
Segmental SNR (SNRseg)
Frequency Weighted Segmental SNR (fwSNRseg)
Weighted Spectral Slope (WSS)
Log-likelihood Ratio (LLR)
Itakura-Saito (IS)
Cepstral Distance (CEP)
Hearing Aid Speech Quality Index (HASQI)
Perceptual Evaluation of Video Quality (PEVQ)
Perceptual Evaluation of Audio Quality (PEAQ)
Perceptual Evaluation of Speech Quality (PESQ)
Perceptual Objective Listening Quality Analysis (POLQA)
Composite Metrics

Intelligibility
Normalized Covariance Metrics (NCM)
Speech Intelligibility Index (SII)
High-energy Glimpse Proportion Metric
Coherence and Speech Intelligibility Index (CSII)
Quasi-stationary Speech Transmission Index (QSTI)
Short-time Objective Intelligibility Measure (STOI)
Extended STOI Measure (ESTOI)
Hearing-Aid Speech Perception Index (HASPI)
K-Nearest Neighbor Mutual Information Intelligibility Measure (MIKNN)
Speech Intelligibility Prediction based on a Mutual Information Lower Bound (SIMI)
Speech Intelligibility in Bits (SIIB)
Speech-based Envelope Power Spectrum Model with Short-Time correlation (sEPSM)
Automatic Speech Recognition (ASR)

Effectiveness of metrics is evaluated by measuring correlation of metric predictions against subjective test data.
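As a concrete instance of the simplest metric in the list, segmental SNR averages a per-frame SNR between the clean reference and the enhanced signal. The sketch below is a minimal version under stated assumptions: the 256-sample frame length and the clamp of each frame to [-10, 35] dB are conventional choices picked for illustration, not values given in the slides.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0, eps=1e-10):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    values = []
    for i in range(n_frames):
        s = clean[i * frame_len : (i + 1) * frame_len]
        e = s - enhanced[i * frame_len : (i + 1) * frame_len]  # residual error
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        # Clamp so silent or perfectly matched frames do not dominate the mean.
        values.append(np.clip(snr, lo, hi))
    return float(np.mean(values))
```

Like every intrusive metric in the list, it needs the clean reference, so it is usable on synthetic mixtures but not on field recordings.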

Speech Intelligibility in Bits (SIIB)
Measures the amount of information shared between speaker and listener.
Linguistic models for “clean” speech communication measure a typical information rate of 50-100 bps.
Diagram labels: mutual info between “text” message and clean speech; mutual info between clean and noisy speech.
From: S. Van Kuyk, W. B. Kleijn, R. C. Hendriks, “An instrumental intelligibility metric based on information theory,” in IEEE Signal Processing Letters, 2018

Traditional Methods of Speech Enhancement
Most commonly employ a short-time Fourier transform based analysis-modification-synthesis framework
Frequency-dependent noise suppression function
Noise suppression based on estimates of speech and noise statistics

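Traditional Methods: Spectral Subtraction

Noisy speech model (noisy speech = clean speech + noise):
$R(\omega) = S(\omega) + D(\omega)$

Noise magnitude estimate, measured during a period of speech inactivity using a Voice Activity Detector:
$|\hat{D}(\omega)|^2 = E\left[|R(\omega)|^2\right] = E\left[|D(\omega)|^2\right]$ when $S(\omega) = 0$

Noisy speech magnitude:
$|R(\omega)|^2 = |S(\omega)|^2 + |D(\omega)|^2 + 2\,\mathrm{Re}\{S(\omega) D^*(\omega)\}$
The cross term is ignored because clean speech and noise are uncorrelated.

Clean speech magnitude estimate:
$|\hat{S}(\omega)|^2 = |R(\omega)|^2 - |\hat{D}(\omega)|^2$

Clean speech synthesized from the noisy phase and the magnitude estimate:
$\hat{S}(\omega) = |\hat{S}(\omega)| \exp\left(j\,\Phi_r(\omega)\right)$
The difference between noisy and clean phase is not perceptible for SNRs > 8 dB.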
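The spectral subtraction steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind the slides: the helper names, the 512-sample periodic Hann window with 50% overlap, and the clamping of negative power to zero (half-wave rectification) are assumptions added here, and a caller-supplied noise-only segment stands in for the Voice Activity Detector.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Frame the signal with a periodic Hann window and take the FFT of each frame."""
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann: overlapping halves sum to 1
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=256):
    """Inverse FFT of each frame followed by overlap-add."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
    return out

def spectral_subtraction(noisy, noise_only, n_fft=512, hop=256):
    """Estimate clean speech as sqrt(|R|^2 - |D_hat|^2) with the noisy phase."""
    R = stft(noisy, n_fft, hop)
    D = stft(noise_only, n_fft, hop)
    # |D_hat|^2: average noise power per bin over the speech-free segment.
    noise_power = np.mean(np.abs(D) ** 2, axis=0)
    # Power subtraction; negative results are clamped to zero (an added assumption).
    clean_power = np.maximum(np.abs(R) ** 2 - noise_power, 0.0)
    # Resynthesize from the estimated magnitude and the noisy phase.
    S_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(R))
    return istft(S_hat, n_fft, hop)
```

The first and last half-frames of the output are tapered by the window, so comparisons against a reference are best made on the interior samples.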

Spectral Subtraction: Spectrograms

Spectral Subtraction: Waveforms

Deep Neural Networks for Speech Enhancement
Direct
Indirect
Conventional Emulation
Mirsamadi, Seyedmahdad, and Ivan Tashev. "Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule Estimation." INTERSPEECH. 2016.

Common Ideal Target Masks
H. Erdogan, J. R. Hershey, S. Watanabe, and J. L. Roux, “Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 708–712
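The slide's mask table did not survive transcription, so the following is only a sketch of mask definitions that are commonly used in the literature; exact forms vary between papers (for example, some define the ratio mask with powers and a square root), and the dictionary keys are names chosen here for illustration. S and N are the complex STFTs of clean speech and noise, and the mixture is assumed to be Y = S + N.

```python
import numpy as np

def target_masks(S, N, eps=1e-12):
    """Ideal training targets computed from clean-speech and noise spectra."""
    Y = S + N                                  # noisy mixture spectrum
    s, n, y = np.abs(S), np.abs(N), np.abs(Y)
    theta = np.angle(S) - np.angle(Y)          # phase difference clean vs. noisy
    return {
        "ibm": (s > n).astype(float),              # ideal binary mask
        "irm": s / (s + n + eps),                  # ideal ratio mask (magnitude form)
        "wiener": s ** 2 / (s ** 2 + n ** 2 + eps),  # Wiener-like power ratio
        "iam": s / (y + eps),                      # ideal amplitude mask
        "psm": s / (y + eps) * np.cos(theta),      # phase-sensitive mask
    }
```

The phase-sensitive mask equals the real part of S / Y, which is why it can go negative or exceed one when speech and noise partially cancel.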

Georgia Tech System
DNN input features: 7 noisy speech frames + 1 noise-only frame of concatenated log spectrum + mel cepstrum, with global mean removed and normalized by global variance
Diagram labels: Log Spectrum Magnitude; Mel Cepstrum; Postprocessing; noisy magnitude & bias; Speech + Noise Log Spectrum
Derive IRM from speech and noise spectrum estimates
Mix DNN output with bias & noisy magnitude according to IRM
Synthesis: enhanced speech, using noisy phase
Xu, Yong, et al. "A regression approach to speech enhancement based on deep neural networks." IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 23.1 (2015): 7-19.
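The input layout described above (7 noisy frames plus 1 noise-only frame, globally normalized) can be sketched as follows. This is an illustration under assumptions, not the published system: the features are taken as already computed per frame, the 7-frame context is assumed to be centered on the current frame with edge frames repeated at the boundaries, and the function names are hypothetical.

```python
import numpy as np

def normalize(features, global_mean, global_std):
    """Remove the global mean and scale by the global standard deviation."""
    return (features - global_mean) / global_std

def stack_context(frames, noise_frame, context=7):
    """Build DNN inputs from per-frame features.

    frames:      (T, F) array of normalized features, one row per noisy frame.
    noise_frame: (F,) feature vector estimated from a noise-only frame.
    Returns a (T, (context + 1) * F) array.
    """
    half = context // 2
    # Repeat the first and last frames so every position has a full context window.
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    windows = [padded[i : i + len(frames)] for i in range(context)]
    # Append the same noise-only features to every input row.
    noise = np.tile(noise_frame, (len(frames), 1))
    return np.concatenate(windows + [noise], axis=1)
```

With F features per frame, each row holds 8 * F values: the three preceding frames, the current frame, the three following frames, and the noise estimate.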

Spectral Subtractive vs. BabbleLabs DNN

Spectral Subtractive vs. BabbleLabs DNN
Table columns: Metric, Noisy, Subtractive, BabbleLabs DNN
SNR 1655693 SIIB Gauss [bps]

BabbleLabs Production
Diagram labels: noisy magnitude & bias; Speech Re-Synthesis; enhanced speech; noisy phase
90% of the code is in the blue boxes
90% of the compute is in the orange box
Prototyping is in blocking format, while deployment is in streaming format.
Using MATLAB and GPU Coder, we were able to convert from reference to deployment code in 6 man-weeks.
Currently we are porting the DNN using other open source tools. We are exploring a migration to GPU Coder to unify the flow if possible.
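The gap between blocking and streaming formats can be illustrated with a toy overlap-add processor: the block version sees the whole signal, while the streaming version receives one hop of samples at a time and carries its own state. This is a hypothetical sketch, not the production code; the frame size, the class name, and the pass-through gain standing in for the DNN are assumptions.

```python
import numpy as np

N_FFT, HOP = 512, 256
WIN = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window

def process_frame(frame, gain=1.0):
    """Window, transform, apply a per-bin gain (placeholder for the DNN), invert."""
    spec = np.fft.rfft(frame * WIN)
    return np.fft.irfft(spec * gain, n=N_FFT)

def process_block(x):
    """Blocking format: the whole signal is available at once."""
    out = np.zeros(len(x))
    for start in range(0, len(x) - N_FFT + 1, HOP):
        out[start : start + N_FFT] += process_frame(x[start : start + N_FFT])
    return out

class StreamingProcessor:
    """Streaming format: consumes HOP samples per call and keeps internal state."""

    def __init__(self):
        self.in_buf = np.zeros(N_FFT)  # most recent N_FFT input samples
        self.ola = np.zeros(N_FFT)     # overlap-add accumulator

    def push(self, chunk):
        self.in_buf = np.concatenate([self.in_buf[HOP:], chunk])
        self.ola += process_frame(self.in_buf)
        out = self.ola[:HOP].copy()    # these samples are now complete
        self.ola = np.concatenate([self.ola[HOP:], np.zeros(HOP)])
        return out
```

In this sketch the streaming output lags the block output by one hop (256 samples), and the two agree wherever both frames of the overlap are present; managing that kind of state and latency is part of what a reference-to-deployment conversion has to handle.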

References
Loizou, Philipos C. Speech enhancement: theory and practice. CRC press, 2007.
Van Kuyk, Steven, W. Bastiaan Kleijn, and Richard C. Hendriks. "An evaluation of intrusive instrumental intelligibility metrics." arXiv preprint arXiv:1708.06027 (2017).
Mirsamadi, Seyedmahdad, and Ivan Tashev. "Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule Estimation." INTERSPEECH. 2016.
Xu, Yong, et al. "A regression approach to speech enhancement based on deep neural networks." IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 23.1 (2015): 7-19.
H. Erdogan, J. R. Hershey, S. Watanabe, and J. L. Roux, “Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 708–712
From ment/ https://looking-to-listen.github.io/

speak your mind
