Acoustic Characteristics Of American English Vowels


James Hillenbrand, Laura A. Getty, Michael J. Clark, and Kimberlee Wheeler
Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, Michigan 49008

(Received 10 August 1994; revised 7 November 1994; accepted 17 January 1995)

J. Acoust. Soc. Am. 97 (5), Pt. 1, May 1995. 0001-4966/95/97(5)/3099/13/$6.00 © 1995 Acoustical Society of America

The purpose of this study was to replicate and extend the classic study of vowel acoustics by Peterson and Barney (PB) [J. Acoust. Soc. Am. 24, 175-184 (1952)]. Recordings were made of 45 men, 48 women, and 46 children producing the vowels /i, ɪ, e, ɛ, æ, ɑ, ɔ, o, ʊ, u, ʌ, ɝ/ in h-V-d syllables. Formant contours for F1-F4 were measured from LPC spectra using a custom interactive editing tool. For comparison with the PB data, formant patterns were sampled at a time that was judged by visual inspection to be maximally steady. Analysis of the [...] data and those of PB, both in terms of average frequencies of F1 and F2, and the degree of overlap among adjacent vowels. As with [...] were nearly always identified as the vowel intended by [...] were more poorly separated than the PB data based on a static sample of the formant pattern. However, the vowels can be separated with a high degree of accuracy if duration and spectral change information is included.

PACS numbers: 43.70.Fq, 43.71.Es, 43.72.Ar

INTRODUCTION

The most widely cited experiment on the acoustics and perception of vowels is a surprisingly simple study conducted at Bell Telephone Laboratories by Peterson and Barney (1952) shortly after the introduction of the sound spectrograph. Peterson and Barney (PB) recorded two repetitions of ten vowels in /hVd/ context spoken by 33 men, 28 women, and 15 children. Acoustic measurements from narrow-band spectra consisted of formant frequencies (F1-F3) [...]. The measurements were taken at a single time slice that was judged to be "steady state." The /hVd/ signals were also presented to listeners for identification. The results of the measurement study showed a strong relationship between the intended vowel and the formant frequency pattern. However, there was considerable formant frequency variability from one speaker to the next, and there was a substantial degree of overlap in the formant frequency patterns among adjacent vowels. The listening study showed that the vowels were highly identifiable: The overall error rate was 5.6%, and nearly all of the errors involved confusions between adjacent vowels.

The PB measurements have played a central role in the development and testing of theories of vowel recognition. Acoustic measurements for the signals recorded by PB have been widely distributed to speech research laboratories (e.g., Watrous, 1991) and have been used in numerous studies to evaluate alternative models of vowel recognition (e.g., Nearey, 1978; Nearey et al., 1979; Syrdal, 1985; Syrdal and Gopal, 1986; Nearey, 1992; Lippmann, 1989; Miller, 1989; Hillenbrand and Gayvert, 1993a).

Despite the widespread use of the PB measurements, there are several well recognized limitations to the database. Perhaps the most important limitation is that the database consists exclusively of acoustic measurements taken at a single time slice. Duration measurements were not made, and no information is available about the pattern of spectral change over time. There is now a solid body of [...] duration and spectral change play an important role in vowel perception (e.g., Ainsworth, 1972; Bennett, 1968; Di Benedetto, 1989a,b; Hillenbrand and Gayvert, 1993b; Jenkins et al., 1983; Nearey, 1989; Nearey and Assmann, 1986; Stevens, 1959; Strange, 1989; Strange et al., 1983; Tiffany, 1953; Whalen, 1989). Other limitations of the PB database include: (1) There is no indication that subjects were screened for dialect, and very little is known about the dialect of either the speakers or the listeners; (2) listening results were not reported separately for men, women, and child talkers; (3) no information is given about the age or gender of the child talkers; (4) measures were made from a relatively small group of children; (5) there is no way to determine the identifiability of individual tokens; (6) measurement reliability was not reported; and (7) since the original signals are no longer available, the database cannot be used to evaluate signal representations other than F0 and formant frequencies.

The present study represents an attempt to address these limitations. Recordings were made of /hVd/ utterances spoken by a large group of men, women, and children. Measurements were made of vowel duration, F0 contours, and formant frequency contours. The signals were also presented to a panel of listeners for identification. Finally, discriminant analysis was used to classify the signals using various combinations of the acoustic measurements.

I. ACOUSTIC ANALYSIS

A. Methods

1. Talkers

Talkers consisted of 45 men, 48 women, and 46 ten- to 12-year-old children (27 boys, 19 girls). The majority of the speakers (87%) were raised in Michigan's lower peninsula, primarily the southeastern and southwestern parts of the state. The remainder were primarily from other areas of the

upper midwest, such as Illinois, Wisconsin, Minnesota, northern Ohio, and northern Indiana. An extensive screening procedure was used to select these 139 subjects from a larger group. The most important part of the screening procedure was a careful dialect assessment, focusing especially on subjects' production of the /ɑ/-/ɔ/ distinction. The /ɑ/-/ɔ/ distinction is not maintained by many speakers of American English, a fact which we believed (incorrectly, as it turned out) might account for the relatively high confusability reported by PB for this pair of vowels.

The screening procedure began with a 5- to 7-min informal conversation with one of the experimenters. This conversation was tape recorded for later review by an experienced phonetician. Subjects next read a 128-word passage that contained several instances of words with /ɑ/ and /ɔ/. Subjects were eliminated if the phonetician noted any systematic departure from general American English, or if the speaker failed to maintain the /ɑ/-/ɔ/ distinction either in spontaneous speech or in the 128-word passage. Subjects were also required to pass a brief task which tested their ability to discriminate /ɑ/-/ɔ/ minimal pairs. In addition to the dialect assessment, subjects were eliminated if they: (1) were non-native speakers of English; (2) showed any evidence of a speech, language, or voice disorder; (3) showed any evidence of a current respiratory infection; or (4) failed a 20-dB pure-tone screening at 500, 1000, and 2000 Hz.

2. Recordings

Audio recordings were made of subjects reading lists containing 12 vowels: the ten vowels recorded by PB (/i, ɪ, ɛ, æ, ɑ, ɔ, ʊ, u, ʌ, ɝ/) plus /e/ and /o/. Also recorded were four diphthongs in /h-d/ context, and both vowels and diphthongs in isolation. Only results from the 12 /hVd/ utterances will be described in this report. Subjects read from one of 12 different randomizations of a list containing the words "heed," "hid," "hayed," "head," "had," "hod," "hawed," "hoed," "hood," "who'd," "hud," "heard," "hoyed," "hide," "hewed," and "how'd."
Subjects were given as much time as needed to practice the task and demonstrate an understanding of the pronunciations that were expected for each key word. Recordings were made of several readings of the list once the experimenter was satisfied that the subject understood the task. Once the recording session began, the experimenter did not audition each stimulus and request additional readings based on the experimenter's judgment of correct pronunciation. An attempt was made to record at least three readings of the list. This was often not possible in the case of the children, who took longer to train than adults and sometimes tired of the task after two readings.

The recordings were made with a digital audio recorder (Sony PCM-F1) and a dynamic microphone (Shure 570-S). One token of each stimulus from each talker was low-pass filtered at 7.2 kHz and digitized at 16 kHz with 12 bits of amplitude resolution on a PDP 11/73 computer. Unless there were problems with recording fidelity or background noise, tokens were taken from the subject's first reading of the list. The gain on an [...] so that the peak amplitude was at least 80% of the 10-V dynamic range of the A/D, with no peak clipping.

FIG. 1. Spectral peak display of the word "heard" spoken by a child. The dashed vertical lines indicate the beginning and end of the vowel nucleus. The top panel shows the signal after the original 14-pole LPC analysis, the middle panel shows the signal after reanalysis with 18 poles, and the bottom panel shows the signal after hand editing with a custom editing tool.

3. Acoustic measurements

a. Vowel duration and "steady-state" times. The starting and ending times of vocalic nuclei were measured by hand from high-resolution gray-scale digital [...] (Peterson and Lehiste, 1960). In an attempt to produce a data set comparable to PB, two experimenters, working independently, made a judgment of steady-state time for each signal. The measures were made while viewing a spectral peak display (Fig. 1) and a gray-scale spectrogram. PB provide a very brief description of how steady-state times were located, indicating only that the spectrum was sampled, "...
following the influence of the /h/ and preceding the influence of the /d/, during which a practically steady state is reached" (Peterson and Barney, 1952, p. 177). The two experimenters worked from this brief description, and from the ten examples shown in Fig. 2 of PB.

In addition to the hand measurements of steady-state times, we experimented with several methods of determining steady-state times automatically through an analysis of edited formant contours. Of the several methods that were tried, the technique that seemed to produce the best results defined steady state as the center of the sequence of seven analysis frames (56 ms) with the minimum slope in log F2-log F1 space (Miller, 1989).

b. Formant contours. Formant-frequency analysis began with the calculation of 14-pole, 128-point linear predictive coding (LPC) spectra every 8 ms over 16-ms (256-point) Hamming-windowed segments. The frequencies of the first seven spectral peaks were then extracted from the LPC spectrum files. The frequencies of spectral peaks were estimated with a three-point parabolic interpolation, yielding a finer resolution than the 61.5-Hz frequency quantization. Files containing the LPC peak data served as the input to a custom interactive editor. The editor allows the experimenter to reanalyze the signal with different LPC analysis parameters and to hand edit the formant tracks.

Editing and analysis decisions were based on an examination of the LPC peak display overlaid on a gray-scale spectrogram and, in some cases, on an examination of individual LPC or [...] phonetics also played a role in the editing process. For [...] the experimenter's knowledge of the close proximity of F2 and F3 for vowels such as /i/ and /ɝ/, the close proximity of F1 and F2 for vowels such as /ɑ/ and /u/, and so on (see Ladefoged, 1967, for an excellent discussion of the inherent circularity in this method of estimating vowel formants, and for other insightful comments on the formant analysis). Considerations such as these often led the experimenter to conclude that a formant merger occurred. In these cases, the LPC spectra were recomputed with a larger number of poles until the merged formants separated. Once the experimenter was satisfied with the analysis, editing commands could be used to hand edit any formant tracking errors that remained.

Figure 1 shows an example of the utterance "heard" spoken by a ten-year-old boy: (a) after the original 14-pole analysis, (b) after reanalysis with 18 poles, and (c) after hand editing. (For simplicity, the gray-scale spectrogram underlying the peak display is not shown.) The vertical lines indicate the beginning and end of the vowel nucleus. Two commands are available for [...] experimenter to use the mouse to delete a spurious peak, and a second command allows the experimenter to use the mouse to interpolate through "holes" in the formant contour. For example, in the center panel of Fig. 1, there is a gap in the F3 contour toward the end of the vowel. Clicking the mouse on either side of this gap causes the program to linearly interpolate formant frequencies through this gap.

It was not uncommon for utterances to show formant mergers throughout all or part of the vocalic nucleus that could not be resolved using these methods. In these cases, zeros were written into the higher of the two formant slots showing the merger (e.g., F3 was zeroed out in the case of an F2-F3 merger). Table I shows the frequency of occurrence of formant mergers for each of the 12 vowels.

TABLE I. Percentage of utterances showing a formant merger anywhere in the vowel nucleus. Shown in parentheses are the percentage of utterances showing a formant merger at "steady state."

[...]

For the present study, formants were edited only between the starting and ending times of the vowel. Contours for F1-F3 were measured for all signals, except in cases of unresolvable formant mergers. The fourth formant was measured only when a well-defined F4 contour was clearly visible both on the LPC peak display and the gray-scale spectrogram. The fourth formant was judged to be unmeasurable for 15.6% of the utterances.

c. Fundamental frequency contours. F0 contours were extracted with an autocorrelation pitch tracker (Hillenbrand, 1988), followed by hand editing using the tool described above. Gross tracking errors such as pitch halving and pitch doubling were corrected by reanalyzing the signal with an option that imposes an upper or lower limit on the search for the autocorrelation peak. Any errors that remained were corrected using the editing commands that were described above.

B. Results

1. Measurement reliability

a. Vowel duration. Vowel durations for 10% of the utterances were remeasured independently by a second experimenter. The utterances chosen for remeasurement were drawn at random from the total of 1668 signals, but with approximately equal numbers of men, women, and children. The average absolute difference between the original and remeasured durations was 6.9 ms. This result is in line with reliability data for vowel duration reported by Allen (1978) and Smith et al. (1986).

b. Steady-state times. Steady-state times were measured by two experimenters for all 1668 utterances. The average absolute difference between the two measurements was 21.1 ms, or 7.7% of average vowel duration. However, more important than the time difference between these two measurements is the difference in the formant frequency pattern at these two sample points. These results, shown in Table II, indicate that formant frequencies at the two sample points typically differed by roughly 1% of average formant frequency.

TABLE II. Average absolute difference between formant frequencies sampled at "steady-state" times determined by two judges. Figures in parentheses are differences as a percent of [...].

            F1            F2            F3            F4
Men          7.5 (1.3%)   14.2 (0.8%)   18.6 (0.7%)   20.6 (0.5%)
Women        9.2 (1.5%)   20.0 (1.1%)   21.2 (0.7%)   31.5 (0.8%)
Children    10.7 (1.8%)   18.5 (1.1%)   27.6 (1.0%)   36.4 (0.9%)
Overall      9.2 (1.5%)   17.6 (1.0%)   22.5 (0.8%)   29.5 (0.7%)

c.
Formant frequencies. Two methods were used to estimate the reliability of the formant frequency measurements. The first method involved a simple reanalysis of 10% of the utterances using the [...] described previously. The second method involved a reanalysis of 10% of the utterances using the same peak-picking and editing techniques but with 128-point cepstrally smoothed spectra instead of LPC spectra. The primary motivation for this comparison was Di Benedetto's (1989a)

report that LPC produced comparable estimates of F2 and F3 but estimates of F1 that were low when compared with smoothed wideband Fourier spectra. The analysis carried out in the present study consisted of calculating Fourier spectra over 16-ms (256-point) Hamming-windowed segments every 8 ms followed by cepstral smoothing. Cepstral smoothing was implemented with the "smooft" algorithm from Press et al. (1988). The size of the smoothing window was adjusted individually for each utterance to minimize spurious peaks or eliminate formant mergers. In this sense, the degree-of-smoothing parameter performed a role in the cepstrum analysis comparable to the number of poles in the LPC analysis. The [...] were used to extract formant frequencies from the cepstrally smoothed spectra.

TABLE III. Measurement-remeasurement reliability for formant frequencies obtained from a randomly selected 10% of the signals. Values are given as [...].

Results for the LPC remeasurement are shown in Table III. The results are based on a frame-by-frame comparison of the signals, excluding from consideration any frame in which either signal showed a merger in the formant slot being compared. Results are given as average absolute differences and as signed differences. Overall, the absolute differences ranged from about 12 to 60 Hz, or between 1.0% and 2.0% of average formant frequency.

Table IV compares formant measurements obtained from LPC and cepstrally smoothed spectra. Positive numbers in the signed-difference columns indicate that the LPC-derived formants were higher in frequency than those derived from cepstrally smoothed spectra. In light of Di Benedetto's (1989a) findings, the signed differences are of particular interest. Consistent with Di Benedetto's results, the signed differences are quite small for formants above F1, especially as a percent of formant frequency. However, unlike Di Benedetto's findings, our results showed slightly higher first formants from LPC spectra. This discrepancy might be due to differences between the cepstral smoothing method used in the present study and the "pseudospectrum" method used by Di Benedetto. However, it should be
noted that Di Benedetto's findings were based on analyses of utterances spoken by just two men and one woman.

d. Fundamental frequency. Remeasurement of fundamental frequency c[...]

[...] two experimenters who made these judgments. The averages shown in the table, and the data displayed in the subsequent figures, are based on measurements from individual tokens that were well identified in the listening study, to be described in the next section. Specifically, for the purposes of these calculations, measurements were not included from individual tokens that produced an identification error rate of 15% or greater, where "error" simply means any instance in which a signal was identified as a vowel other than that intended by the talker. Using this criterion, the averages in this table are based on measurements from 88.5% of the signals. This allows an analysis of measurements for signals for which the talkers and listeners are in good agreement about the vowel that was spoken. In general, the removal of the more ambiguous signals had very little effect on the averages, with the important exception of /ɔ/. As will be discussed in the next section, there were several instances of attempts at /ɔ/ that were poorly identified and, in some cases, consistently identified as /ɑ/.

a. Vowel duration. The pattern of durational differences among the vowels is very similar to that observed in connected speech. Our vowel durations from /hVd/ syllables are two-thirds longer than those measured in connected speech by Black (1949), but correlate strongly (r = 0.91) with [...] 33] = 9.04, p < 0.001). [...] shorter durations for the men when compared to either the women or the children. Longer durations for the children were expected based on numerous developmental studies (e.g., Smith, 1978; Kent and Forner, 1980) but the differences between the men and the women were not expected. We do not have an explanation for this finding and do not know if these male-female duration differences would also be seen in conversational speech samples.

b. Fundamental frequency. Figure 2 compares our average values of fundamental frequency with those of PB for [...]
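The token-exclusion rule described in this part of the text (averages are computed only over tokens whose identification error rate is below 15%) can be illustrated with a minimal sketch. The function names, the per-token listener counts, and the F1 values below are hypothetical and are not taken from the study; only the 15% criterion comes from the text.

```python
def well_identified(n_errors, n_responses, threshold=0.15):
    """True if a token's identification error rate is below the threshold.

    The text excludes tokens with an error rate of 15% or greater, so a
    token at exactly 15% is excluded.
    """
    if n_responses <= 0:
        raise ValueError("n_responses must be positive")
    return (n_errors / n_responses) < threshold


def mean_over_well_identified(values, n_errors, n_responses, threshold=0.15):
    """Average `values` over well-identified tokens only.

    Returns (mean, fraction_of_tokens_kept).
    """
    kept = [v for v, e, n in zip(values, n_errors, n_responses)
            if well_identified(e, n, threshold)]
    if not kept:
        raise ValueError("no token passed the identification criterion")
    return sum(kept) / len(kept), len(kept) / len(values)


# Hypothetical data: five tokens, each heard by 20 listeners.
f1_hz = [310.0, 450.0, 600.0, 720.0, 650.0]   # made-up F1 values
errors = [0, 1, 3, 2, 10]                     # misidentifications per token
responses = [20, 20, 20, 20, 20]

mean_f1, fraction_kept = mean_over_well_identified(f1_hz, errors, responses)
```

In this toy data set the third token (3 of 20, exactly 15%) and the fifth token are dropped, so the average is taken over the remaining three tokens.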

Acoustic characteristics of American English vowels. James Hillenbrand, Laura A. Getty, Michael J. Clark, and Kimberlee Wheeler, Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, Michigan 49008
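The automatic steady-state criterion described under "Vowel duration and steady-state times" (the center of the seven-frame, 56-ms sequence with the minimum slope in log F2-log F1 space) can be sketched roughly as follows. The text does not say how slope was computed, so this sketch assumes one plausible measure: the summed Euclidean distance between consecutive frames in (log F1, log F2) coordinates. The function name and the example contour are hypothetical, and this is not the authors' implementation.

```python
import math

FRAME_STEP_MS = 8    # analysis frame spacing given in the text
WINDOW_FRAMES = 7    # seven frames, described in the text as 56 ms


def steady_state_frame(f1_hz, f2_hz, window=WINDOW_FRAMES):
    """Index of the center frame of the window with the least formant movement.

    Movement is measured (an assumption of this sketch) as the summed
    Euclidean distance between consecutive frames in (log F1, log F2) space.
    Ties are resolved in favor of the earliest window.
    """
    if len(f1_hz) != len(f2_hz):
        raise ValueError("F1 and F2 contours must have the same length")
    if len(f1_hz) < window:
        raise ValueError("contour is shorter than the analysis window")

    points = [(math.log(a), math.log(b)) for a, b in zip(f1_hz, f2_hz)]
    # Distance moved between each pair of consecutive frames.
    steps = [math.dist(points[i], points[i + 1])
             for i in range(len(points) - 1)]

    best_start, best_cost = 0, float("inf")
    for start in range(len(points) - window + 1):
        cost = sum(steps[start:start + window - 1])
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start + window // 2


# Hypothetical contour: F1 rises, holds steady for seven frames, then falls.
f1 = [300, 350, 400, 450, 500, 500, 500, 500, 500, 500, 500, 450, 400]
f2 = [2000] * len(f1)

center = steady_state_frame(f1, f2)
center_ms = center * FRAME_STEP_MS   # time of that frame from the first frame
```

Frames with zeroed (merged) formants would need to be skipped before taking logarithms; that handling is omitted here for brevity.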
