Audio Fundamentals - Sharif


Audio Fundamentals

Audio Fundamentals
- Acoustics is the study of sound
  - Generation, transmission, and reception of sound waves
- Sound wave: energy causes a disturbance in a medium
  - Example: striking a drum
    - The head of the drum vibrates and disturbs the air molecules close to the head
    - Regions of molecules form with pressure above and below equilibrium
    - Sound is transmitted by molecules bumping into each other
Multimedia Systems

Sound Waves
[Figure: sine wave of pressure vs. time, labeling compression, rarefaction, and amplitude]

Sending/Receiving
- Receiving: a microphone placed in a sound field moves according to the pressures exerted on it
  - A transducer transforms energy to a different form (e.g., electrical energy)
- Sending: a speaker transforms electrical energy into sound waves

Signal Fundamentals
- Pressure changes can be periodic or aperiodic
- Periodic vibrations
  - Cycle: one compression/rarefaction
  - Cycles per second: frequency, measured in hertz (Hz)
  - Period: time for one cycle to occur (1/frequency)
- Frequency ranges
  - Barometric pressure is ~10^-6 Hz
  - Cosmic rays are ~10^22 Hz
  - Human perception: [0, 20 kHz]
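The frequency/period relationship above can be sketched in a few lines; `period_seconds` is a hypothetical helper name used only for illustration.

```python
def period_seconds(frequency_hz: float) -> float:
    """Period is the time for one compression/rarefaction cycle: T = 1/f."""
    return 1.0 / frequency_hz

# A 440 Hz tone (concert A) completes one cycle in roughly 2.27 ms;
# the 20 kHz upper limit of human perception has a 50-microsecond period.
print(period_seconds(440))
print(period_seconds(20_000))
```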

Wave Lengths
- Wave length is the distance sound travels in one cycle
  - 20 Hz is 56 feet
  - 20 kHz is 0.7 inch
- Bandwidth is frequency range
- Transducers cannot linearly reproduce the humanly perceived bandwidth
  - Frequency range is limited to [20 Hz, 20 kHz]
  - Frequency response is not flat
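The wavelengths quoted above follow from wavelength = speed of sound / frequency. A minimal sketch, assuming a speed of sound of about 1130 ft/s in room-temperature air (the constant name is mine, not from the slides):

```python
SPEED_OF_SOUND_FT = 1130.0  # feet per second in air, approximate

def wavelength_feet(frequency_hz: float) -> float:
    """Distance sound travels during one cycle."""
    return SPEED_OF_SOUND_FT / frequency_hz

print(wavelength_feet(20))        # ~56 ft, matching the slide
print(wavelength_feet(20_000) * 12)  # ~0.7 inch at 20 kHz
```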

Measures of Sound
- Sound level is a logarithmic scale
  - SPL = 20 log10(pressure/reference) decibels (dB), where the reference is 2x10^-4 dyne/cm^2
  - 0 dB SPL: essentially no sound heard
  - 35 dB SPL: quiet home
  - 70 dB SPL: noisy street
  - 120 dB SPL: discomfort
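The SPL formula can be checked directly; this is a sketch using the reference pressure given on the slide (2x10^-4 dyne/cm^2, the conventional threshold-of-hearing reference).

```python
import math

P_REF = 2e-4  # reference pressure, dyne/cm^2

def spl_db(pressure: float) -> float:
    """Sound pressure level in dB relative to the threshold of hearing."""
    return 20.0 * math.log10(pressure / P_REF)

print(spl_db(P_REF))       # 0 dB SPL: essentially no sound heard
print(spl_db(P_REF * 10))  # each 10x increase in pressure adds 20 dB
```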

Sound Phenomena
- Sound is typically a combination of waves
  - A sine wave is the fundamental frequency
  - Other waves are added to it to create richer sounds
  - Musical instruments typically have a fundamental frequency plus overtones at integer multiples of the fundamental frequency
- Waveforms out of phase cause interference
- Other phenomena
  - Sound reflects off walls if the wave length is small
  - Sound bends around walls if the wave length is large
  - Sound changes direction due to temperature shifts
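A richer tone built from a fundamental plus overtones, as described above, can be sketched as a sum of sines at integer multiples of the fundamental. The amplitude weights here are purely illustrative.

```python
import math

def tone_sample(t: float, fundamental_hz: float, overtone_amps) -> float:
    """Sum of sine waves at f, 2f, 3f, ... weighted by overtone_amps."""
    return sum(a * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
               for k, a in enumerate(overtone_amps))

# Fundamental at full amplitude, second and third harmonics weaker.
sample = tone_sample(0.001, 440.0, [1.0, 0.5, 0.25])
print(sample)
```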

Human Perception
- Speech is a complex waveform
  - Vowels and bass sounds are low frequencies
  - Consonants are high frequencies
- Humans are most sensitive to low frequencies
  - The most important region is 2 kHz to 4 kHz
- Hearing is dependent on room and environment
- Sounds are masked by overlapping sounds

Critical Bands
[Figure: specific loudness vs. time over ~200 ms, from an electroacoustics text]

Sound Fields
[Figure: amplitude vs. time showing directed sound, early reflections (50-80 msec), and reverberation]

Impulse Response
[Figure: amplitude vs. time impulse responses of a concert hall and a home]

Audio Noise Masking
[Figure: a strong tonal signal and the masked region around it]

Audio Sampling
[Figure: quantization of a sampled waveform over time]

Audio Representations
- Optimal sampling frequency is twice the highest frequency to be sampled (Nyquist Theorem)

  Format              Sampling Rate  Bandwidth  Frequency Band
  Telephony           8 kHz          3.2 kHz    200-3400 Hz
  Teleconferencing    16 kHz         7 kHz      50-7000 Hz
  Compact Disc        44.1 kHz       20 kHz     20-20,000 Hz
  Digital Audio Tape  48 kHz         20 kHz     20-20,000 Hz
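The Nyquist relationship behind the table can be sketched as follows; `min_sampling_rate` is an illustrative helper name.

```python
def min_sampling_rate(max_frequency_hz: float) -> float:
    """Nyquist: sampling rate must be at least twice the highest frequency."""
    return 2.0 * max_frequency_hz

# CD audio targets 20 kHz bandwidth, so it needs at least 40 kHz sampling;
# the actual 44.1 kHz rate leaves headroom for the anti-aliasing filter.
print(min_sampling_rate(20_000))
# Telephony's 3.2 kHz band comfortably fits the 8 kHz sampling rate.
print(min_sampling_rate(3_200))
```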

Jargon/Standards
- Emerging standard formats
  - 8 kHz 8-bit U-LAW mono
  - 22 kHz 8-bit unsigned linear mono and stereo
  - 44 kHz 16-bit signed mono and stereo
  - 48 kHz 16-bit signed mono and stereo
- Actual standards
  - G.711: A-LAW/U-LAW encodings (8 bits/sample)
  - G.721: ADPCM (32 kb/s, 4 bits/sample)
  - G.723: ADPCM (24 kb/s and 40 kb/s; 3 and 5 bits/sample)
  - G.728: CELP (16 kb/s)
  - GSM 06.10: 8 kHz, 13 kb/s (used in Europe)
  - LPC (FIPS-1015): Linear Predictive Coding (2.4 kb/s)
  - CELP (FIPS-1016): Code Excited LPC (4.8 kb/s, 4 bits/sample)
  - G.729: CS-ACELP (8 kb/s)
  - MPEG1/MPEG2, AC3: 16-384 kb/s; mono, stereo, and 5.1 channels

Audio Packets and Data Rates
- Telephone uses 8 kHz sampling
  - ATM uses 48-byte packets: 6 msecs per packet
  - RTP uses 160-byte packets: 20 msecs per packet
- Need many other data rates
  - 30 kb/s audio over 28.8 kb/s modems
  - 32 kb/s good stereo audio is possible
  - 56 kb/s or 64 kb/s conventional telephones
  - 128 kb/s MPEG1 audio
  - 256-384 kb/s higher quality MPEG/AC3 audio
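The 6 ms and 20 ms figures above follow directly from the telephone rate (8 kHz, 8-bit mono, so one byte per sample). A quick sketch:

```python
SAMPLE_RATE = 8000     # samples per second (telephone)
BYTES_PER_SAMPLE = 1   # 8-bit mono

def packet_duration_ms(payload_bytes: int) -> float:
    """How much audio time one packet payload carries."""
    samples = payload_bytes / BYTES_PER_SAMPLE
    return 1000.0 * samples / SAMPLE_RATE

print(packet_duration_ms(48))   # ATM cell payload: 6 ms per packet
print(packet_duration_ms(160))  # RTP payload from the slide: 20 ms per packet
```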

Discussion
- Higher quality
  - Filter input
  - More bits per sample (i.e., 10, 12, 16, etc.)
  - More channels (e.g., stereo, quadraphonic, etc.)
- Digital processing
  - Reshape impulse response to simulate a different room
  - Move the perceived location from which sound comes
  - Locate a speaker in 3D space using microphone arrays
  - Cover missing samples
  - Mix multiple signals (i.e., conferencing)
  - Echo cancellation

Interactive Time Constraints
- Maximum time to hear own voice: 100 msec
- Maximum round-trip time: 300 msec

Importance of Sound
- Passive viewing (e.g., film, video)
  - Very sensitive to sound breaks
  - Visual channel more important (ask film makers!)
  - Tolerate occasional frame drops
- Video conferencing
  - Sound channel is more important
  - Visual channel still conveys information
  - Some people report that video teleconference users turn off video
  - Need to create a 3D space and locate remote participants in it

Producing High Quality Audio
- Eliminate background noise
  - A directional microphone gives more control
  - Deaden the room in which you are recording
  - Some audio systems will cancel wind noise
- One microphone per speaker
- Keep the sound levels balanced
- Sweeten the sound track with interesting sound effects

Audio vs. Video
- Some people argue that sound is easy and video is hard because data rates are lower
  - Not true: audio is every bit as hard as video, just different!
- Computer scientists will learn about audio and video just as we learned about printing with the introduction of desktop publishing

Audio
- Some techniques for audio compression:
  - ADPCM
  - LPC
  - CELP

Digital Audio for Transmission and Storage
[Figure: target bit rates (32-448 kbit/s) for MPEG Audio and Dolby AC-3 in DAB, DVB, and DVD: MPEG-1 Audio Layers I-III at 32, 44.1, and 48 kHz; MPEG-2 Audio "LSF" and MPEG-2 Audio Layer II in dual-channel and multi-channel modes; Dolby AC-3 multi-channel; NICAM (1982); CCITT G.722 (1986)]
- Here, we still have problems!
- Possible candidates to solve these problems:
  - MPEG-2 AAC and MPEG-4 Audio
  - Internet radio audio package manufacturers

History of MPEG-Audio
- MPEG-1: two-channel coding standard (Nov. 1992)
- MPEG-2: extension towards Lower Sampling Frequency (LSF) (1994)
- MPEG-2: backwards-compatible multi-channel coding (1994)
- MPEG-2: higher quality multi-channel standard (MPEG-2 AAC) (1997)
- MPEG-4: audio coding and added functionalities (1999, 2000)

Audio
- ADPCM: Adaptive Differential Pulse Code Modulation
  - ADPCM allows for the compression of PCM-encoded input whose power varies with time.
  - A reconstructed version of the input signal is fed back and subtracted from the actual input signal; the difference is quantized to give a 4-bit output value.
  - This compression gives a 32 kbit/s output rate.
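The differential idea can be sketched as below. Note this omits the adaptive step-size logic entirely, so it is plain DPCM with a fixed step rather than full ADPCM; the step value is illustrative.

```python
def dpcm_encode(samples, step=16):
    """Quantize the difference from the reconstructed signal to 4-bit codes."""
    predicted = 0
    codes = []
    for x in samples:
        diff = x - predicted
        code = max(-8, min(7, round(diff / step)))  # clamp to signed 4 bits
        codes.append(code)
        predicted += code * step  # track what the decoder will reconstruct
    return codes

def dpcm_decode(codes, step=16):
    """Rebuild the signal by accumulating the quantized differences."""
    predicted = 0
    out = []
    for code in codes:
        predicted += code * step
        out.append(predicted)
    return out

codes = dpcm_encode([0, 30, 60, 90, 60, 30])
print(codes)
print(dpcm_decode(codes))  # an approximation of the input
```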

Audio
[Figure: ADPCM block diagram. Transmitter: input Xm minus prediction Xm' gives error Em; the quantized error Em* goes through a coder to the channel and through a predictor feedback loop producing Xm*. Receiver: decoder recovers Em* and a matching predictor reconstructs Xm*.]

Audio
- LPC: Linear Predictive Coding
  - The encoder fits speech to a simple, analytic model of the vocal tract.
  - Only the parameters describing the best-fit model are transmitted to the decoder.
  - An LPC decoder uses those parameters to generate synthetic speech that is usually very similar to the original.
  - LPC is used to compress audio at 16 kbit/s and below.

Audio
- CELP: Code Excited Linear Predictor
  - CELP does the same LPC modeling but then computes the error between the original speech and the synthetic model, and transmits both the model parameters and a very compressed representation of the errors.
  - The result of CELP is much higher quality speech at a low data rate.

Digital Audio Recap
- Digital audio parameters
  - Sampling rate
  - Number of bits per sample
  - Number of channels (1 for mono, 2 for stereo, etc.)
- Sampling rate
  - Unit: Hz, or samples per second
  - Sampling rate is measured per channel
  - For stereo sound at an 8 kHz sampling rate, 16K samples are obtained per second
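These three parameters determine the raw PCM data rate: sampling rate x bits per sample x channels. A minimal sketch:

```python
def pcm_bit_rate(sampling_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw (uncompressed) PCM data rate in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels

# CD-quality stereo: 44.1 kHz, 16 bits, 2 channels.
print(pcm_bit_rate(44_100, 16, 2))     # bits per second
# Telephony: 8 kHz, 8-bit mono, expressed in bytes per second.
print(pcm_bit_rate(8_000, 8, 1) // 8)
```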

Sampling Rate & Applications

  Sampling Rate  Applications
  8 kHz          Telephony standard
  11.025 kHz     Web applications
  22 kHz         Mac sampling rate
  32 kHz         Digital radio
  44.1 kHz       CD quality audio
  48 kHz         DAT (Digital Audio Tape)

- Higher sampling rate: better quality, larger file

Speech Compression
- Speech compression technologies
  - Silence suppression: detect the "silence" and only code the "loud" parts of the speech (currently a technique combined with other methods to increase the compression ratio)
  - Differential PCM: a simple method
  - Utilize the speech model
    - Linear Predictive Coding (LPC): fits the signal to a speech model and transmits the parameters of the model
    - Code Excited Linear Predictor (CELP): same principle as LPC, but also transmits error terms via a codebook
- Quality of compressed audio
  - LPC: computer-like talking
  - CELP: better quality; suitable for audio conferencing
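Silence suppression can be sketched as a simple energy gate: frames whose mean absolute amplitude falls below a threshold are skipped rather than coded. `FRAME_SIZE` and `THRESHOLD` here are illustrative values, not taken from any standard.

```python
FRAME_SIZE = 4     # samples per frame (illustrative)
THRESHOLD = 10.0   # mean absolute amplitude below this counts as silence

def active_frames(samples):
    """Yield (start_index, frame) only for frames loud enough to encode."""
    for i in range(0, len(samples), FRAME_SIZE):
        frame = samples[i:i + FRAME_SIZE]
        if sum(abs(s) for s in frame) / len(frame) >= THRESHOLD:
            yield i, frame

signal = [0, 1, -1, 0, 40, -35, 30, -25, 0, 0, 1, -1]
print(list(active_frames(signal)))  # only the loud middle frame survives
```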

Audio Compression
- Audio vs. speech
  - Higher quality requirement for audio
  - Wider frequency range for audio
- Psychoacoustics model
  - Result of ear sensitivity tests
  - Sensitivity of human ears: most sensitive between 2 kHz and 5 kHz

Principle of Audio Compression (1)
- Psychoacoustics model (cont.)
  - Frequency masking: when multiple signals are present, a strong signal may "mask" other signals at nearby frequencies
  - Frequency masking differs at different tones (60 dB)
  - Thinking: if there is an 8 kHz signal at 60 dB, can we hear another 9 kHz signal at 40 dB?

Principle of Audio Compression (2)
- Psychoacoustics model (cont.)
  - Critical bandwidth: the range of frequencies that are affected by the masking tone beyond a certain degree
    - Critical bandwidth increases with the frequency of the masking tone
    - For masking frequencies less than 500 Hz, the critical bandwidth is around 100 Hz; for frequencies greater than 500 Hz, the critical bandwidth increases linearly in multiples of 100 Hz
  - Temporal masking: if we hear a loud sound that then stops, it takes a little while until we can hear a soft tone nearby
    - Mask tone: 1 kHz; test tone: 1.1 kHz
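The slide's rough rule for critical bandwidth can be sketched as below. The 0.2x factor above 500 Hz is my reading of "increases linearly in multiples of 100 Hz", chosen so the curve matches 100 Hz at the 500 Hz crossover; real psychoacoustic models use more refined formulas.

```python
def critical_bandwidth_hz(masking_freq_hz: float) -> float:
    """Rough critical bandwidth around a masking tone, per the slide's rule."""
    if masking_freq_hz < 500.0:
        return 100.0
    return 0.2 * masking_freq_hz  # assumed linear growth, 100 Hz at 500 Hz

print(critical_bandwidth_hz(200))    # ~100 Hz below the crossover
print(critical_bandwidth_hz(1000))   # ~200 Hz wide band around 1 kHz
```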

Principle of Audio Compression (3)
- Audio compression: perceptual coding
  - Take advantage of the psychoacoustics model
  - Distinguish between signals of different sensitivity to human ears
    - Signals of high sensitivity: more bits allocated for coding
    - Signals of low sensitivity: fewer bits allocated for coding
  - Exploit frequency masking
    - Don't encode the masked signal (the range of masking is 1 critical band)
  - Exploit temporal masking
    - Don't encode the masked signal
- Audio coding standard: the MPEG audio codec
  - Has three layers with the same compression principle

MPEG Audio Codec (1)
- Basic facts about MPEG audio coding
  - Perceptual coding
  - Supports 3 sampling rates
    - 32 kHz: broadcast communication
    - 44.1 kHz: CD quality audio
    - 48 kHz: professional sound equipment
  - Supports one or two audio channels in one of the following four modes:
    - Monophonic: single audio channel
    - Dual-monophonic: two independent channels (similar to stereo)
    - Stereo: stereo channels that share bits
    - Joint-stereo: takes advantage of the correlations between stereo channels

MPEG Audio Codec (2)
- Procedure of MPEG audio coding
  - Apply a DFT (Discrete Fourier Transform) to decompose the audio into frequency subbands that approximate the 32 critical bands (sub-band filtering)
  - Use the psychoacoustics model in bit allocation
    - If the amplitude of the signal in a band is below the masking threshold, don't encode it
    - Otherwise, allocate bits based on the sensitivity of the signal
  - Multiplex the output of the 32 bands into one bitstream
[Block diagram: audio -> analysis filter bank (DFT) -> quantizers Q1..Q32 -> frame formatting for transmission, with psychoacoustics-model analysis driving the quantizers]

MPEG Audio Codec (3)
- MPEG audio frame format
  - Audio data is divided into frames; each frame contains 384 samples
  - After subband filtering, each frame (384 samples) is decomposed into 32 bands, each band with 12 samples
  - The bitstream format of the output MPEG audio is:
    Header | SBS format | 12x32 subband samples (SBS) | Ancillary Data
  - The minimum encoding delay is determined by the frame size and the number of frames accumulated for encoding
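Since a frame holds a fixed 384 samples (32 bands x 12 samples), the frame duration, and hence the lower bound on encoding delay, depends only on the sampling rate. A quick sketch:

```python
SAMPLES_PER_FRAME = 32 * 12  # 32 subbands x 12 samples = 384

def frame_duration_ms(sampling_rate_hz: int) -> float:
    """Duration of one 384-sample MPEG audio frame."""
    return 1000.0 * SAMPLES_PER_FRAME / sampling_rate_hz

print(frame_duration_ms(48_000))  # 8 ms per frame at 48 kHz
print(frame_duration_ms(32_000))  # 12 ms per frame at 32 kHz
```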

MPEG Audio Codec (4)
- MPEG layers
  - MPEG defines 3 layers for audio. The basic compression model is the same, but codec complexity increases with each layer
  - The popular MP3 is MPEG audio codec Layer 3
  - Layer 1:
    - DCT-type filter applied to one frame
    - Equal frequency spread per band
    - Frequency masking only
  - Layer 2:
    - Uses three frames in the filter (previous, current, next)
    - Both frequency and temporal masking
  - Layer 3:
    - A better critical band filter is used (non-equal frequencies)
    - Better psychoacoustics model
    - Takes into account stereo redundancy, and uses a Huffman coder

Perceptual Coding of Audio Signals: A Tutorial

What is Coding for?
- Coding, in the sense used here, is the process of reducing the bit rate of a digital signal.
- The coder input is a digital signal.
- The coder output is a smaller (lower rate) digital signal.
- The decoder reverses the process and provides (an approximation to) the original digital signal.

Historical Coder "Divisions":
- Lossless coders vs. lossy coders
- Or: numerical coders vs. source coders

Lossless Coding:
- Lossless coding commonly refers to coding methods that are completely reversible, i.e. coders wherein the original signal can be reconstructed bit for bit.

Lossy Coding:
- Lossy coding commonly refers to coders that create an approximate reproduction of their input signal. The nature of the loss depends entirely on the kind of lossy coding used.

Source Coding:
- Source coding can be either lossless or lossy.
- In most cases, source coders are deliberately lossy; however, this is not a restriction on the method of source coding. Source coders of a non-lossy nature have been proposed for some purposes.

Source Coding:
- Removes redundancies by estimating a model of the source generation mechanism. This model may be explicit, as in an LPC speech model, or mathematical in nature, such as the "transform gain" that occurs when a transform or filterbank diagonalizes the signal.

Source Coding:
- Typically, the source coder uses the source model to increase the SNR, or reduce another error metric of the signal, through the appropriate use of signal models and mathematical redundancies.

Typical Source Coding Methods:
- LPC analysis (including DPCM and its derivatives and enhancements)
- Multipulse analysis-by-synthesis
- Sub-band coding
- Transform coding
- Vector quantization
This list is not exhaustive.

Well Known Source Coding Algorithms:
- Delta Modulation
- DPCM
- ADPCM
- G.728 LD-CELP
- LPC-10E
- G.721

Numerical Coding:
- Numerical coding is almost always a lossless type of coding. In its typical usage, numerical coding means a coding method that uses abstract numerical methods to remove redundancies from the coded data.
- New lossy numerical coders can provide fine-grain bit rate scalability.

Common Numerical Coding Techniques:
- Huffman coding
- Arithmetic coding
- Ziv-Lempel (LZW) coding
This list is not exhaustive.
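As one concrete instance of the techniques listed above, here is a minimal sketch of Huffman coding using only the standard library: merge the two least-frequent subtrees repeatedly, prefixing "0" and "1" to the codes on each side. The tiebreak counter only keeps heap comparisons from reaching the dicts.

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict:
    """Build a prefix-free code: frequent symbols get shorter bit strings."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, lo = heapq.heappop(heap)
        n2, _, hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo.items()}
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
print(codes)  # 'a' (most frequent) gets the shortest code
```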

Numerical Coding (cont.):
- Typically, numerical coders use "entropy coding"-based methods to reduce the actual bit rate of the signal.
- Source coders most often use signal models to reduce the signal redundancy, and produce lossy coding systems.
- Both methods work by considering the source behavior.
- Both methods attempt to reduce the redundancy of the original signal.

Perceptual Coding:
- Perceptual coding uses a model of the destination, i.e. the human being who will be using the data, rather than a model of the signal source.
- Perceptual coding attempts to remove parts of the signal that the human cannot perceive.

Perceptual Coding (cont.):
- Is a lossy coding method.
- The imperceptible information removed by the perceptual coder is called the irrelevancy of the signal.
- In practice, most perceptual coders attempt to remove both irrelevancy and redundancy in order to make a coder that provides the lowest bit rate possible for a given audible quality.

Perceptual Coding (cont.):
- Perceptual coders will, in general, have a lower SNR than a source coder, and a higher perceived quality than a source coder of equivalent bit rate.

Perceptual Audio Coder Block Diagram
[Block diagram: audio signal -> filter bank producing a filtered (diagonalized) audio signal -> quantization of the values, driven by a perceptual model -> coded bitstream]

Auditory Masking Phenomena: The "Perceptual Model"

What is Auditory Masking:
- The Human Auditory System (HAS) has a limited detection ability when a stronger signal occurs near (in frequency and time) a weaker signal. In many situations, the weaker signal is imperceptible even under ideal listening conditions.

First Observation of Masking: Tone Masker
- If we compare a tone masker to the tone masker plus a noise probe:
  - The energy of the 1-bark wide noise probe is 15.0 dB below the energy of the tone masker.
  - THE NOISE IS AUDIBLE

The Noise is NOT Masked!
- In this example, a masker-to-probe ratio of approximately 25 dB will result in complete masking of the probe.

2nd Demonstration of Masking: Noise Masker
- If we compare a noise masker to the noise masker plus a tone probe:
  - The energy of the 1-bark wide noise masker is 15 dB above the tone probe.
  - The tone is NOT audible

The Tone is COMPLETELY Masked
- In this case, a masker-to-probe ratio of approximately 5.5 dB will result in complete masking of the tone.

Auditory Masking Phenomena (cont.):
- There is an asymmetry in the masking ability of a tone and narrow-band noise, when that noise is within one critical band.
- This asymmetry is related to the short-term stability of the signal in a given critical bandwidth.

Critical Bandwidth?
- What's this about a critical bandwidth?
- The critical bandwidth dates back to the experiments of Harvey Fletcher; the term "critical bandwidth" was coined later. Other people may refer to the "ERB", or equivalent rectangular bandwidth. They are all manifestations of the same thing.
- What is that?

- A critical band, or critical bandwidth, is a range of frequencies over which the masking SNR remains more or less constant.
- For example, in the demonstration, any noise signal within ±0.5 critical band of the tone will produce nearly the same masking behavior as any other, as long as their energies are the same.

Auditory Filterbank:
- The mechanical mechanisms in the human cochlea constitute a mechanical filterbank. The shape of the filter at any one position on the cochlea is called the cochlear filter for that point on the cochlea. A critical band is very close to the passband bandwidth of that filter.

ERB
- A newer take on the bandwidth of auditory filters is the "Equivalent Rectangular Bandwidth". It results in filters slightly narrower at low frequencies, and su

