Audio Coding Standards - Mp3


Chi-Min Liu and Wen-Whei Chang, 1999

AUDIO CODING STANDARDS

Chi-Min Liu
Department of Computer Science and Information Engineering
National Chiao Tung University, Taiwan

Wen-Whei Chang
Department of Communication Engineering
National Chiao Tung University, Taiwan

1. INTRODUCTION
2. ISO/MPEG AUDIO CODING STANDARDS
3. OTHER AUDIO CODING STANDARDS
4. ARCHITECTURAL OVERVIEW
5. CONCLUSIONS

1. INTRODUCTION

With the introduction of the compact disc (CD) in 1982, digital audio media quickly replaced analog audio media. However, the significant amount of uncompressed data (1.41 million bits per second) required for digital audio has led to a large transmission and storage burden. Advances in audio coding techniques and the resulting standards have greatly eased that burden. Ten years ago, almost nobody believed that 90% of the audio data could be deleted without affecting audio fidelity. Today that fantasy has become reality, and ongoing coding technologies are inspiring new dreams. This chapter reviews some international and commercial product audio coding standards, including the ISO/MPEG family [ISO, 1992][ISO, 1994][ISO, 1997][ISO, 1999], the Philips PASC [Lokhoff, 1992], the Sony ATRAC [Tsutsui, 1992], and the Dolby AC-3 [Todd, 1994] algorithms.

"Audio Coding Standards," a chapter for the book "Handbook of Multimedia Communication," to appear in a book by Academic Press, 2000

2. ISO/MPEG AUDIO CODING STANDARDS

The Moving Picture Experts Group (MPEG) within the International Organization for Standardization (ISO) has developed a series of audio coding standards for storage and transmission of various digital media. The ISO standard specifies only the syntax of the coded bitstream and the decoding process, which leaves sufficient flexibility for encoder implementation. The MPEG first-phase (MPEG-1) audio coder operates in single-channel or two-channel stereo mode at sampling rates of 32, 44.1, and 48 kHz. In the second phase of development, particular emphasis is placed on multichannel audio support and on an extension of MPEG-1 to lower sampling rates and lower bit rates. MPEG-2 audio consists mainly of two coding standards: MPEG-2 BC [ISO, 1994] and MPEG-2 AAC [ISO, 1997]. Unlike MPEG-2 BC, which is constrained by its backward compatibility (BC) with the MPEG-1 format, MPEG-2 AAC (Advanced Audio Coding) is unconstrained and can therefore provide better coding efficiency. The most recent development is the adoption of MPEG-4 [ISO, 1999] for very-low-bit-rate channels, such as those found in Internet and mobile applications. Table 1 lists the configurations used in the MPEG audio coding standards.

| Standards | Audio sampling rate (kHz) | Compressed bit rate (kbits/sec) | Channels | Standard approved |
|---|---|---|---|---|
| MPEG-1 Layer I | 32, 44.1, 48 | 32 – 448 | 1-2 channels | 1992 |
| MPEG-1 Layer II | 32, 44.1, 48 | 32 – 384 | 1-2 channels | 1992 |
| MPEG-1 Layer III | 32, 44.1, 48 | 32 – 320 | 1-2 channels | 1993 |
| MPEG-2 Layer I | 32, 44.1, 48 | 32 – 448 for two BC channels | 1-5.1 channels | 1994 |
| | 16, 22.05, 24 | 32 – 256 for two BC channels | | |
| MPEG-2 Layer II | 32, 44.1, 48 | 32 – 384 for two BC channels | 1-5.1 channels | 1994 |
| | 16, 22.05, 24 | 8 – 160 for two BC channels | | |
| MPEG-2 Layer III | 32, 44.1, 48 | 32 – 384 for two BC channels | 1-5.1 channels | 1994 |
| | 16, 22.05, 24 | 8 – 160 for two BC channels | | |
| MPEG-2 AAC | 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48, 64, 88.2, 96 | Indicated by a 23-bit unsigned integer | 1-48 channels | 1997 |
| MPEG-4 T/F coding | 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48, 64, 88.2, 96 | Indicated by a 23-bit unsigned integer | 1-48 channels | 1999 |

Table 1 Comparison of ISO/MPEG audio coding standards

2.1 MPEG-1

The MPEG-1 standard consists of three layers of audio coding schemes with increasing complexity and subjective performance. These layers were developed in collaboration mainly with AT&T, CCETT, FhG/University of Erlangen, Philips, IRT, and Thomson Consumer Electronics. MPEG-1 operates in one of four possible modes: mono, stereo, dual channel, and joint stereo. In joint stereo mode, further compression can be realized by exploiting either the correlation between the left and right channels or the irrelevancy of the phase difference between them.

2.1.1 MPEG-1 Layers I and II

Block diagrams of the Layer I and Layer II encoders are given in Fig. 1. An analysis filterbank splits the input signal, sampled at rate Fs, into 32 equally spaced subband signals, each with sampling rate Fs/32. In each of the 32 subbands, 12 consecutive samples are assembled into a block, so that one set of blocks corresponds to 12 x 32 = 384 input samples. All of the samples within one block are normalized by a scale factor so that they all have absolute values less than one. The scale factor is chosen by first finding the sample with the maximum absolute value and then comparing it to a scale factor table of 63 allowable values. After normalization, samples are quantized and coded under the control of a psychoacoustic model. Detailed psychoacoustic analysis is performed through the use of a 512-point (Layer I) or 1024-point (Layer II) FFT in parallel with the subband decomposition. The bit-allocation unit determines the quantizer resolution according to the targeted bit rate and the perceptual information derived from the psychoacoustic model.

Layer II introduces further compression with respect to Layer I through three modifications. First, the overall information is reduced by removing redundancy and irrelevance between the scale factors of three adjacent 12-sample blocks. Second, a quantization table with improved precision is provided. Third, the psychoacoustic analysis benefits from better frequency resolution because of the increased FFT size.
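The block normalization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 63-entry table is generated as 2 * 2^(-i/3) for i = 0..62, which is how the MPEG-1 table is commonly described but is not copied from the standard, and the function names are hypothetical.

```python
# Assumed table: 63 scale factors, 2 * 2**(-i/3) for i = 0..62 (illustrative,
# not copied from the standard). Values decrease as the index grows.
SCALE_FACTORS = [2.0 * 2.0 ** (-i / 3.0) for i in range(63)]
SUBBANDS = 32    # subbands produced by the analysis filterbank
BLOCK_LEN = 12   # consecutive samples per subband block


def choose_scale_factor(block):
    """Return the index of the smallest table entry that exceeds the block peak."""
    peak = max(abs(s) for s in block)
    # Scan from the smallest table value toward the largest.
    for index in range(len(SCALE_FACTORS) - 1, -1, -1):
        if SCALE_FACTORS[index] > peak:
            return index
    raise ValueError("block peak exceeds the largest scale factor")


def normalize_block(block):
    """Divide a 12-sample block by its scale factor so every |sample| < 1."""
    if len(block) != BLOCK_LEN:
        raise ValueError("expected 12 samples per block")
    index = choose_scale_factor(block)
    scale = SCALE_FACTORS[index]
    return index, [s / scale for s in block]


block = [0.30, -0.45, 0.10, 0.02, -0.49, 0.25, 0.0, 0.11, -0.2, 0.4, 0.33, -0.05]
index, normalized = normalize_block(block)
print("input samples covered by one set of blocks:", SUBBANDS * BLOCK_LEN)
print("scale factor index:", index)
print("largest normalized magnitude:", max(abs(v) for v in normalized))
```

Picking the smallest table entry above the peak keeps the normalized samples as close to full scale as the table allows, which is what lets the subsequent quantizer spend its levels efficiently.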

Fig. 1. MPEG-1 Layer I or II audio encoder.

2.1.2 MPEG-1 Layer III

Fig. 2. MPEG-1 Layer III audio encoder.

The MPEG-1 Layer III audio coder introduces many new features, in particular a hybrid filterbank, which is a cascade of two filterbanks. For notational convenience, the first filterbank is labeled the Layer III 1st hybrid level and the second the Layer III 2nd hybrid level. A block diagram of the Layer III encoder is given in Fig. 2. Although its 1st level is based on the same filterbank found in the other layers, Layer III provides a higher frequency resolution by subdividing each of the 32 subbands with an 18-point modified discrete cosine transform (MDCT). Furthermore, the transform block size adapts to signal characteristics to ensure dynamic tradeoffs between time and frequency resolution. Layer III also employs nonuniform quantization in conjunction with variable length coding for further savings in bit rate. One special feature of Layer III is the bit reservoir, which lets the encoder better match its time-varying demand for code bits. The encoder can donate bits to the reservoir when it needs fewer than the average number of bits to code the samples in a frame. When the audio signal is hard to compress, the encoder can borrow bits from the reservoir to improve the fidelity.
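The donate-and-borrow behavior of the bit reservoir can be illustrated with a toy model. The sketch below is a simplification under assumptions of our own: the per-frame demands, the mean frame size, and the reservoir cap are hypothetical numbers, and the function is not the procedure defined by the standard.

```python
def allocate_with_reservoir(demands, mean_bits, reservoir_cap):
    """Grant bits to each frame while keeping a constant average rate.

    demands:       bits each frame would like to spend (hypothetical values)
    mean_bits:     bits contributed per frame at the target bit rate
    reservoir_cap: largest number of saved bits the reservoir may hold
    """
    reservoir = 0
    grants = []
    for demand in demands:
        available = mean_bits + reservoir   # this frame's share plus savings
        grant = min(demand, available)      # borrow from savings if needed
        # Unused bits are donated; anything beyond the cap is simply lost here.
        reservoir = min(available - grant, reservoir_cap)
        grants.append(grant)
    return grants, reservoir


# Two easy frames donate bits, then two demanding frames borrow them back.
grants, left_over = allocate_with_reservoir(
    demands=[600, 700, 1500, 1200], mean_bits=1000, reservoir_cap=1000
)
print(grants, left_over)
```

In this run the two demanding frames receive more than the 1000-bit average, yet the total over the four frames never exceeds four times the average.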

2.2 MPEG-2

MPEG-2 differs from MPEG-1 in that it supports up to 5.1 channels, including the five full-bandwidth channels of 3/2 stereo plus an optional low-frequency enhancement channel. This multichannel extension improves the realism of the auditory ambience not only for audio-only applications, but also for high-definition television (HDTV) and digital versatile disc (DVD). In addition, the sampling rates are extended downward to include 16, 22.05, and 24 kHz. Two coding standards are defined within MPEG-2: the BC (Backward Compatible) standard preserves backward compatibility with MPEG-1, and the AAC (Advanced Audio Coding) standard does not.

2.2.1 MPEG-2 BC

Regarding syntax and semantics, the differences between MPEG-1 and MPEG-2 BC are minor. The exceptions in MPEG-2 BC are the new definitions of the sampling frequency field, the bit rate index field, and the psychoacoustic model used in the bit allocation tables. In addition, the parameters of MPEG-2 BC have to be changed accordingly. With the extension to lower sampling rates, it is possible to compress two-channel audio signals to bit rates below 64 kb/s with good quality. Backward compatibility implies that existing MPEG-1 audio decoders can deliver the two main channels of an MPEG-2 BC coded bitstream. This is achieved by coding the left and right channels as MPEG-1, while the remaining channels are coded as ancillary data in the MPEG-1 bitstream.

2.2.2 MPEG-2 AAC

Fig. 3. MPEG-2 AAC audio encoder

MPEG-2 AAC provides the highest quality for applications where backward compatibility with MPEG-1 is not a constraint. While MPEG-2 BC provides good audio quality at data rates of 640-896 kb/s for five full-bandwidth channels, MPEG-2 AAC provides very good quality at less than half of that data rate. A block diagram of an AAC encoder is given in Fig. 3. The gain control tool splits the input signal into four equally spaced frequency bands, which are then flexibly encoded to fit into a variety of sampling rates. The pre-echo effect can also be alleviated through the use of the gain control tool. The filterbank transforms the signals from the time domain to the frequency domain. The temporal noise shaping (TNS) tool helps to control the temporal shape of the quantization noise. Intensity coding and coupling reduce perceptually irrelevant information by combining multiple channels in high-frequency regions into a single channel. The prediction tool further removes the redundancies between adjacent frames. M/S coding removes stereo redundancy by coding the sum and difference signals instead of the left and right channels. Other units, including quantization, variable length coding, psychoacoustic model, and bit allocation, are similar to those used in MPEG Layer III.

MPEG-2 AAC offers flexibility for different quality-complexity tradeoffs by defining three profiles: the main profile, the low-complexity profile, and the sampling rate scalable (SRS) profile. Each profile builds on a combination of different tools, as listed in Table 2. The main profile yields the highest coding efficiency by incorporating all the tools with the exception of the gain control tool. The low-complexity profile is used for applications where memory and computing power are constrained. The SRS profile offers scalable complexity by allowing partial decoding of a reduced audio bandwidth.

Table 2 Coding tools used in MPEG-2 AAC

2.3 MPEG-4

The MPEG-4 standard, which was finalized in 1999, integrates the whole range of audio, from high-fidelity speech coding and audio coding down to synthetic speech and synthetic audio. The MPEG-2 AAC tool set within the MPEG-4 standard supports the compression of natural audio at bit rates ranging from 2 up to 64 kb/s. The MPEG-4 standard defines three types of coders: parametric coding, code-excited linear predictive (CELP) coding, and time/frequency (T/F) coding. For speech signals sampled at 8 kHz, parametric coding is used to achieve targeted bit rates between about 2 and 6 kb/s. For audio signals sampled at 8 and 16 kHz, CELP coding offers good quality at medium bit rates between about 6 and 24 kb/s. T/F coding is typically applied at bit rates starting at about 16 kb/s for audio signals with bandwidths above 8 kHz. T/F coding is based on the coding tools used in MPEG-2 AAC with some add-ons. One is referred to as twin-VQ (vector quantization), which makes combined use of an interleaved VQ and LPC (Linear Predictive Coding) spectral estimation. In addition, the introduction of bit-sliced arithmetic coding (BSAC) offers noiseless transcoding of an AAC stream into a fine-granule scalable stream between 16 and 64 kb/s per channel. BSAC enables the decoder to stop at any rate between 16 kb/s and the encoded bit rate, in 1-kb/s steps.

3. OTHER AUDIO CODING STANDARDS

The audio data on a compact disc is typically sampled at 44.1 kHz, which requires an uncompressed data rate of 1.41 Mb/s for stereo sound with 16-bit pulse code modulation (PCM). Bit rates lower than those of the 16-bit PCM format are mandatory in order to support a circuit realization that is compact and has low power consumption, two key enabling factors of equipment portability for the user. The digital compact cassette (DCC) developed by Philips is one of the first commercially available forms of perceptually coded media. To offer backward compatibility for playback of analog compact cassettes, DCC's combination of tape speed and symbol spacing yields a raw data rate of only 768 kb/s, half of which is used for error-correcting redundancy. Another example is Sony's MiniDisc (MD), which stores a full CD's worth of music on a disc only half the diameter. DCC and MD systems make use of perceptual coding techniques to achieve the necessary compression ratios of 4:1 and 5:1, respectively. Dolby AC-3 is currently the audio coding standard for the United States Grand Alliance HDTV system and has been widely adopted for DVD films. Dolby AC-3 can reproduce various playback configurations from one channel up to 5.1 channels: left, right, center, left surround, right surround, and low-frequency enhancement channels.
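The data rates quoted in this section follow from simple arithmetic, which the short check below reproduces. It is only a worked calculation; the variable and function names are ours, and the DCC figures are the ones stated in the text.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD audio: 44.1 kHz, 16-bit PCM, two channels.
cd_rate = pcm_bit_rate(44_100, 16, 2)

# DCC: 768 kb/s raw, half of which carries error-correcting redundancy.
dcc_raw_rate = 768_000
dcc_audio_rate = dcc_raw_rate // 2

# Compression needed to fit CD-format audio into the DCC audio payload.
dcc_ratio = cd_rate / dcc_audio_rate

print("CD PCM rate (b/s):", cd_rate)
print("DCC audio payload (b/s):", dcc_audio_rate)
print("CD rate / DCC payload:", dcc_ratio)
```

The computed ratio for 44.1 kHz material is a little under four, which is consistent with the rounded 4:1 figure given for DCC.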

3.1 Philips PASC

Philips' DCC incorporates the PASC (Precision Adaptive Subband Coding) algorithm, which is capable of compressing two-channel stereo audio to 384 kb/s with near-CD quality [Lokhoff, 1992]. PASC can be considered a simplified version of ISO/MPEG-1 Layer I; it does not require a side-chain FFT analysis for the estimation of the masking threshold. The PASC encoder creates 32 subband representations of the audio signal, which are then quantized and coded according to the bit allocation derived from a psychoacoustic model. The first-generation PASC encoder performs a very simple psychoacoustic analysis based on the outputs of the filterbank. By measuring the average power level of 12 samples, the masking levels of that particular subband and all the adjacent subbands can be estimated with the help of an empirically derived 32x32 matrix, which is described in the DCC standard. The algorithm assumes the 32 frequencies of this matrix are positioned on the edges of the subband spectra, the most conservative approach.

Every block of 12 samples is converted to a floating-point notation; the mantissa determines resolution and the exponent controls dynamic range. As in MPEG-1 Layer I, the scale factor is determined and coded as a 6-bit exponent; it is valid for the 12 samples within a block. The algorithm assigns each sample a mantissa with a variable length of 2 to 15 bits, depending on the ratio of the maximum signal to the masking threshold, plus an additional 4 bits of allocation information detailing the length of a mantissa.
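The bit cost of one 12-sample subband block follows directly from the numbers above. The helper below is a minimal sketch; its name is hypothetical, and the handling of a subband that receives no mantissa bits (only the 4 allocation bits are sent) is our assumption rather than something stated in the text.

```python
SAMPLES_PER_BLOCK = 12   # samples sharing one scale factor
SCALE_FACTOR_BITS = 6    # exponent coded once per block
ALLOCATION_BITS = 4      # side information giving the mantissa length


def subband_block_bits(mantissa_bits):
    """Bits needed to code one 12-sample block of a single subband."""
    if mantissa_bits == 0:
        # Assumption: an unallocated subband costs only its allocation field.
        return ALLOCATION_BITS
    if not 2 <= mantissa_bits <= 15:
        raise ValueError("mantissa length must be 0 or between 2 and 15 bits")
    return ALLOCATION_BITS + SCALE_FACTOR_BITS + SAMPLES_PER_BLOCK * mantissa_bits


for length in (0, 2, 8, 15):
    print(length, "mantissa bits ->", subband_block_bits(length), "bits per block")
```

Because the exponent and the allocation field are shared by all 12 samples, the side information adds a fixed 10 bits per coded block regardless of the mantissa length.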

3.2 Sony ATRAC

Fig. 4. ATRAC audio encoder.

The ATRAC (Adaptive TRansform Acoustic Coding) algorithm was developed by Sony to support 74 minutes of recording and playing time on a 64-mm MiniDisc [Tsutsui, 1992]. It supports coding of 44.1 kHz two-channel audio at a rate of 256 kb/s. The key to ATRAC's efficiency is that psychoacoustic principles are applied to both the bit allocation and the time-frequency mapping. The encoder (Fig. 4) begins with two stages of quadrature mirror filters (QMFs) that divide the audio signal into three subbands covering the ranges 0-5.5 kHz, 5.5-11.0 kHz, and 11.0-22.0 kHz. These subbands are then transformed from the time domain to the frequency domain using the modified discrete cosine transform (MDCT). In addition, the transform block size adapts to signal characteristics to ensure dynamic tradeoffs between time and frequency resolution. The default transform block size is 11.6 ms, but when pre-echoes are predicted the block size is switched to 1.45 ms in the high-frequency band and to 2.9 ms in the low- and mid-frequency bands. Following the time-frequency analysis, transform coefficients are grouped nonuniformly into 52 block floating units (BFUs) in accordance with the ear's critical band partitions. Transform coefficients are quantized using two parameters: word length and scale factor. The scale factor defines the full-scale range of the quantization and the word length defines the resolution within that scale. All coefficients within each of the 52 BFUs share the same word length and scale factor, reflecting the psychoacoustic similarity within each critical band.

The bit allocation algorithm determines the word length with the aim of keeping the quantization noise below the masking threshold. One suggested algorithm makes combined use of fixed and variable bits. The algorithm assigns each BFU variable bits according to the logarithm of the transform coefficients. Fixed bits are mainly allocated to low-frequency BFU regions; this reflects the ear's decreasing sensitivity toward higher frequencies. The total bit allocation $b_{tot}(k)$ is the weighted sum of the fixed bits $b_{fix}(k)$ and the variable bits $b_{var}(k)$. Thus, for each BFU $k$, $b_{tot}(k) = T\, b_{var}(k) + (1-T)\, b_{fix}(k)$. The weight $T$ describes the tonality of the signal, taking a value close to 1 for pure tones and a value close to 0 for white noise. To ensure a fixed data rate, an offset $b_{off}$ is subtracted from $b_{tot}(k)$ to yield the final bit allocation $b(k) = \mathrm{integer}[\, b_{tot}(k) - b_{off} \,]$. As a result, the ATRAC encoder output contains the MDCT block size mode, the word length and scale factor for each BFU, and the quantized spectral coefficients.
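The two-step allocation above (tonality-weighted sum, then a common offset) can be sketched as follows. This is an illustration under several assumptions of our own: integer[...] is taken as a floor, negative word lengths are clipped to zero, the offset is found by bisection, and the BFU sizes and bit budget in the example are made-up numbers.

```python
import math


def total_allocation(b_var, b_fix, tonality):
    """b_tot(k) = T * b_var(k) + (1 - T) * b_fix(k) for every BFU k."""
    return [tonality * v + (1.0 - tonality) * f for v, f in zip(b_var, b_fix)]


def final_allocation(b_tot, b_off):
    """b(k) = integer[b_tot(k) - b_off], floored and clipped at zero (assumed)."""
    return [max(0, math.floor(b - b_off)) for b in b_tot]


def bits_used(b_tot, b_off, coeffs_per_bfu):
    """Total mantissa bits when every coefficient in a BFU uses b(k) bits."""
    words = final_allocation(b_tot, b_off)
    return sum(n * w for n, w in zip(coeffs_per_bfu, words))


def find_offset(b_tot, coeffs_per_bfu, bit_budget, steps=40):
    """Bisection for an offset whose allocation fits the bit budget."""
    low, high = 0.0, max(b_tot)   # at `high` every word length is zero
    if bits_used(b_tot, low, coeffs_per_bfu) <= bit_budget:
        return low
    for _ in range(steps):
        mid = (low + high) / 2.0
        if bits_used(b_tot, mid, coeffs_per_bfu) <= bit_budget:
            high = mid            # fits: try a smaller offset
        else:
            low = mid             # too many bits: raise the offset
    return high


# Hypothetical three-BFU example with tonality T = 0.5.
b_tot = total_allocation(b_var=[6.0, 4.0, 2.0], b_fix=[8.0, 5.0, 2.0], tonality=0.5)
sizes = [4, 8, 16]
offset = find_offset(b_tot, sizes, bit_budget=60)
print("b_tot:", b_tot)
print("word lengths:", final_allocation(b_tot, offset))
print("bits used:", bits_used(b_tot, offset, sizes))
```

A single scalar offset keeps the relative ordering of the BFUs intact, so the shape chosen by the psychoacoustic weighting is preserved while the total is trimmed to the fixed frame size.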

3.3 Dolby AC-3

Fig. 5. Dolby AC-3 encoder and decoder.

As illustrated in Fig. 5, the Dolby AC-3 encoder first employs an MDCT to transform the audio signals from the time domain to the frequency domain. Then, adjacent transform coefficients are grouped into nonuniform subbands that approximate the critical bands of the human auditory system. Transform coefficients within one subband are converted to a floating-point representation, with one or more mantissas per exponent. The exponents are encoded by a suitable strategy according to the required time and frequency resolution and fed into the psychoacoustic model. Then, the psychoacoustic model calculates the perceptual resolution according to the encoded exponents and the proper perceptual parameters. Finally, both the perceptual resolution and the available bits are used to decide the mantissa quantization.

One distinctive feature of Dolby AC-3 is the intimate relationship among exponent coding, psychoacoustic models, and bit allocation. This relationship can be described most conveniently as hybrid backward/forward bit allocation. The encoded exponents provide an estimate of the spectral envelope which, in turn, is used in the psychoacoustic model to determine the mantissa quantization. While most audio encoders need to transmit side information about the mantissa quantization, the AC-3 decoder can automatically derive the quantizer information from the decoded exponents and limited perceptual parameters. The basic problem with this approach is that the exponents are subject to limited time-frequency resolution and hence fail to provide a detailed psychoacoustic analysis. The tradeoff between the bits saved on side information and the constrained psychoacoustic precision determines the coding efficiency of Dolby AC-3.

4. ARCHITECTURAL OVERVIEW

The principles of perceptual coding can be considered according to eight aspects: the time/frequency mapping, quantization and coding, psychoacoustic model, channel correlation and irrelevancy, long-term correlation, pre-echo control, and bit-allocation. This section provides an overview of these standards through examination of the eight aspects.

4.1 Psychoacoustic Model

Fig. 6. Masking threshold of a masker centered at 1 kHz. (Axes: sound pressure level in dB SPL versus frequency in kHz.)

Most perceptual coders rely, at least to some extent, on psychoacoustic models to reduce the subjective impairments of quantization noise. The encoder analyzes the incoming audio signals to identify perceptually important information by incorporating several psychoacoustic principles of the human ear [Zwicker, 1990]. One is critical-band spectral analysis, which accounts for the ear's poorer discrimination in higher frequency regions than in lower ones. Further investigations indicated that a good choice of spectral resolution is around 20 Hz, which has been implemented in MPEG-2 AAC and MPEG-4. The phenomenon of masking is another effect; it occurs whenever a strong signal (masker) makes a spectral or temporal neighborhood of weaker signals inaudible. To illustrate this, Fig. 6 shows an example of the masking threshold produced by a masker centered at 1 kHz. The absolute threshold (dashed line) is also included to indicate the minimum audible intensity level in quiet surroundings. Notice that the slope of the masking curve is less steep on the high-frequency side; i.e., higher frequencies are more easily masked. The offset between masker and masking threshold varies with the tonality of the masker; it has a smaller value for a noise-like masker (about 5.5 dB) than for a tone-like masker (above 25 dB).

The encoder performs the psychoacoustic analysis based on either an FFT analysis (in MPEG) or the output of the filterbank (in AC-3 and PASC). The psychoacoustic model used in AC-3 is specially designed; it does not provide a means to differentiate the masking effects produced by tonal and noise maskers. MPEG provides two examples of psychoacoustic models, the first of which we will now describe. The calculation starts with a precise spectral analysis on 512 (Layer I) or 1024 (Layer II) input samples to generate the magnitude spectrum. The spectral lines are then examined to discriminate between noise-like and tone-like maskers by taking the local maximum of the magnitude spectrum as an indicator of tonality.
Among all the labeled maskers, only those above the absolute threshold are retained for further calculation. Using rules known from psychoacoustics, the individual masking thresholds for the relevant maskers are then calculated as a function of frequency position, loudness level, and the nature of the tonality. Finally, the global masking threshold is obtained from the upward and downward slopes of the individual masking thresholds of tonal and nontonal maskers and from the absolute threshold in quiet.

4.2 Time-Frequency Mapping

Since psychoacoustic interpretation is mainly described in the frequency domain, a time-frequency mapping is incorporated into the encoder for further signal analysis. The time-frequency mapping can be implemented through PQMF [ISO, 1992], time-domain aliasing cancellation (TDAC) filters [Princen, 1987], or the modified discrete cosine transform [ISO, 1992]. All of them can be referred to as cosine modulated filterbanks (CMFBs) [Shlien, 1997][Liu, 1998]. The process of a CMFB consists of two steps: the window-and-overlapping addition (WOA) followed by the modulated cosine transform (MCT). The WOA performs a windowing multiplication and addition with overlapping audio blocks. Its complexity is O(k) per audio sample, where k is the overlapping factor of the audio blocks. In general, the sidelobe attenuation of the filterbank increases with this factor. For example, the factor k is 16 for MPEG-1 Layer II and 2 for AC-3.
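The two-step structure described above (windowing with overlap, then a cosine modulated transform) can be illustrated for the simplest case, a TDAC filterbank with overlap factor k = 2. The sketch below is ours, not taken from any of the standards: the sine window, the 4/N scaling in the inverse transform, and the direct (non-fast) evaluation of the cosine sums are all assumptions chosen to keep the example short.

```python
import math


def sine_window(n):
    """Sine window; it satisfies w[i]**2 + w[i + n/2]**2 == 1."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def _phase(i, k, n):
    # Argument of the TDAC cosine kernel: (pi / 2N)(2i + 1 + N/2)(2k + 1).
    return math.pi / (2 * n) * (2 * i + 1 + n / 2) * (2 * k + 1)


def mdct(block, window):
    """Window an N-sample block and map it to N/2 coefficients."""
    n = len(block)
    return [
        sum(window[i] * block[i] * math.cos(_phase(i, k, n)) for i in range(n))
        for k in range(n // 2)
    ]


def imdct(coeffs, window):
    """Map N/2 coefficients back to N windowed, time-aliased samples."""
    n = 2 * len(coeffs)
    return [
        (4.0 / n) * window[i]
        * sum(coeffs[k] * math.cos(_phase(i, k, n)) for k in range(n // 2))
        for i in range(n)
    ]


def analysis_synthesis(signal, n):
    """Run 50%-overlapped analysis and overlap-add synthesis."""
    hop = n // 2
    if len(signal) % hop:
        raise ValueError("signal length must be a multiple of n/2")
    window = sine_window(n)
    # Pad one hop of zeros on each side so every sample lies in two blocks.
    padded = [0.0] * hop + list(signal) + [0.0] * hop
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - n + 1, hop):
        coeffs = mdct(padded[start:start + n], window)
        for i, value in enumerate(imdct(coeffs, window)):
            output[start + i] += value   # aliasing cancels in the overlap
    return output[hop:hop + len(signal)]


signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]
restored = analysis_synthesis(signal, 16)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print("max reconstruction error:", max_error)
```

Each block of N samples yields only N/2 coefficients, so a single block cannot be inverted on its own; the time-domain aliasing it leaves behind is cancelled when neighboring blocks are overlapped and added, which is the property that gives TDAC its name.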

| CMFBs in standards | Overlap factor k | Number of bands N | Frequency resolution at 48 kHz | Sidelobe attenuation |
|---|---|---|---|---|
| MPEG Layers I & II | 16 | 32 | 750 Hz | 96 dB |
| MPEG 2nd hybrid level of Layer III | 2 | 18 | 41.66 Hz | 23 dB |
| MPEG-2 AAC, MPEG-4 T/F coding | 2 | 1024 | 23.40 Hz | 19 dB |
| Dolby AC-3 | 2 | 256 | 93.75 Hz | 18 dB |

Table 4 CMFBs used in current audio coding standards

| Classes | MCT transform pair | CMFBs in standards |
|---|---|---|
| Polyphase filterbank | $X_k = \sum_{i=0}^{N-1} x_i \cos\left(\frac{\pi}{N}\left(i - \frac{N}{4}\right)(2k+1)\right)$, $x_i = \sum_{k=0}^{N/2-1} X_k \cos\left(\frac{\pi}{N}\left(i + \frac{N}{4}\right)(2k+1)\right)$ | MPEG Layers I and II (N = 64), MPEG Layer III 1st hybrid level (N = 64) |
| TDAC filterbank | $X_k = \sum_{i=0}^{N-1} x_i \cos\left(\frac{\pi}{2N}\left(2i+1+\frac{N}{2}\right)(2k+1)\right)$, $x_i = \sum_{k=0}^{N/2-1} X_k \cos\left(\frac{\pi}{2N}\left(2i+1+\frac{N}{2}\right)(2k+1)\right)$ | MPEG-2 AAC (N = 2048), MPEG-4 T/F coding (N = 2048), MPEG Layer III 2nd hybrid level (N = 36), AC-3 long transform (N = 512) |
| TDAC-variant filterbank | $X_k = \sum_{i=0}^{N-1} x_i \cos\left(\frac{\pi}{2N}(2i+1)(2k+1)\right)$, $x_i = \sum_{k=0}^{N/2-1} X_k \cos\left(\frac{\pi}{2N}(2i+1)(2k+1)\right)$ | AC-3 short transform 1 (N = 256) |
| TDAC-variant filterbank | $X_k = \sum_{i=0}^{N-1} x_i \cos\left(\frac{\pi}{2N}(2i+1+N)(2k+1)\right)$, $x_i = \sum_{k=0}^{N/2-1} X_k \cos\left(\frac{\pi}{2N}(2i+1+N)(2k+1)\right)$ | AC-3 short transform 2 (N = 256) |

In every row, k = 0, 1, ..., N/2-1 and i = 0, 1, ..., N-1.

Table 3 Comparison of filterbank properties

The complexity of the MCT is O(2N) per audio sample, where N is the number of bands.

The range of N is from 18 for MPEG-1 Layer III to 2048 for MPEG-2 Advanced Audio Coding. Table 4 compares the properties of the CMFBs used in audio coding standards. Because of the high complexity of the MCT, fast algorithms have been developed following concepts similar to those behind the fast Fourier transform. As listed in Table 3, the MCTs used in current audio standards can be classified into three different types: the time-domain aliasing cancellation (TDAC) filterbank, a variant of the TDAC filterbank, and the polyphase filterbank.

4.3 Quantization

Fig. 7. Quantizer characteristics.

For perceptual audio coding, quantization involves representing the outputs of the filterbank by a finite number of levels with the aim of minimizing the subjective impairments of quantization noise. The characteristics of a quantizer can be specified by means of the step size and the range, as shown in Fig. 7. While a uniform quantizer has the same step size throughout the input range, a nonuniform quantizer does not. For a uniform quantizer, the range and the step size determine the number of levels required for the output value. In contrast to a uniform quantizer, the quantization noise of a nonuniform quantizer varies with the input value. Such a design is more relevant to the human auditory system in the sense that the ear's ability to discriminate between two sounds of different loudness decreases as the sound pressure level increases.

For uniform quantization with a fixed bit rate, the range directly affects the quantization error. Hence, an accurate estimate of the range leads to good control of the quantization error. For nonuniform quantization, on the other hand, the quantization noise depends more on the input values than on the range. In the current audio standards, uniform quantizers are used in MPEG Layer I, Layer II, and Dolby AC-3. For MPEG Layers I and II, the scale factor helps to determine the range of a quantizer. For Dolby AC-3, the exponent, which accounts for the range of a uniform quantizer, adapts with time and frequency. Table 5 lists the quantization schemes used in the audio coding standards. For MPEG Layer III, MPEG-2 AAC, and MPEG-4 T/F coding, the ranges of the nonuniform quantizers do not adapt with frequency.

So far, we have considered only scalar quantization, where one sample is quantized at a time. Vector quantization (VQ), on the other hand, represents a block of input samples at a time. Twin-VQ has been adopted in MPEG-4 T/F coding as an alternative to scalar quantization for higher coding efficiency.

Standards | Quantization Types | Range adaptation with time at 48 kHz | Range adaptation with frequency at 48
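The statement in Section 4.3 that, at a fixed number of bits, the range of a uniform quantizer directly controls the quantization error can be made concrete with a small sketch. The quantizer below is a generic symmetric midtread design chosen for illustration; it is an assumption of ours and not the quantizer specified by any of the standards discussed, and the function name is hypothetical.

```python
def uniform_quantize(value, value_range, bits):
    """Quantize `value` on [-value_range, value_range] with 2**bits - 1 levels.

    Returns (index, reconstructed value, step size).
    """
    levels = 2 ** bits - 1                     # odd count keeps zero as a level
    step = 2.0 * value_range / (levels - 1)    # outermost levels sit at +/- range
    clipped = max(-value_range, min(value_range, value))
    index = round(clipped / step)
    return index, index * step, step


sample = 0.3
# Same sample and word length, two different range estimates.
tight = uniform_quantize(sample, value_range=1.0, bits=4)
loose = uniform_quantize(sample, value_range=4.0, bits=4)

for name, (index, rebuilt, step) in (("tight", tight), ("loose", loose)):
    print(name, "step:", step, "index:", index, "error:", abs(rebuilt - sample))
```

With the same 4-bit word, the overestimated range gives a step four times larger and a correspondingly larger worst-case error of half a step, which is why an accurate scale factor or exponent matters for the uniform quantizers discussed above.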

