Shahrad Rezaei Tehrani
MPEG Audio Layer-3
History
| reduction ratio | achieved by |
|---|---|
| 1:4 | Layer 1 (corresponds to 384 kbps for a stereo signal) |
| 1:6...1:8 | Layer 2 (corresponds to 256...192 kbps for a stereo signal) |
| 1:10...1:12 | Layer 3 (corresponds to 128...112 kbps for a stereo signal) |

Some typical performance data of MPEG
Layer-3 are:
| sound quality | bandwidth | mode | bitrate | reduction ratio |
|---|---|---|---|---|
| telephone sound | 2.5 kHz | mono | 8 kbps * | 96:1 |
| better than short-wave | 4.5 kHz | mono | 16 kbps | 48:1 |
| better than AM radio | 7.5 kHz | mono | 32 kbps | 24:1 |
| similar to FM radio | 11 kHz | stereo | 56...64 kbps | 26...24:1 |
| near-CD | 15 kHz | stereo | 96 kbps | 16:1 |
| CD | >15 kHz | stereo | 112...128 kbps | 14...12:1 |

*) Fraunhofer uses a non-ISO extension of MPEG Layer-3 for enhanced performance ("MPEG 2.5")

In all international listening tests, MPEG Layer-3 proved its superior
performance, maintaining the original sound quality at a data reduction of 1:12
(around 64 kbit/s per audio channel). If an application can tolerate a limited
bandwidth of around 10 kHz, reasonable sound quality for stereo signals can be
achieved even at a reduction of 1:24.
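
The reduction ratios quoted above can be checked with simple arithmetic. The sketch below assumes the uncompressed reference is 16-bit PCM sampled at 48 kHz (768 kbps per channel). That reference is inferred from the numbers and is not stated in the text, and the function names are illustrative only.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def reduction_ratio(pcm_kbps, coded_kbps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_kbps / coded_kbps


# Assumed reference: 48 kHz, 16 bits per sample.
mono_pcm = pcm_bitrate_kbps(48000, 16, 1)
stereo_pcm = pcm_bitrate_kbps(48000, 16, 2)

for label, pcm, coded in [
    ("mono 8 kbps", mono_pcm, 8),
    ("mono 32 kbps", mono_pcm, 32),
    ("stereo 96 kbps", stereo_pcm, 96),
    ("stereo 128 kbps", stereo_pcm, 128),
]:
    print(f"{label}: {reduction_ratio(pcm, coded):.1f}:1")
```

Under this assumption most table rows come out as listed. A few ranges, such as 56 kbps stereo, match only approximately.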
For low bit-rate audio coding in broadcast applications at bitrates of
60 kbit/s per audio channel, the ITU-R recommends MPEG Layer-3 (ITU-R doc.
BS.1115).
Filter bank
The filter bank used in MPEG Layer-3 is a hybrid filter bank consisting of a
polyphase filter bank followed by a Modified Discrete Cosine Transform (MDCT).
This hybrid form was chosen for compatibility with its predecessors, Layer-1
and Layer-2.
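
The MDCT half of the hybrid can be illustrated in isolation. The following is a minimal sketch, not the Layer-3 filter bank itself: it omits the polyphase stage, and the block size of 18 coefficients, the sine window, and all function names are assumptions chosen for illustration. It shows the property that makes the MDCT useful for coding: overlapping blocks of 2N samples yield only N coefficients each, yet windowed overlap-add recovers the signal.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Forward MDCT: 2N input samples -> N coefficients."""
    n_coeffs = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_coeffs = len(coeffs)
    return [
        2.0 / n_coeffs * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def analyze_synthesize(signal, n_coeffs):
    """Window, transform, inverse-transform, window again, and overlap-add."""
    window = sine_window(n_coeffs)
    output = [0.0] * len(signal)
    # Blocks are 2N samples long and advance by N samples (50 % overlap).
    for start in range(0, len(signal) - 2 * n_coeffs + 1, n_coeffs):
        block = [signal[start + n] * window[n] for n in range(2 * n_coeffs)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * n_coeffs):
            output[start + n] += rebuilt[n] * window[n]
    return output


signal = [math.sin(0.3 * n) for n in range(72)]
rebuilt = analyze_synthesize(signal, 18)
```

Samples covered by two overlapping blocks (indices 18 to 53 here) should match the input up to rounding error, because the time-domain aliasing of neighbouring blocks cancels. The first and last N samples are covered by only one block and are not reconstructed.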
Perceptual Model
The perceptual model largely determines the quality of a given encoder
implementation. It either uses a separate filter bank or combines the
calculation of energy values (for the masking calculations) with the main
filter bank. The output of the perceptual model consists of values for the
masking threshold, or the allowed noise, for each coder partition. If the
quantization noise can be kept below the masking threshold, the compressed
result should be indistinguishable from the original signal.
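
The criterion in the last sentence can be written as a per-partition comparison. This is only a sketch of the check, not of a perceptual model: the allowed-noise values are taken as given, noise is measured as summed squared error (an assumption), and the function names are hypothetical.

```python
def partition_noise(original, decoded):
    """Energy of the coding error within one partition."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded))


def audible_partitions(original_partitions, decoded_partitions, allowed_noise):
    """Indices of partitions whose coding noise exceeds the allowed noise."""
    flagged = []
    pairs = zip(original_partitions, decoded_partitions)
    for index, (original, decoded) in enumerate(pairs):
        if partition_noise(original, decoded) > allowed_noise[index]:
            flagged.append(index)
    return flagged
```

An empty result means the noise stays below the threshold in every partition, which is the condition the text describes.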
Joint Stereo
Joint stereo coding takes advantage of the fact that both channels of a stereo
channel pair contain largely the same information. These stereophonic
irrelevancies and redundancies are exploited to reduce the total bitrate. Joint
stereo is used in cases where only low bitrates are available but stereo
signals are desired.
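
One widely used way to exploit this redundancy is mid/side coding, shown here as a minimal sketch. The text above does not say which joint stereo technique is meant, so the choice of mid/side, the 1/sqrt(2) scaling, and the function names are assumptions.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def to_mid_side(left, right):
    """Replace left/right by their scaled sum (mid) and difference (side)."""
    mid = [(l + r) * SCALE for l, r in zip(left, right)]
    side = [(l - r) * SCALE for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert to_mid_side exactly (up to rounding)."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    return sum(x * x for x in samples)


left = [0.5, 0.8, -0.3, 0.1]
right = [0.45, 0.82, -0.28, 0.12]
mid, side = to_mid_side(left, right)
```

When the two channels are similar, the side signal carries very little energy and needs far fewer bits than a second full channel would.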
Quantization and Coding
A system of two nested iteration loops is the common solution for quantization
and coding in a Layer-3 encoder.
Quantization is done via a power-law quantizer. In this way, larger values are
automatically coded with less accuracy and some noise shaping is already built
into the quantization process.
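
A minimal sketch of such a quantizer follows. The exponent of 0.75 and the simple rounding are assumptions of this example (the text above only says "power-law"), and the function names are illustrative.

```python
def quantize(value, step):
    """Compress the magnitude with a power law, then round to an integer."""
    index = int((abs(value) / step) ** 0.75 + 0.5)
    return index if value >= 0 else -index


def dequantize(index, step):
    """Expand the integer index back with the inverse power law."""
    value = abs(index) ** (4.0 / 3.0) * step
    return value if index >= 0 else -value


def level_spacing(index, step):
    """Distance between two neighbouring reconstruction levels."""
    return dequantize(index + 1, step) - dequantize(index, step)
```

Because the reconstruction levels lie further apart at large indices, large spectral values are represented more coarsely than small ones, which is the built-in noise shaping mentioned above.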
The quantized values are coded by Huffman coding. As a specific method for
entropy coding, Huffman coding is lossless. It is therefore called noiseless
coding, because no noise is added to the audio signal.
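
The lossless property can be demonstrated with a small Huffman coder. This sketch builds a code from the statistics of the sample data itself, which is an assumption made to keep the example self-contained; it does not reproduce the code tables of a Layer-3 encoder, and all names are illustrative.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: code so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count0, _, group0 = heapq.heappop(heap)
        count1, _, group1 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in group0.items()}
        merged.update({sym: "1" + code for sym, code in group1.items()})
        heapq.heappush(heap, (count0 + count1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[sym] for sym in symbols)


def decode(bits, code):
    """Walk the bit string; prefix-free codes make the split unambiguous."""
    reverse = {bits_for: sym for sym, bits_for in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            decoded.append(reverse[current])
            current = ""
    return decoded


quantized = [0, 0, 0, 0, 0, 0, 1, 0, -1, 0, 1, 2, 0, 0, -3, 0]
code = build_huffman_code(quantized)
bits = encode(quantized, code)
```

Decoding `bits` with the same code returns the original list of quantized values exactly, so this stage contributes no error of its own.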
The process to find the optimum gain and scalefactors for a given block,
bit-rate and output from the perceptual model is usually done by two nested
iteration loops in an analysis-by-synthesis way:
Inner iteration loop (rate loop)
The Huffman code tables assign shorter code words to (more frequent) smaller quantized values. If the number of bits resulting from the coding operation exceeds the number of bits available to code a given block of data, this can be corrected by adjusting the global gain to result in a larger quantization step size, leading to smaller quantized values. This operation is repeated with different quantization step sizes until the resulting bit demand for Huffman coding is small enough. The loop is called the rate loop because it modifies the overall coder rate until it is small enough.
Outer iteration loop (noise control/distortion loop)
To shape the quantization noise according to the masking threshold, scalefactors are applied to each scalefactor band. The system starts with a default factor of 1.0 for each band. If the quantization noise in a given band is found to exceed the masking threshold (allowed noise) as supplied by the perceptual model, the scalefactor for this band is adjusted to reduce the quantization noise. Since achieving a smaller quantization noise requires a larger number of quantization steps and thus a higher bitrate, the rate adjustment loop has to be repeated every time new scalefactors are used. In other words, the rate loop is nested within the noise control loop. The outer (noise control) loop is executed until the actual noise (computed from the difference of the original spectral values minus the quantized spectral values) is below the masking threshold for every scalefactor band (i.e. critical band).
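
The nesting of the two loops can be sketched as follows. This is an illustration under stated assumptions, not an encoder: `count_bits` is a hypothetical stand-in for Huffman table lookup, the 0.75 exponent, the step and scalefactor increments, and the `max_rounds` exit are choices of this example, and every band is assumed to be non-empty.

```python
def quantize(value, step):
    """Power-law quantizer (the 0.75 exponent is an assumption of this sketch)."""
    index = int((abs(value) / step) ** 0.75 + 0.5)
    return index if value >= 0 else -index


def dequantize(index, step):
    value = abs(index) ** (4.0 / 3.0) * step
    return value if index >= 0 else -value


def count_bits(indices):
    """Hypothetical bit cost: small magnitudes are cheaper, as with Huffman tables."""
    return sum(1 + 2 * abs(q).bit_length() for q in indices)


def rate_loop(bands, amplification, bit_budget, start_step=0.01):
    """Inner loop: enlarge the global step size until the block fits the budget."""
    if bit_budget < sum(len(band) for band in bands):
        raise ValueError("budget too small: every value costs at least one bit here")
    step = start_step
    while True:
        quantized = [[quantize(v * amp, step) for v in band]
                     for band, amp in zip(bands, amplification)]
        if sum(count_bits(q) for q in quantized) <= bit_budget:
            return step, quantized
        step *= 2 ** 0.25  # coarser steps give smaller indices and fewer bits


def band_noise(band, quantized, amp, step):
    """Mean squared difference between original and decoded values of one band."""
    return sum((v - dequantize(q, step) / amp) ** 2
               for v, q in zip(band, quantized)) / len(band)


def noise_control_loop(bands, allowed_noise, bit_budget, max_rounds=32):
    """Outer loop: amplify bands that are too noisy, then rerun the rate loop."""
    amplification = [1.0] * len(bands)  # default factor of 1.0 for each band
    rounds = 0
    while True:
        step, quantized = rate_loop(bands, amplification, bit_budget)
        noise = [band_noise(b, q, a, step)
                 for b, q, a in zip(bands, quantized, amplification)]
        too_noisy = [i for i, n in enumerate(noise) if n > allowed_noise[i]]
        rounds += 1
        if not too_noisy or rounds >= max_rounds:
            return amplification, step, quantized, noise
        for i in too_noisy:
            amplification[i] *= 2 ** 0.5  # finer effective step for this band


bands = [[12.0, -7.5, 3.0, 0.5], [1.5, -0.8, 0.3, 0.1]]
result = noise_control_loop(bands, allowed_noise=[0.5, 0.01], bit_budget=40)
```

The `max_rounds` limit is needed because a tight bit budget may make it impossible to bring every band below its allowed noise; in that case the sketch returns the last consistent result rather than looping forever.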
This page was last updated on 03-Mar-2002.
YAHOO ID: Shrezai@yahoo.com