Recommendation ITU-R BS.1196-6 (12/2017)

Audio coding for digital broadcasting

BS Series
Broadcasting service (sound)

Rec. ITU-R BS.1196-6

Foreword

The role of the Radiocommunication Sector is to ensure the rational, equitable, efficient and economical use of the radio-frequency spectrum by all radiocommunication services, including satellite services, and carry out studies without limit of frequency range on the basis of which Recommendations are adopted.

The regulatory and policy functions of the Radiocommunication Sector are performed by World and Regional Radiocommunication Conferences and Radiocommunication Assemblies supported by Study Groups.

Policy on Intellectual Property Right (IPR)

ITU-R policy on IPR is described in the Common Patent Policy for ITU-T/ITU-R/ISO/IEC referenced in Annex 1 of Resolution ITU-R 1. Forms to be used for the submission of patent statements and licensing declarations by patent holders are available from http://www.itu.int/ITU-R/go/patents/en where the Guidelines for Implementation of the Common Patent Policy for ITU-T/ITU-R/ISO/IEC and the ITU-R patent information database can also be found.

Series of ITU-R Recommendations
(Also available online at http://www.itu.int/publ/R-REC/en)

Series  Title
BO      Satellite delivery
BR      Recording for production, archival and play-out; film for television
BS      Broadcasting service (sound)
BT      Broadcasting service (television)
F       Fixed service
M       Mobile, radiodetermination, amateur and related satellite services
P       Radiowave propagation
RA      Radio astronomy
RS      Remote sensing systems
S       Fixed-satellite service
SA      Space applications and meteorology
SF      Frequency sharing and coordination between fixed-satellite and fixed service systems
SM      Spectrum management
SNG     Satellite news gathering
TF      Time signals and frequency standards emissions
V       Vocabulary and related subjects

Note: This ITU-R Recommendation was approved in English under the procedure detailed in Resolution ITU-R 1.

Electronic Publication
Geneva, 2017

© ITU 2017
All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without written permission of ITU.

RECOMMENDATION ITU-R BS.1196-6*

Audio coding for digital broadcasting

(Question ITU-R 19-1/6)

(1995-2001-2010-2012-02/2015-10/2015-2017)

Scope

This Recommendation specifies audio source coding systems applicable for digital sound and television broadcasting. It further specifies a system applicable for the backward compatible multichannel enhancement of digital sound and television broadcasting systems.

Keywords

Audio, Audio Coding, Broadcast, Digital, Broadcasting, Sound, Television, Codec

The ITU Radiocommunication Assembly,

considering

a) that user requirements for audio coding systems for digital broadcasting are specified in Recommendation ITU-R BS.1548;

b) that multi-channel sound system with and without accompanying picture is the subject of Recommendation ITU-R BS.775 and that a high-quality, multi-channel sound system
using efficient bit rate reduction is essential in a digital broadcasting system;

c) that the advanced sound system specified in Recommendation ITU-R BS.2051 consists of three-dimensional channel configurations and uses either static or dynamic metadata to control object-based, scene-based, and channel-based signals;

d) that subjective assessment of audio systems with small impairments, including multi-channel sound systems, is the subject of Recommendation ITU-R BS.1116;

e) that subjective assessment of audio systems of intermediate audio quality is the subject of Recommendation ITU-R BS.1534 (MUSHRA);

f) that low bit-rate coding for high quality audio has been tested by the ITU Radiocommunication Sector;

g) that commonality in audio source coding methods among different services may provide increased system flexibility and lower receiver costs;

h) that many broadcast services already use or
have specified the use of audio codecs from the families of MPEG-1, MPEG-2, MPEG-4, AC-3 and E-AC-3;

i) that Recommendation ITU-R BS.1548 lists codecs that have been shown to meet the broadcasters' requirements for contribution, distribution and emission;

j) that those broadcasters which have not yet started services should be able to choose the system which is best suited to their application;

k) that broadcasters may need to consider compatibility with legacy broadcasting systems and equipment when selecting a system;

* This Recommendation should be brought to the attention of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

l) that when introducing a multi-channel sound system, existing mono and stereo receivers should be considered;

m) that a backward compatible multi-channel extension to an existing audio coding system can provide better bit rate efficiency than simulcast;

n) that an audio coding system should preferably be able to encode both speech and music with equally high fidelity,

recommends

1 that for new applications of digital sound or television broadcasting emission, where compatibility with legacy transmissions and equipment is not required, one of the following low bit-rate audio coding systems should be employed:
- Extended HE AAC as specified in ISO/IEC 23003-3:2012;
- E-AC-3 as specified in ETSI TS 102 366 (2014-08);
- AC-4 as specified in ETSI TS 103 190-1 v1.1.1 (2015-06) and
  ETSI TS 103 190-2 v1.2.1 (2015-09);
- MPEG-H 3D Audio LC Profile as specified in ISO/IEC 23008-3:2015/Amd 3:2017.

NOTE 1 - Extended HE AAC is a more flexible superset of MPEG-4 HE AAC v2, HE AAC and AAC LC, and includes MPEG-D Unified Speech and Audio Coding (USAC).
NOTE 2 - E-AC-3 is a more flexible superset of AC-3.
NOTE 3 - The AC-4 and MPEG-H 3D Audio LC Profile specifications include capabilities that are able to support the advanced sound system specified in Recommendation ITU-R BS.2051 and users should refer to Recommendation ITU-R BS.1548 for codec compliance;

2 that for applications of digital sound or television broadcasting emission, where compatibility with legacy transmissions and equipment is required, one of the following low bit-rate coding systems should be employed:
- MPEG-1 Layer II as specified in ISO/IEC 11172-3:1993;
- MPEG-2 Layer II half sample rate as specified in ISO/IEC 13818-3:1998;
- MPEG-2 AAC-LC or MPEG-2 AAC-LC with SBR as specified in ISO/IEC 13818-7:2006;
- MPEG-4 AAC-LC as specified in ISO/IEC 14496-3:2009;
- MPEG-4 HE AAC v2 as specified in ISO/IEC 14496-3:2009;
- AC-3 as specified in ETSI TS 102 366 (2014-08);

NOTE 4 - ISO/IEC 11172-3 may sometimes be referred to as
13818-3 as this specification includes 11172-3 by reference.
NOTE 5 - Support for Extended HE AAC as specified in ISO/IEC 23003-3:2012 is encouraged. It includes all of the above-mentioned AAC versions, thus guaranteeing compatibility with future as well as legacy broadcast systems worldwide with a single decoder implementation;

3 that for backward compatible multi-channel extension of digital television and sound broadcasting systems, the multichannel audio extensions described in ISO/IEC 23003-1:2007 should be used;

NOTE 6 - Since the MPEG Surround technology described in ISO/IEC 23003-1:2007 is independent of the compression technology (core coder) used for transmission of the backward compatible signal, the described multi-channel enhancement tools can be used in combination with any of the coding systems recommended under recommends 1 and 2;

4 that for distribution and contribution links, ISO/IEC 11172-3 Layer II coding may be used at a bit rate of at least 180 kbit/s per audio signal (i.e. per mono signal, or per component of an independently coded stereo signal) excluding ancillary data;

5 that for commentary links, ISO/IEC 11172-3 Layer III coding may be used at a bit rate of at least 60 kbit/s excluding ancillary data for mono signals, and at least 120 kbit/s excluding ancillary data for stereo signals, using joint stereo coding;

6 that for high quality applications the sampling frequency should be 48 kHz;

7 that the input signal to the low bit rate audio encoder should be emphasis-free and no emphasis should be applied by the encoder,

further recommends

1 that Recommendation ITU-R BS.1548 should be referred to for information about coding system configurations that have been demonstrated to meet quality and other user requirements for contribution, distribution, and emission;

2 that further studies of the requirements for the advanced sound system specified in Recommendation ITU-R BS.2051 are needed and that this Recommendation should be updated when these studies are completed.

NOTE - Information about the codecs included in this Recommendation may be found in Annexes 1 to 8.

Annex 1
(informative)

MPEG-1 and MPEG-2, layer II and III audio

1 Encoding

The encoder processes the digital audio signal and produces the compressed bit stream. The encoder algorithm is not standardized and may use various means for encoding, such as estimation of the auditory masking threshold, quantization, and scaling (see Note 1). However, the encoder output must be such that a decoder conforming to this Recommendation will produce an audio signal suitable for the intended application.

NOTE 1 - An encoder complying with the description given in Annexes C and D to ISO/IEC 11172-3, 1993 will give a satisfactory minimum standard of performance.

The following description is of a typical encoder, as shown in Fig. 1. Input audio samples are fed into the encoder. The time-to-frequency mapping creates a filtered and sub-sampled representation of the input audio stream. The mapped samples may be either sub-band samples (as in Layer I or II, see below) or transformed sub-band samples (as in Layer III).

A psycho-acoustic model, using a fast Fourier transform and operating in parallel with the time-to-frequency mapping of the audio signal, creates a set of data to control the quantizing and coding. These data differ depending on the actual coder implementation. One possibility is to use an estimation of the masking threshold to control the quantizer.

The scaling, quantizing and coding block creates a set of coded symbols from the mapped input samples. Again, the transfer function of this block can depend on the implementation of the encoding system.

The "frame packing" block assembles the actual bit stream for the chosen layer from the output data of the other blocks (e.g. bit allocation data,
scale factors, coded sub-band samples) and adds other information in the ancillary data field (e.g. error protection), if necessary.

FIGURE 1
Block diagram of a typical encoder
[Figure BS.1196-01. Labelled elements: PCM audio signal; time-to-frequency mapping; scaling, quantizing and coding; frame packing; psychoacoustic model; ancillary data; ISO/IEC 11172-3 coded bit stream.]

2 Layers

Depending on the application, different layers of the coding system with increasing complexity and performance can be used.

Layer I: This layer contains the basic mapping of the digital audio input into 32 sub-bands, fixed segmentation to format the data into blocks, a psycho-acoustic model to determine the adaptive bit allocation, and quantization using block companding and formatting. One Layer I frame represents 384 samples per channel.

Layer II: This layer provides additional coding of bit allocation, scale factors, and samples. One Layer II frame represents 3 × 384 = 1 152 samples per channel.

Layer III: This layer introduces increased frequency resolution based on a hybrid filter bank (a 32 sub-band filter bank with variable
length modified discrete cosine transform). It adds a non-uniform quantizer, adaptive segmentation, and entropy coding of the quantized values. One Layer III frame represents 1 152 samples per channel.

There are four different modes possible for any of the layers:
- single channel;
- dual channel (two independent audio signals coded within one bit stream, e.g. bilingual application);
- stereo (left and right signals of a stereo pair coded within one bit stream);
- joint stereo (left and right signals of a stereo pair coded within one bit stream with the stereo irrelevancy and redundancy exploited). The joint stereo mode can be used to increase the audio quality at low bit rates and/or to reduce the bit rate for stereophonic signals.

3 Coded bit stream format

An overview of the ISO/IEC 11172-3 bit stream is given in Fig. 2 for Layer II and Fig. 3 for Layer III. A coded bit
stream consists of consecutive frames. Depending on the layer, a frame includes the following fields:

FIGURE 2
ISO/IEC 11172-3 Layer II bit stream format
[Figure BS.1196-02. Consecutive frames (Frame n-1, Frame n, Frame n+1), with the fields Header, Side information, Main audio information and Ancillary data.]

Layer II:
Header: part of the bit stream containing synchronization and status information
Side information: part of the bit stream containing bit allocation and scale factor information
Main audio information: part of the bit stream containing encoded sub-band samples
Ancillary data: part of the bit stream containing user definable data

FIGURE 3
ISO/IEC 11172-3 Layer III bit stream format
[Figure BS.1196-03. Labelled elements: SI, Header, Pointer, Length_1, Length_2, Length_1 + Length_2, Main audio information, Ancillary data.]

Layer III:
Side information (SI): part of the bit stream containing header, pointer, length_1 and length_2, scale factor information, etc.;
Header: part of the bit stream containing synchronization and status information;
Pointer: pointing to beginning of main audio information;
Length_1: length of first part of main audio information;
Length_2: length of second part of main audio information;
Main audio information: part of the bit stream containing encoded audio;
Ancillary data: part of the bit stream containing user definable data.
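
As an informal illustration of the frame sizes given in clause 2, the following Python sketch computes the audio duration of one frame and the average number of coded bytes per frame. It is not part of ISO/IEC 11172-3 or of this Recommendation. It assumes a constant bit rate and ignores padding, so at sampling frequencies where the result is fractional an individual frame will differ slightly from the average. The function names are hypothetical and chosen for this example only.

```python
# Samples per frame and per channel, as stated for each layer in clause 2.
SAMPLES_PER_FRAME = {"I": 384, "II": 3 * 384, "III": 1152}


def frame_duration_ms(layer: str, sample_rate_hz: int) -> float:
    """Duration of the audio represented by one frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME[layer] / sample_rate_hz


def average_frame_bytes(layer: str, bit_rate_bps: int, sample_rate_hz: int) -> float:
    """Average coded frame length in bytes.

    Assumption: constant bit rate, so one frame carries
    (samples per frame / sampling frequency) seconds' worth of bits.
    Padding is not modelled.
    """
    bits_per_frame = SAMPLES_PER_FRAME[layer] * bit_rate_bps / sample_rate_hz
    return bits_per_frame / 8


if __name__ == "__main__":
    # Example values only: 48 kHz sampling (recommends 6) and 192 kbit/s.
    for layer in ("I", "II", "III"):
        print(
            layer,
            frame_duration_ms(layer, 48_000),
            average_frame_bytes(layer, 192_000, 48_000),
        )
```

Under these assumptions, a Layer II frame at 48 kHz represents 24 ms of audio per channel, and at 192 kbit/s it occupies 576 bytes on average, shared among the header, side information, main audio information and ancillary data fields.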
4 Decoding

The decoder accepts coded audio bit streams in the syntax defined in ISO/IEC 11172-3, decodes the data elements, and uses the information to produce digital audio output.

The coded audio bit stream is fed into the decoder. The bit stream unpacking and decoding process optionally performs error detection if error checking is applied in the encoder. The bit stream is unpacked to recover the various pieces of information, such as audio frame header, bit allocation, scale factors, mapped samples, and, optionally, ancillary data. The r