ETSI TS 126 403 V15.0.0 (2018-07)

Digital cellular telecommunications system (Phase 2+) (GSM);
Universal Mobile Telecommunications System (UMTS);
LTE;
General audio codec audio processing functions;
Enhanced aacPlus general audio codec;
Encoder specification;
Advanced Audio Coding (AAC) part
(3GPP TS 26.403 version 15.0.0 Release 15)

TECHNICAL SPECIFICATION

Reference
RTS/TSGS-0426403vf00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from:
http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2018.
All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
oneM2M logo is protected for the benefit of
its Members.
GSM® and the GSM logo are trademarks registered and owned by the GSM Association.

Intellectual Property Rights

Essential patents

IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Trademarks

The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners. ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 Normative references
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 Outline description
5 AAC Encoder
5.1 Overview
5.2 Stereo Preprocessing
5.3 Filterbank
5.4 Psychoacoustic Model
5.4.1 Blockswitching
5.4.2 Threshold Calculation
5.4.2.1 Calculation of the energy spectrum
5.4.2.2 From energy to threshold
5.4.2.3 Spreading
5.4.2.4 Threshold in quiet
5.4.2.5 Pre-echo control
5.4.3 Spreaded Energy Calculation
5.4.4 Grouping
5.5 Tools
5.5.1 Temporal Noise Shaping (TNS)
5.5.1.1 TNS detection
5.5.1.2 TNS Stereo Synchronization
5.5.1.3 TNS Order
5.5.1.4 TNS Filtering
5.5.1.5 Threshold modification
5.5.2 Mid/Side Stereo
5.6 Quantization and coding
5.6.1 Reduction of psychoacoustic requirements
5.6.1.1 Principle of the threshold reduction strategy
5.6.1.1.1 Addition of noise with equal loudness
5.6.1.1.2 Avoidance of spectral holes
5.6.1.1.3 Relation between bit demand and perceptual entropy
5.6.1.2 Calculation of Bit Demand
5.6.1.3 Calculation of the reduction value
5.6.1.3.1 Preparatory steps of the perceptual entropy calculation
5.6.1.3.2 Calculation of the desired perceptual entropy
5.6.1.3.3 Selection of the bands for avoidance of holes
5.6.1.3.4 First Estimation of the reduction value
5.6.1.3.5 Second Estimation of the reduction value
5.6.1.3.6 Final threshold modification by linearization
5.6.1.3.7 Further perceptual entropy reduction
5.6.1.3.8 Possible failures
5.6.2 Scalefactor determination
5.6.2.1 Scalefactor Estimation
5.6.2.2 Scalefactor Improvement by Quantization
5.6.2.3 Scalefactor Difference Reduction
5.6.2.4 Final scalefactor determination
5.6.3 Noiseless coding
5.6.4 Out of Bits Prevention
Annex A (informative): Change history
History

Foreword

The present document describes the detailed mapping of the general audio service employing the aacPlus general audio codec within the 3GPP system.

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x  the first digit:
   1  presented to TSG for information;
   2  presented to TSG for approval;
   3  indicates TSG approved document under change control.
y  the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z  the third digit is incremented when editorial only changes have been incorporated in the specification.

1 Scope

This Telecommunication Standard (TS) describes the AAC encoder part of the Enhanced aacPlus general audio codec
[1].

2 Normative references

This TS incorporates, by dated and undated reference, provisions from other publications. These normative references are cited in the appropriate places in the text and the publications are listed hereafter. For dated references, subsequent amendments to or revisions of any of these publications apply to this TS only when incorporated in it by amendment or revision. For undated references, the latest edition of the publication referred to applies.

[1] 3GPP TS 26.401: "Enhanced aacPlus general audio codec; General Description".
[2] ISO/IEC 14496-3:2001: "Information technology - Coding of audio-visual objects - Part 3: Audio".
[3] ISO/IEC 14496-3:2001/Amd.1:2003: "Bandwidth Extension".
[4] ISO/IEC 14496-3:2001/Amd.1:2003/DCOR1.
[5] ISO/IEC 14496-3:2001/Amd.2:2004: "Parametric Coding for High Quality Audio".

3 Definitions, symbols and abbreviations

3.1 Definitions

For the purposes of this TS, the following definitions apply:

frame: time segment associated with one AAC single channel or channel pair element
frequency coefficient: output value of the MDCT transform
scalefactor band: a group of consecutive frequency coefficients that will be coded with the same quantizer step size

3.2 Symbols

For the purposes of this TS, the following symbols apply:

k           is the current index for the spectral coefficients
kOffset(n)  is the index of the first spectral coefficient in scalefactor band n
n           is the current scalefactor band

3.3 Abbreviations

For the purposes of this TS, the following abbreviations apply.

AAC               Advanced Audio Coding
aacPlus           Combination of MPEG-4 AAC and MPEG-4 Bandwidth extension (SBR)
Enhanced aacPlus  Combination of MPEG-4 AAC, MPEG-4 Bandwidth extension (SBR) and MPEG-4 Parametric Stereo
KBD               Kaiser-Bessel derived
PE                perceptual entropy
SBR               Spectral Band Replication
TNS               Temporal Noise Shaping

4 Outline description

This TS is structured as follows:

Section 5.1 gives an overview of the encoder.
Section 5.2 gives a detailed description of the stereo preprocessing.
Section 5.3 gives a detailed description of the filterbank used in the encoder.
Section 5.4 gives a detailed description of the psychoacoustic model.
Section 5.5 gives a detailed description of the temporal noise shaping and mid/side stereo tools.
Section 5.6 gives a detailed description of the quantization
and coding procedure used in the encoder.

5 AAC Encoder

5.1 Overview

The AAC encoder acts as the core encoding algorithm of the aacPlus system and encodes at half the sampling rate of aacPlus. Since aacPlus implements the High Efficiency AAC Profile at Level 2 as defined in [3], the AAC LC object type is used. The AAC LC object type does not implement the Long Term Predictor (LTP) tool. Level 2 implies a restriction to a maximum of two channels. Furthermore, if SBR is used, the maximum AAC sampling rate is restricted to 24 kHz, whereas if SBR is not used the maximum AAC sampling rate is restricted to 48 kHz. The basic layout is depicted below.

Figure 1: AAC Encoder Block Diagram

5.2 Stereo Preprocessing

Stereo preprocessing reduces the stereo width of difficult-to-encode signals at low bitrates. It is active for bitrates below 60 kbit/s. The side channel is attenuated under the influence of the following parameters:

- The total perceptual entropy before the increase of the thresholds. This PE is smoothed over past frames and normalized. For a definition of the perceptual
entropy see 5.6.1.1.3.
- The energy ratio between side and mid channel, smoothed over past frames. If the side channel is very strong, the side channel should be attenuated less.
- The energy ratio between the left and right channel. The side channel is attenuated less for signals that appear to be nearly on the left or the right.

[Figure 1 block labels: Input signal, Stereo Preprocessing, Filterbank, Psychoacoustic Model, TNS, M/S, Reduction of psychoacoustic requirements, Scalefactors / quantization, Noiseless Coding, Out of bits prevention, Bitstream multiplex, Bitstream; group label: Quantization]

... k >= lpcStartLine; k--) wfac(k) = (wfac(k) + wfac(k+1)) / 2;
and up:
for (k = lpcStartLine+1; k ...

[Equation residue; recoverable symbols and values: ls(n), hs(n), bark(n), 20 dB/bark, 15 dB/bark, en(n), es(n), X(k), Xw(k), wfac(k)]

5.5.1.2 TNS Stereo Synchronization

If the prediction gains for the left and right channel differ by less than 3%, the same TNS filter coefficients are used for both channels by copying the TNS data of the left channel to the right channel.

5.5.1.3 TNS Order

The TNS parcor coefficients are quantized with a resolution of 4 bits for long blocks and 3 bits for short blocks. The order is then determined by stepping down from the maximum order until the first coefficient whose absolute value exceeds 0.1 is reached.

5.5.1.4 TNS Filtering

The spectral coefficients will now be replaced by filtering with the parcor
coefficients. The first scalefactor band affected corresponds to a frequency of 1275 Hz for long blocks and 2750 Hz for short blocks. The filtering is done with a so-called lattice filter, so no conversion from parcor coefficients to linear prediction coefficients is required.

5.5.1.5 Threshold modification

In the frequency range from 380 Hz to the start frequency of the TNS filter, the coding requirements are tightened by multiplying the thresholds calculated by the psychoacoustic model by a factor of 0.25.

5.5.2 Mid/Side Stereo

Normal stereo operation, and thus Mid/Side Stereo, is only required when operating the encoder at bitrates at or above 44 kbit/s. Below 44 kbit/s the Parametric Stereo coding tool [5] is used instead, with the AAC core operated in mono.

Within Mid/Side Stereo, for each scalefactor band the left and right channel coefficients are coded either as L and R or as mid and side channel [equations missing]. For stereo, the psychoacoustic model calculates the mid and side energies in addition to the left and right energies. The threshold for coding the mid and side channel is simply the minimum of the left and right thresholds [equation missing]. M/S coding is actually used if [condition missing] is fulfilled. In that case the left channel values for spectral coefficients, energies and thresholds are replaced by the mid channel values, and the right channel values are replaced by the side channel values. The spreaded energy for the mid and side channel is the minimum of the spreaded energy of the left and right channel.

5.6 Quantization and coding

5.6.1 Reduction of psychoacoustic requirements

Usually the requirements of the psychoacoustic model are too strict for the desired bitrate. Thus a threshold reduction strategy is necessary, i.e. the
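
Illustrative sketch for 5.1, 5.2 and 5.5.2: the operating limits stated in those clauses (two channels at most, 24 kHz or 48 kHz maximum AAC sampling rate depending on SBR, stereo preprocessing below 60 kbit/s, M/S at or above 44 kbit/s) can be collected in a few checks. The EncoderConfig record and all function names are hypothetical and not taken from this TS. How stereo preprocessing interacts with the Parametric Stereo mode below 44 kbit/s is not stated in the text above, so the sketch only encodes the 60 kbit/s limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration record; field names are illustrative only. */
typedef struct {
    int  channels;         /* number of input channels */
    int  aac_sample_rate;  /* AAC core sampling rate in Hz */
    int  bitrate;          /* total bitrate in bit/s */
    bool sbr_enabled;      /* true if SBR is used */
} EncoderConfig;

/* 5.1: Level 2 allows at most two channels; the AAC core rate is limited
 * to 24 kHz with SBR and to 48 kHz without SBR. */
static bool config_is_valid(const EncoderConfig *cfg)
{
    assert(cfg != NULL);
    int max_rate = cfg->sbr_enabled ? 24000 : 48000;
    return cfg->channels >= 1 && cfg->channels <= 2 &&
           cfg->aac_sample_rate > 0 && cfg->aac_sample_rate <= max_rate;
}

/* 5.2: stereo preprocessing is active for bitrates below 60 kbit/s. */
static bool stereo_preprocessing_active(const EncoderConfig *cfg)
{
    assert(cfg != NULL);
    return cfg->channels == 2 && cfg->bitrate < 60000;
}

/* 5.5.2: normal stereo, and thus M/S, at or above 44 kbit/s; below that
 * Parametric Stereo is used and the AAC core runs in mono. */
static bool mid_side_available(const EncoderConfig *cfg)
{
    assert(cfg != NULL);
    return cfg->channels == 2 && cfg->bitrate >= 44000;
}
```

With these checks, a stereo configuration at 48 kbit/s with SBR and a 24 kHz core is valid, uses stereo preprocessing and may use M/S, while the same configuration at 32 kbit/s falls into the Parametric Stereo range.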
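
Illustrative sketch for the weighting factor smoothing whose loop fragments survive in the TNS part of clause 5.5.1: the factors wfac(k) are smoothed by one downward and one upward averaging pass. Only the downward averaging step and the start of the upward loop are legible in this copy. The loop bounds (a half-open range ending at a hypothetical stop index) and the upward averaging step are assumptions chosen to mirror the legible part, not text from this TS.

```c
#include <assert.h>
#include <stddef.h>

/* Two-pass smoothing of the weighting factors wfac[start .. stop-1].
 * ASSUMPTION: the downward pass begins at stop - 2 and the upward pass
 * mirrors it; the source only shows
 *   wfac(k) = (wfac(k) + wfac(k+1)) / 2
 * and the beginning of the upward loop. */
static void smooth_weights(float *wfac, int start, int stop)
{
    assert(wfac != NULL && start >= 0 && stop > start);

    /* Downward pass: average each value with its upper neighbour. */
    for (int k = stop - 2; k >= start; k--)
        wfac[k] = (wfac[k] + wfac[k + 1]) / 2.0f;

    /* Upward pass (assumed): average each value with its lower neighbour. */
    for (int k = start + 1; k < stop; k++)
        wfac[k] = (wfac[k] + wfac[k - 1]) / 2.0f;
}
```

Running both directions spreads an isolated peak to both sides, which a single pass would not do.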
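
Illustrative sketch for 5.5.1.2: the stereo synchronization copies the left channel TNS data to the right channel when the two prediction gains are close. The TnsData record, its fields and TNS_MAX_ORDER are hypothetical. Measuring the 3% difference relative to the left channel gain is an assumption, since the clause does not say which gain serves as the reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TNS_MAX_ORDER 12  /* illustrative value, not taken from this TS */

/* Hypothetical per-channel TNS record. */
typedef struct {
    bool  active;
    int   order;
    float prediction_gain;
    float parcor[TNS_MAX_ORDER];
} TnsData;

/* Copies the left TNS data to the right channel if the prediction gains
 * differ by less than 3% (ASSUMPTION: relative to the left gain).
 * Returns true if the channels were synchronized. */
static bool tns_sync_stereo(const TnsData *left, TnsData *right)
{
    assert(left != NULL && right != NULL);

    float diff = left->prediction_gain - right->prediction_gain;
    if (diff < 0.0f)
        diff = -diff;

    if (diff < 0.03f * left->prediction_gain) {
        *right = *left;  /* struct copy: same filter for both channels */
        return true;
    }
    return false;
}
```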
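
Illustrative sketch for 5.5.1.3: the order is found by stepping down from the maximum order and stopping at the first coefficient whose absolute value exceeds 0.1. The function name and the array layout (coefficient i stored at index i - 1) are assumptions; the values passed in are expected to be the parcor coefficients after quantization.

```c
#include <assert.h>
#include <stddef.h>

static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* Returns the TNS order: the highest index (counted from 1) whose
 * coefficient has an absolute value above 0.1, or 0 if none does. */
static int tns_order(const float *parcor, int max_order)
{
    assert(parcor != NULL && max_order >= 0);

    int order = max_order;
    while (order > 0 && abs_f(parcor[order - 1]) <= 0.1f)
        order--;
    return order;
}
```

For the coefficients {0.5, 0.05, 0.2, 0.01} the trailing 0.01 is dropped and the order becomes 3; the small second coefficient stays because a larger one follows it.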
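
Illustrative sketch for 5.5.1.5: thresholds of the scalefactor bands between 380 Hz and the TNS start frequency are multiplied by 0.25. The function signature is hypothetical. Two details are assumptions: a band counts as inside the range when its first spectral line lies in it, and line k is mapped to the frequency k * sample_rate / (2 * num_lines).

```c
#include <assert.h>
#include <stddef.h>

/* thr[n]        threshold of scalefactor band n (modified in place)
 * sfb_offset[n] index of the first spectral line of band n
 * num_lines     number of spectral lines per block (e.g. 1024)
 * tns_start_hz  start frequency of the TNS filter in Hz */
static void tns_modify_thresholds(float *thr, const int *sfb_offset,
                                  int num_sfb, int num_lines,
                                  int sample_rate, float tns_start_hz)
{
    assert(thr != NULL && sfb_offset != NULL);
    assert(num_sfb >= 0 && num_lines > 0 && sample_rate > 0);

    for (int n = 0; n < num_sfb; n++) {
        /* Frequency of the first line of band n (assumed mapping). */
        float f = (float)sfb_offset[n] * (float)sample_rate
                  / (2.0f * (float)num_lines);
        if (f >= 380.0f && f < tns_start_hz)
            thr[n] *= 0.25f;  /* tighter threshold below the TNS range */
    }
}
```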
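
Illustrative sketch for 5.5.2: once M/S coding has been selected for a scalefactor band, the left and right data are replaced by mid and side data. The decision criterion itself is not legible in this copy, so the sketch assumes the decision has already been made elsewhere. The mid and side definitions used here, (L + R) / 2 and (L - R) / 2, are the conventional AAC ones and are an assumption with respect to this TS, as is computing the band energy as the sum of squared coefficients. BandData and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-band psychoacoustic data. */
typedef struct {
    float energy;         /* band energy */
    float threshold;      /* masking threshold */
    float spread_energy;  /* spreaded energy */
} BandData;

static float min_f(float a, float b) { return a < b ? a : b; }

/* Replaces left/right by mid/side for lines start .. stop-1 of one band.
 * Thresholds and spreaded energies become the minimum of left and right. */
static void ms_convert_band(float *left, float *right, int start, int stop,
                            BandData *bl, BandData *br)
{
    assert(left != NULL && right != NULL && bl != NULL && br != NULL);
    assert(start >= 0 && stop >= start);

    float en_mid = 0.0f, en_side = 0.0f;
    for (int k = start; k < stop; k++) {
        float mid  = 0.5f * (left[k] + right[k]);  /* assumed definition */
        float side = 0.5f * (left[k] - right[k]);  /* assumed definition */
        left[k]  = mid;
        right[k] = side;
        en_mid  += mid * mid;
        en_side += side * side;
    }

    float thr = min_f(bl->threshold, br->threshold);
    float spr = min_f(bl->spread_energy, br->spread_energy);
    bl->energy = en_mid;
    br->energy = en_side;
    bl->threshold = br->threshold = thr;
    bl->spread_energy = br->spread_energy = spr;
}
```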