ETSI TS 126 403 V13.0.0 (2016-01)

TECHNICAL SPECIFICATION

Digital cellular telecommunications system (Phase 2+);
Universal Mobile Telecommunications System (UMTS);
LTE;
General audio codec audio processing functions;
Enhanced aacPlus general audio codec;
Encoder specification;
Advanced Audio Coding (AAC) part
(3GPP TS 26.403 version 13.0.0 Release 13)

Reference
RTS/TSGS-0426403vd00

Keywords
GSM,LTE,UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00
Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from:
http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at
http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2016.
All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables
except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 Normative references
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 Outline description
5 AAC Encoder
5.1 Overview
5.2 Stereo Preprocessing
5.3 Filterbank
5.4 Psychoacoustic Model
5.4.1 Blockswitching
5.4.2 Threshold Calculation
5.4.2.1 Calculation of the energy spectrum
5.4.2.2 From energy to threshold
5.4.2.3 Spreading
5.4.2.4 Threshold in quiet
5.4.2.5 Pre-echo control
5.4.3 Spreaded Energy Calculation
5.4.4 Grouping
5.5 Tools
5.5.1 Temporal Noise Shaping (TNS)
5.5.1.1 TNS detection
5.5.1.2 TNS Stereo Synchronization
5.5.1.3 TNS Order
5.5.1.4 TNS Filtering
5.5.1.5 Threshold modification
5.5.2 Mid/Side Stereo
5.6 Quantization and coding
5.6.1 Reduction of psychoacoustic requirements
5.6.1.1 Principle of the threshold reduction strategy
5.6.1.1.1 Addition of noise with equal loudness
5.6.1.1.2 Avoidance of spectral holes
5.6.1.1.3 Relation between bit demand and perceptual entropy
5.6.1.2 Calculation of Bit Demand
5.6.1.3 Calculation of the reduction value
5.6.1.3.1 Preparatory steps of the perceptual entropy calculation
5.6.1.3.2 Calculation of the desired perceptual entropy
5.6.1.3.3 Selection of the bands for avoidance of holes
5.6.1.3.4 First Estimation of the reduction value
5.6.1.3.5 Second Estimation of the reduction value
5.6.1.3.6 Final threshold modification by linearization
5.6.1.3.7 Further perceptual entropy reduction
5.6.1.3.8 Possible failures
5.6.2 Scalefactor determination
5.6.2.1 Scalefactor Estimation
5.6.2.2 Scalefactor Improvement by Quantization
5.6.2.3 Scalefactor Difference Reduction
5.6.2.4 Final scalefactor determination
5.6.3 Noiseless coding
5.6.4 Out of Bits Prevention
Annex A (informative): Change history
History

Foreword

The present document describes the detailed mapping of the general audio service employing the aacPlus general audio codec within the 3GPP system.

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
  1 presented to TSG for information;
  2 presented to TSG for approval;
  3 Indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance,
i.e. technical enhancements, corrections, updates, etc.
z the third digit is incremented when editorial only changes have been incorporated in the specification.

1 Scope

This Telecommunication Standard (TS) describes the AAC encoder part of the Enhanced aacPlus general audio codec [1].

2 Normative references

This TS incorporates by dated and undated reference, provisions from other publications. These normative references are cited in the appropriate places in the text and the publications are listed hereafter. For dated references, subsequent amendments to or revisions of any of these publications apply to this TS only when incorporated in it by amendment or revision. For undated references, the latest edition of the publication referred to applies.

[1] 3GPP TS 26.401: “Enhanced aacPlus general audio codec; General Description”.
[2] ISO/IEC 14496-3:2001: “Information technology - Coding of audio-visual objects - Part 3: Audio”.
[3] ISO/IEC 14496-3:2001/Amd.1:2003: “Bandwidth Extension”.
[4] ISO/IEC 14496-3:2001/Amd.1:2003/DCOR1.
[5] ISO/IEC 14496-3:2001/Amd.2:2004: “Parametric Coding for High Quality Audio”.
3 Definitions, symbols and abbreviations

3.1 Definitions

For the purposes of this TS, the following definitions apply:

frame: time segment associated with one AAC single channel or channel pair element
frequency coefficient: output value of the MDCT transform
scalefactor band: a group of consecutive frequency coefficients that will be coded with the same quantizer step size

3.2 Symbols

For the purposes of this TS, the following symbols apply:

$k$ is the current index for the spectral coefficients
$kOffset(n)$ is the index of the first spectral coefficient in scalefactor band $n$
$n$ is the current scalefactor band

3.3 Abbreviations

For the purposes of this TS, the following abbreviations apply.

AAC Advanced Audio Coding
aacPlus Combination of MPEG-4 AAC and MPEG-4 Bandwidth extension (SBR)
Enhanced aacPlus Combination of MPEG-4 AAC, MPEG-4 Bandwidth extension (SBR) and MPEG-4 Parametric Stereo
KBD Kaiser-Bessel derived
PE perceptual entropy
SBR Spectral Band Replication
TNS Temporal Noise Shaping

4 Outline description

This TS is structured as follows:

Section 5.1 gives an encoder overview description.
Section 5.2 gives a detailed description of the stereo preprocessing.
Section 5.3 gives a detailed description of the filterbank used in the encoder.
Section 5.4 gives a detailed description of the psychoacoustic model.
Section 5.5 gives a detailed description of the temporal noise shaping and mid/side stereo tools.
Section 5.6 gives a detailed description of the quantization and coding procedure used in the encoder.

5 AAC Encoder

5.1 Overview

The AAC encoder acts as the core encoding algorithm of the aacPlus system, encoding at half the sampling rate of aacPlus. Since aacPlus implements the High Efficiency AAC Profile at Level 2 as defined in [3], the AAC LC object type is used. The AAC LC object type does not implement the Long Term Predictor (LTP) tool. Level 2 implies a restriction to a maximum of two channels. Furthermore,
in case of SBR being used, the maximum AAC sampling rate is restricted to 24 kHz, whereas if SBR is not used the maximum AAC sampling rate is restricted to 48 kHz. The basic layout is depicted below.

[Figure 1 blocks: Input signal, Stereo Preprocessing, Filterbank, Psychoacoustic Model, TNS, M/S, Reduction of psychoacoustic requirements, scalefactors / quantization, Quantization, Noiseless Coding, Out of bits prevention, Bitstream multiplex, Bitstream]

Figure 1: AAC Encoder Block Diagram

5.2 Stereo Preprocessing

With stereo preprocessing, the stereo width of difficult-to-encode signals at low bitrates is reduced. Stereo preprocessing is active for bitrates less than 60 kbit/s. The side channel is attenuated under the influence of the following parameters:

- The total perceptual entropy before the increase of the thresholds. This PE is smoothed over past frames and normalized. For a definition of the perceptual entropy see 5.6.1.1.3.

[...]

$X_w(k) = X(k) \cdot wfac(k)$

[...] k >= lpcStartLine; k--)
    wfac[k] = (wfac[k] + wfac[k+1]) / 2;

and up:

for (k = lpcStartLine+1; k [...]

The lower and upper limits lpcStartLine and lpcStopLine depend on the bitrate and the block type. The next steps are an autocorrelation calculation and an LPC calculation using the Levinson-Durbin algorithm. As a result, the so-called parcor (reflection) coefficients and the prediction gain are available. TNS is used only if the prediction gain is greater than a given threshold, which is bitrate dependent and varies between 1.2 and 1.41.

5.5.1.2 TNS Stereo Synchronization

If the prediction gains for the left and right channels differ by less than 3%, the same TNS filter coefficients are chosen for both channels by copying the TNS data of the left channel to the right channel.

5.5.1.3 TNS Order

The TNS parcor coefficients are quantized with a resolution of 4 bits for long blocks and 3 bits for short blocks. The order of the filter is then determined by going down from the maximum order until the first coefficient that exceeds an absolute value of 0.1 has been reached.

5.5.1.4 TNS Filtering

The spectral coefficients $X(k)$ are now replaced by their filtered values, using the parcor coefficients. The first scalefactor band affected corresponds to a frequency of 1275 Hz for long blocks and 2750 Hz for short blocks. The filtering is done with a so-called lattice filter, so no conversion from parcor coefficients to linear prediction coefficients is required.

5.5.1.5 Threshold modification

In the frequency range from 380 Hz to the start frequency of the TNS filter, the coding demands are increased by multiplying the thresholds $thr(n)$ calculated by the psychoacoustic model by a factor of 0.25.

5.5.2 Mid/Side Stereo

Normal stereo operation, and thus Mid/Side Stereo, is only required when operating the encoder at bitrates at or above 44 kbit/s. Below 44 kbit/s the Parametric Stereo coding tool [5] is used instead, where the AAC core is operated in mono.

Within Mid/Side Stereo, for each scalefactor band the left and right channel coefficients are coded either as L and R or as mid and side channels $M = \frac{L+R}{2}$ and $S = \frac{L-R}{2}$. For stereo, in the psychoacoustic model, in addition to the left and right energies $en_{L,R}(n)$ also the mid and side energies $en_{M,S}(n)$
are calculated. The threshold for coding the mid and side channels is simply the minimum of the left and right thresholds $thr_{L,R}(n)$. M/S coding is actually used if

$$\frac{\min\left(thr_L(n),\, thr_R(n)\right)^2}{en_M(n) \cdot en_S(n)} \ge \frac{thr_L(n) \cdot thr_R(n)}{en_L(n) \cdot en_R(n)}$$

is fulfilled. In that case the left channel values for spectral coefficients, energies and thresholds are replaced by the mid channel values, and the right channel values are replaced by the side channel values. The spreaded energy $es(n)$ for the mid and side channels is the minimum of the spreaded energies of the left and right channels.

5.6 Quantization and coding

5.6.1 Reduction of psychoacoustic requirements

Usually the requirements of the psychoacoustic model are too strict for the desired bitrate. Thus a threshold reduction strategy is necessary, i.e. a strategy that reduces the requirements by increasing the thresholds given by the psychoacoustic model. Overcoding, i.e. decreasing the thresholds for a finer quantization, does not take place in this encoder. In this