ETSI TS 126 448 V14.0.0 (2017-04)

Universal Mobile Telecommunications System (UMTS);
LTE;
Codec for Enhanced Voice Services (EVS);
Jitter Buffer Management
(3GPP TS 26.448 version 14.0.0 Release 14)

TECHNICAL SPECIFICATION

ETSI TS 126 448 V14.0.0 (2017-04)    3GPP TS 26.448 version 14.0.0 Release 14

Reference
RTS/TSGS-0426448ve00

Keywords
LTE,UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sub-Prefecture of Grasse (06) No. 7803/88

Important notice

The present document can be downloaded from:
http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be
aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2017.
All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and "cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 References
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
3.4 Mathematical Expressions
4 General
4.1 Introduction
4.2 Packet-based communications
4.3 EVS Receiver architecture overview
5 Jitter Buffer Management
5.1 Overview
5.2 Depacketization of RTP packets (informative)
5.3 Network Jitter Analysis and Delay Estimation
5.3.1 General
5.3.2 Long-term Jitter
5.3.3 Short-term jitter
5.3.4 Target Playout Delay
5.3.5 Playout Delay Estimation
5.4 Adaptation Control Logic
5.4.1 Control Logic
5.4.2 Frame-based adaptation
5.4.2.1 General
5.4.2.2 Insertion of Concealed Frames
5.4.2.3 Frame Dropping
5.4.2.4 Comfort Noise Insertion in DTX
5.4.2.5 Comfort Noise Deletion in DTX
5.4.3 Signal-based adaptation
5.4.3.1 General
5.4.3.2 Time-shrinking
5.4.3.3 Time-stretching
5.4.3.4 Energy Estimation
5.4.3.5 Similarity Measurement
5.4.3.6 Quality Control
5.4.3.7 Overlap-add
5.5 Receiver Output Buffer
5.6 De-Jitter Buffer
6 Decoder interaction
6.1 General
6.2 Decoder Requirements
6.3 Partial Redundancy
6.3.1 Computation of the Partial Redundancy Offset
6.3.2 Computation of a frame erasure rate indicator to control the frequency of the Partial Redundancy transmission
Annex A (informative): Change history
History

Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
    1 presented to TSG for information;
    2 presented to TSG for approval;
    3 or greater indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.

z the third digit is incremented when editorial only changes have been incorporated
in the document.

1 Scope

The present document defines the Jitter Buffer Management solution for the Codec for Enhanced Voice Services (EVS).

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

- References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version
applies.

In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] 3GPP TR 21.905: "Vocabulary for 3GPP Specifications".
[2] 3GPP TS 26.445: "Codec for Enhanced Voice Services (EVS); Detailed Algorithmic Description".
[3] 3GPP TS 26.114: "IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction".
[4] 3GPP TS 26.071: "Mandatory speech CODEC speech processing functions; AMR speech Codec; General description".
[5] 3GPP TS 26.171: "Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; General description".
[6] 3GPP TS 26.442: "Codec for Enhanced Voice Services (EVS); ANSI C code (fixed-point)".
[7] 3GPP TS 26.443: "Codec for Enhanced Voice Services (EVS); ANSI C code (floating-point)".
[8] 3GPP TS 26.131: "Terminal acoustic characteristics for telephony; Requirements".
[9] IETF RFC 4867 (2007): "RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", J. Sjoberg, M. Westerlund, A. Lakaniemi and Q. Xie.

3 Definitions, symbols and abbreviations

3.1 Definitions

For the purposes of the present document, the terms and definitions given in TR 21.905 [1] and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905 [1].

3.2 Symbols

For the purposes of the present document, the following symbols apply:

$s_x(n)$    Time signal and time index $n$ in context $x$, e.g. $x$ can be inp, out, HP, pre, etc.
$L_x$       Frame length / size of module $x$
$E_x$       Energy values in context of $x$
$C_x$       Correlation function in context $x$

3.3 Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 [1] and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905 [1].

AMR     Adaptive Multi Rate (codec)
AMR-WB  Adaptive Multi Rate Wideband (codec)
CNG     Comfort Noise Generator
DTX     Discontinuous Transmission
EVS     Enhanced Voice Services
FB      Fullband
FIFO    First In, First Out
IP      Internet Protocol
JBM     Jitter Buffer Management
MTSI    Multimedia Telephony Service for IMS
NB      Narrowband
PCM     Pulse Code Modulation
PLC     Packet Loss Concealment
RTP     Real Time Transport Protocol
SID     Silence Insertion Descriptor
SOLA    Synchronized overlap-add
SWB     Super Wideband
TSM     Time Scale Modification
VAD     Voice Activity Detection
WB      Wideband

3.4 Mathematical Expressions

For the purposes of the present document, the following conventions apply to mathematical expressions:

$\lceil x \rceil$ indicates the smallest integer greater than or equal to $x$: $\lceil 1.1 \rceil = 2$, $\lceil 2.0 \rceil = 2$ and $\lceil -1.1 \rceil = -1$
$\lfloor x \rfloor$ indicates the largest integer less than or equal to $x$: $\lfloor 1.1 \rfloor = 1$, $\lfloor 1.0 \rfloor = 1$ and $\lfloor -1.1 \rfloor = -2$
$\min(x_0, \ldots, x_{N-1})$ indicates the minimum of $x_0, \ldots, x_{N-1}$, $N$ being the number of components
$\max(x_0, \ldots, x_{N-1})$ indicates the maximum of $x_0, \ldots, x_{N-1}$
$\sum$ indicates summation

4 General

4.1 Introduction

The present document defines the Jitter Buffer Management solution for the Codec for Enhanced Voice Services (EVS) [2]. Jitter buffers are required in packet-based communications, such as 3GPP MTSI [2], to smooth the inter-arrival jitter of incoming media packets for uninterrupted playout.

The solution is used in conjunction with the EVS decoder and can also be used for AMR [4] and AMR-WB [5]. It is optimized for the Multimedia Telephony Service for IMS (MTSI) and fulfils the requirements for delay and jitter-induced concealment operations set in [2].

The present document is recommended for implementation in all network entities and UEs supporting the EVS codec.

In the case of discrepancy between the EVS Jitter Buffer Management described in the present document and its ANSI-C code specification contained in [6], the procedure defined by [6] prevails. In the case of discrepancy between the procedure described in the present document and its ANSI-C code specification contained in [7], the procedure defined by [7] prevails.

4.2 Packet-based communications

In packet-based communications, packets arrive at the terminal with random jitter in their arrival times, and they may also arrive out of order. Since the decoder expects to be fed a speech packet every 20 milliseconds to output speech samples in periodic blocks, a de-jitter buffer is required to absorb the jitter in the packet arrival time. The larger the de-jitter buffer, the better it absorbs arrival-time jitter, and consequently the fewer late-arriving packets are discarded. Voice communication is also delay-critical, and
therefore it is essential to keep the end-to-end delay as low as possible so that a two-way conversation can be sustained. The defined adaptive Jitter Buffer Management (JBM) solution reflects the above-mentioned trade-off. While attempting to minimize packet losses, the JBM algorithm in the receiver also keeps track of the delay in packet delivery caused by the buffering. The JBM solution adjusts the depth of the de-jitter buffer to balance delay against late losses.

4.3 EVS Receiver architecture overview

An EVS receiver for MTSI-based communication is built on top of the EVS Jitter Buffer Management solution. In the EVS Jitter Buffer Management solution, the received EVS frames, contained in RTP packets, are depacketized and fed to the Jitter Buffer Management (JBM). The JBM smooths the inter-arrival jitter of incoming packets for uninterrupted playout of the decoded EVS frames at the Acoustic Frontend of the terminal.

Figure 1: Receiver architecture for the EVS Jitter Buffer Management Solution

Figure 1 illustrates the architecture and data flow of the receiver side of an EVS terminal. Note that the architecture serves only as an example to outline the integration of the JBM in a terminal. This specification defines the JBM module and its interfaces to the RTP Depacker, the EVS Decoder [2], and the Acoustic Frontend [8]. The modules for Modem and Acoustic Frontend are outside the scope of the present document. The RTP Depacker is outlined only in a basic form; more complex depacketization scenarios depend on how RTP is used.

Real-time implementations of this architecture typically use independent processing threads for reacting to RTP packets arriving from the modem and for requesting PCM data for the Acoustic Frontend. Arriving packets are typically handled by listening for packets received on the network socket related to the RTP session. Incoming packets are pushed into the RTP Depacker module, which extracts the frames contained in an RTP packet. These frames are then pushed into the JBM, where the statistics are updated and the frames are stored for later decoding and playout. The Acoustic Frontend contains the audio interface which, concurrently with the push operation of EVS frames, pulls PCM buffers from the JBM. The JBM is therefore required to provide PCM buffers, which are normally generated by the EVS decoder from EVS frames, or by other means, to allow uninterrupted playout. Although the JBM is described for a multi-threaded architecture, it does not specify thread-safe data structures, since these depend on the particular implementation.

Note that the JBM does not directly forward frames from the RTP Depacker to the EVS decoder but instead uses frame-based adaptation to smooth the network jitter. In addition, signal-based adaptation is executed on the decoded PCM