ETSI TS 126 092-2016: Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); LTE; Mandatory speech codec speech processing functions


ETSI TS 126 092 V13.0.0 (2016-01)

TECHNICAL SPECIFICATION

Digital cellular telecommunications system (Phase 2+);
Universal Mobile Telecommunications System (UMTS);
LTE;
Mandatory speech codec speech processing functions;
Adaptive Multi-Rate (AMR) speech codec;
Comfort noise aspects
(3GPP TS 26.092 version 13.0.0 Release 13)

Reference: RTS/TSGS-0426092vd00

Keywords: GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from: http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print.

The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2016. All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 References
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 General
5 Functions on the transmit (TX) side
5.1 LSF evaluation
5.2 Frame energy calculation
5.3 Modification of the speech encoding algorithm during SID frame generation
5.4 SID-frame encoding
6 Functions on the receive (RX) side
6.1 Averaging and decoding of the LP and energy parameters
6.2 Comfort noise generation and updating
7 Computational details and bit allocation
Annex A (informative): Change history
History

Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG
with an identifying change of release date and an increase in version number as follows:

    Version x.y.z

    where:

    x   the first digit:
        1   presented to TSG for information;
        2   presented to TSG for approval;
        3 or greater indicates TSG approved document under change control.

    y   the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.

    z   the third digit is incremented when editorial only changes have been incorporated in the document.

1 Scope

The present document gives
the detailed requirements for the correct operation of the background acoustic noise evaluation, noise parameter encoding/decoding and comfort noise generation for the AMR speech codec during Source Controlled Rate (SCR) operation.

The requirements described in the present document are mandatory for implementation in all UEs capable of supporting the AMR speech codec. The receiver requirements are mandatory for implementation in all networks capable of supporting the AMR speech codec; the transmitter requirements are mandatory only for those networks where downlink SCR will be used.

In case of discrepancy between the requirements described in the present document and the fixed-point computational description of these requirements contained in [1], the description in [1] will prevail.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

- References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] 3GPP TS 26.073: “Adaptive Multi-Rate (AMR); ANSI C source code”.
[2] 3GPP TS 26.090: “Transcoding functions”.
[3] 3GPP TS 26.091: “Mandatory Speech Codec speech processing functions; AMR Speech Codec; Error concealment of lost frames”.
[4] 3GPP TS 26.093: “Source Controlled Rate operation”.
[5] 3GPP TS 26.101: “Frame Structure”.

3 Definitions, symbols and abbreviations

3.1 Definitions

For the purpose of the present document, the following terms and definitions apply.

Frame: time interval of 20 ms corresponding to the time segmentation of the adaptive multi-rate speech transcoder, also used as a short term for traffic frame.

SID frames: special Comfort Noise frames.
It may convey information on the acoustic background noise or inform the decoder that it should start generating background noise.

Speech frame: traffic frame that cannot be classified as a SID frame.

VAD flag: Voice Activity Detection flag.

TX_TYPE: one of SPEECH, SID_FIRST, SID_UPD, NO_DATA (defined in TS 26.093 [4]).

RX_TYPE: classification of the received traffic frame (defined in TS 26.093 [4]).

Other definitions of terms used in the present document can be found in TS 26.090 [2] and TS 26.093 [4]. The overall operation of SCR is described in TS 26.093 [4].

3.2 Symbols

For the purpose of the present document, the following symbols apply. Boldface symbols are used for vector variables.

$en_{log}^{mean}$   Averaged logarithmic frame energy
$\mathbf{f}^{mean}$   Averaged LSF parameter vector
$\mathbf{e}$   Computed LSF parameter prediction residual
$en_{log}$   Logarithmic frame energy
$\hat{\mathbf{e}}$   Quantized LSF parameter prediction residual
$\hat{\mathbf{f}}$   Quantized LSF vector
$\hat{\mathbf{f}}(m)$   Quantized LSF vector of frame m
$\mathbf{f}_{ref}$   Reference vector for LSF quantization
$\mathbf{f} = (f_1\ f_2\ \ldots\ f_{10})^T$   Unquantized LSF vector
$\mathbf{f}(m)$   Unquantized LSF vector of frame m

3.3 Abbreviations

For the purpose of the present document, the following abbreviations apply.

AMR   Adaptive Multi-Rate
SCR   Source Controlled Rate operation (aka source discontinuous transmission)
UE    User Equipment
SID   SIlence Descriptor
LP    Linear Prediction
LSP   Line Spectral Pair
LSF   Line Spectral Frequency
RX    Receive
TX    Transmit
VAD   Voice Activity Detector

4 General

A basic problem when using SCR is that the background acoustic noise, which is transmitted together with the speech, would disappear when the transmission is cut, resulting in discontinuities of the background noise. Because SCR switching can take place rapidly, this effect has been found to be very annoying for the listener, especially in a car environment with high background noise levels. In bad cases, the speech may become barely intelligible.

The present document specifies how to overcome this problem by generating, on the receive (RX) side, synthetic noise similar to the background noise on the transmit (TX) side. The comfort noise parameters are estimated on the TX side and transmitted to the RX side at a regular rate when speech is not present. This allows the comfort noise to adapt to the changes of the noise on the
TX side.

5 Functions on the transmit (TX) side

The comfort noise evaluation algorithm uses the following parameters of the AMR speech encoder, defined in [2]:

- the unquantized Linear Prediction (LP) parameters, using the Line Spectral Pair (LSP) representation, where the unquantized Line Spectral Frequency (LSF) vector is given by $\mathbf{f} = (f_1\ f_2\ \ldots\ f_{10})^T$;
- for the 12.2 kbit/s mode, the unquantized LSF vector given by the second set of LSF parameters in the frame.

The algorithm computes the following parameters to assist in comfort noise generation:

- the averaged LSF parameter vector $\mathbf{f}^{mean}$ (average of the LSF parameters of the eight most recent frames);
- the averaged logarithmic frame energy $en_{log}^{mean}$ (average of the logarithmic energy of the eight most recent frames).

These parameters give information on the level ($en_{log}^{mean}$) and the spectrum ($\mathbf{f}^{mean}$) of the background noise. The evaluated comfort noise parameters ($\mathbf{f}^{mean}$ and $en_{log}^{mean}$) are encoded into a special frame, called a Silence Descriptor (SID) frame, for transmission to the RX side.

A hangover logic is used to enhance the quality of the silence descriptor frames. A hangover of seven frames is added to the VAD flag, so that the coder waits for a period of seven frames before switching from active to inactive mode. During that time the decoder can compute a silence descriptor frame from the quantized LSFs and the logarithmic frame energy of the decoded speech signal. Therefore, no comfort noise description is transmitted in the first SID frame after active speech. If the background noise contains transients which cause the coder to switch to active mode and then back to inactive mode within a very short time period, no hangover is used. Instead, the previously used comfort noise frames are used for comfort noise generation.

The first SID frame also serves to initiate the comfort noise generation on the receive side, as a first SID frame is always sent at the end of a speech burst, i.e., before the transmission is terminated. The scheduling of SID or speech frames on the network path is described in [4].

5.1 LSF evaluation

The comfort noise parameters to be encoded into a
SID frame are calculated over consecutive frames marked with VAD=0, as follows.

The averaged LSF parameter vector $\mathbf{f}^{mean}(i)$ of the frame i shall be computed according to the equation:

$$\mathbf{f}^{mean}(i) = \frac{1}{N}\sum_{n=0}^{N-1}\mathbf{f}(i-n), \qquad N = 8 \qquad (1)$$

where $\mathbf{f}(i-n)$ is the (unquantized) LSF parameter vector of the current frame i ($n = 0$) and of the past frames ($n = 1, \ldots, N-1$).

The averaged LSF parameter vector of the frame i is encoded using the same encoding tables that are also used by the 7.4 kbit/s mode for the encoding of the non-averaged LSF parameter vectors in ordinary speech encoding mode, but the quantization algorithm is modified in order to support the quantization of comfort noise.

The LSF parameter prediction residual to be quantized for frame i is obtained according to the following equation:

$$\mathbf{e}(i) = \mathbf{f}^{mean}(i) - \mathbf{f}_{ref} \qquad (2)$$

where $\mathbf{f}_{ref}$ is a reference vector picked from a codebook.

The vector $\mathbf{f}_{ref}$ used in equation (2) is encoded for each SID frame. A lookup table containing 8 vectors typical for background noise is searched, and the vector which yields the lowest prediction residual energy is selected. After this step, the LSF parameter encoding procedure is performed. The
3-bit index for the reference vector and the 26 bits for the LSF parameters are transmitted in the SID frame (see bit allocation in table 1).

5.2 Frame energy calculation

The frame energy is computed for each frame marked with VAD=0 according to the equation:

$$en_{log}(i) = \frac{1}{2}\log_2\left(\frac{1}{160}\sum_{n=0}^{159} s^2(n)\right) \qquad (3)$$

where $s(n)$ is the HP-filtered input speech signal of the current frame i.

The averaged logarithmic energy is computed by:

$$en_{log}^{mean}(i) = \frac{1}{N}\sum_{n=0}^{N-1} en_{log}(i-n), \qquad N = 8 \qquad (4)$$

The averaged logarithmic energy is quantized by means of a 6-bit algorithmic quantizer. The 6 bits for the energy index are transmitted in the SID frame (see bit allocation in table 1).

5.3 Modification of the speech encoding algorithm during SID frame generation

When the TX_TYPE is not equal to SPEECH, the speech encoding algorithm is modified in the following way:

- The non-averaged LP parameters, which are used to derive the coefficients of the speech encoder filters, are not quantized;
- The open-loop pitch lag search is performed, but the closed-loop pitch lag search is deactivated. The adaptive codebook gain and memory are set to zero;
- No fixed codebook search is made;
- The memory of the weighting filter is set to zero, i.
