ETSI TS 126 090-2017 Digital cellular telecommunications system (Phase 2+) (GSM) Universal Mobile Telecommunications System (UMTS) LTE Mandatory Speech Codec speech processing func.pdf

Resource description

ETSI TS 126 090 V14.0.0 (2017-04)

Digital cellular telecommunications system (Phase 2+) (GSM); Universal Mobile Telecommunications System (UMTS); LTE; Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions (3GPP TS 26.090 version 14.0.0 Release 14)

TECHNICAL SPECIFICATION

Reference: RTS/TSGS-0426090ve00
Keywords: GSM, LTE, UMTS

ETSI
650 Route des Lucioles, F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from: http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards
Institute 2017. All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates
are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”,
“should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 References
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 Outline description
4.1 Functional description of audio parts
4.2 Preparation of speech samples
4.2.1 PCM format conversion
4.3 Principles of the adaptive multi-rate speech encoder
4.4 Principles of the adaptive multi-rate speech decoder
4.5 Sequence and subjective importance of encoded parameters
5 Functional description of the encoder
5.1 Pre-processing (all modes)
5.2 Linear prediction analysis and quantization
5.2.1 Windowing and auto-correlation computation
5.2.2 Levinson-Durbin algorithm (all modes)
5.2.3 LP to LSP conversion (all modes)
5.2.4 LSP to LP conversion (all modes)
5.2.5 Quantization of the LSP coefficients
5.2.6 Interpolation of the LSPs
5.2.7 Monitoring resonance in the LPC spectrum (all modes)
5.3 Open-loop pitch analysis
5.4 Impulse response computation (all modes)
5.5 Target signal computation (all modes)
5.6 Adaptive codebook
5.6.1 Adaptive codebook search
5.6.2 Adaptive codebook gain control (all modes)
5.7 Algebraic codebook
5.7.1 Algebraic codebook structure
5.7.2 Algebraic codebook search
5.8 Quantization of the adaptive and fixed codebook gains
5.8.1 Adaptive codebook gain limitation in quantization
5.8.2 Quantization of codebook gains
5.8.3 Update past quantized adaptive codebook gain buffer (all modes)
5.9 Memory update (all modes)
6 Functional description of the decoder
6.1 Decoding and speech synthesis
6.2 Post-processing
6.2.1 Adaptive post-filtering (all modes)
6.2.2 High-pass filtering and up-scaling (all modes)
7 Detailed bit allocation of the adaptive multi-rate codec
8 Homing sequences
8.1 Functional description
8.2 Definitions
8.3 Encoder homing
8.4 Decoder homing
9 Bibliography
Annex A (informative): Change history
History

Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released
by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
  1 presented to TSG for information;
  2 presented to TSG for approval;
  3 or greater indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z the third digit is incremented when editorial only changes have been incorporated in the document.

1 Scope

The present document describes the detailed mapping from input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 95, 103, 118, 134, 148, 159, 204, and 244 bits and from encoded blocks of 95, 103, 118, 134, 148, 159, 204, and 244 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 8 000 samples/s leading to a bit rate for the encoded bit stream of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 or 12.2 kbit/s. The coding scheme for the multi-rate coding modes is the so-called Algebraic Code Excited Linear Prediction Coder, hereafter referred to as ACELP. The multi-rate ACELP coder is referred to as MR-ACELP.

In the case of discrepancy between the requirements described in the present document and the fixed point computational description (ANSI-C code) of these requirements contained in [4], the description in [4] will prevail. The ANSI-C code is not described in the present document, see [4] for a description of the ANSI-C code.

The transcoding procedure specified in the present document is mandatory for systems using the AMR speech codec.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions
of the present document. References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific. For a specific reference, subsequent revisions do not apply. For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] GSM 03.50: “Digital cellular telecommunications system (Phase 2+); Transmission planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN) system”.
[2] 3GPP TS 26.101: “Frame Structure”.
[3] 3GPP TS 26.094: “AMR Speech Codec; Voice Activity Detector”.
[4] 3GPP TS 26.073: “Adaptive Multi-Rate (AMR); ANSI C source code”.
[5] 3GPP TS 26.074: “Adaptive Multi-Rate (AMR); Test sequences”.
[6] ITU-T Recommendation G.711 (1988): “Pulse code modulation (PCM) of voice frequencies”.
[7] ITU-T Recommendation G.726: “40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM)”.
[8] ITU-T Recommendation G.712

3 Definitions, symbols and abbreviations

3.1 Definitions

For the purposes of the present document, the following terms and definitions apply:

adaptive codebook: contains excitation vectors that are adapted for every subframe. The adaptive codebook is derived from the long-term filter state. The lag value can be viewed as an index into the adaptive codebook

adaptive postfilter: this filter is applied to the output of the short-term synthesis filter to enhance the perceptual quality of the reconstructed speech. In the adaptive multi-rate codec, the adaptive postfilter is a cascade of two filters: a formant postfilter and a tilt compensation filter

algebraic codebook: fixed codebook where algebraic code is used to populate the excitation vectors (innovation vectors). The excitation contains a small number of nonzero pulses with predefined interlaced sets of positions

anti-sparseness processing: adaptive post-processing procedure applied to the fixed codebook vector in order to reduce perceptual artefacts from a sparse fixed codebook vector

closed-loop pitch analysis: adaptive codebook search, i.e., a process of estimating the pitch (lag) value from the weighted input speech and the long term filter state.
In the closed-loop search, the lag is searched using an error minimization loop (analysis-by-synthesis). In the adaptive multi-rate codec, closed-loop pitch search is performed for every subframe

direct form coefficients: one of the formats for storing the short term filter parameters. In the adaptive multi-rate codec, all filters which are used to modify speech samples use direct form coefficients

fixed codebook: the fixed codebook contains excitation vectors for speech synthesis filters. The contents of the codebook are non-adaptive (i.e., fixed). In the adaptive multi-rate codec, the fixed codebook is implemented using an algebraic codebook

fractional lags: a set of lag values having sub-sample resolution. In the adaptive multi-rate codec a sub-sample resolution of 1/6th or 1/3rd of a sample is used

frame: time interval equal to 20 ms (160 samples at an 8 kHz sampling rate)

integer lags: set of lag values having whole sample resolution

interpolating filter: FIR filter used to produce an estimate of subsample resolution samples, given an input sampled with integer sample resolution

inverse filter: this filter removes the short term correlation from the speech signal. The filter models an inverse frequency response of the vocal tract

lag: long term filter delay. This is typically the true pitch period, or its multiple or sub-multiple

Line Spectral Frequencies: (see Line Spectral Pair)

Line Spectral Pair: transformation of LPC parameters. Line Spectral Pairs are obtained by decomposing the inverse filter transfer function A(z) to a set of two transfer functions, one having even symmetry and the other having odd symmetry. The Line Spectral Pairs (also called Line Spectral Frequencies) are the roots of these polynomials on the z-unit circle

LP analysis window: for each frame, the short term filter coefficients are computed using the high pass filtered speech samples within the analysis window. In the adaptive multi-rate codec, the length of the analysis window is always 240 samples. For each frame, two asymmetric windows are used to generate two sets of LP coefficients in the 12.2 kbit/s mode. For the other modes, only a single asymmetric window is used to generate a single set of LP coefficients. In the 12.2 kbit/s mode, no samples of the future frames are used (no lookahead). The other modes use a 5 ms lookahead

LP coefficients: Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding (LPC) coefficients) is a generic descriptive term for the short term filter coefficients

mode: when used alone, refers to the source codec mode, i.e., to one of the source codecs employed in the AMR codec

open-loop pitch search: process of estimating the near optimal lag directly from the weighted speech input. This is done to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags around the open-loop estimated lags. In the adaptive multi-rate codec, an open-loop pitch search is performed in every other subframe

residual: the output signal resulting from an inverse filtering operation

short term synthesis filter: this filter introduces, into the excitation signal, short term correlation which models the impulse response of the vocal tract

perceptual weighting filter: this filter is employed in the analysis-by-synthesis search of the codebooks. The filter exploits the noise masking properties of the formants (vocal tract resonances) by weighting the error less in regions near the formant frequencies and more in regions away from them

subframe: time interval equal to 5 ms (40 samples at 8 kHz sampling rate)

vector quantization: method of grouping several parameters into a vector and quantizing them simultaneously

zero input response: output of a filter due to past i

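As a quick sanity check on the figures quoted in clause 1 (Scope) and in the frame and subframe definitions of clause 3.1, the following sketch recomputes the frame duration, the number of subframes per frame, and the bit rate of each mode from the encoded block sizes. It is only an illustration of the arithmetic: all names (`SAMPLE_RATE_HZ`, `BITS_PER_FRAME`, `frame_duration_ms`, `bit_rate_kbps`) are assumptions made for this example and do not come from the specification or from the ANSI-C code in [4].

```python
# Illustrative arithmetic only; all identifiers are hypothetical, not taken
# from the specification or its ANSI-C reference code.

SAMPLE_RATE_HZ = 8000      # sampling rate stated in clause 1
FRAME_SAMPLES = 160        # speech samples per input block (one frame)
SUBFRAME_SAMPLES = 40      # samples per subframe (clause 3.1)

# Encoded block sizes in bits, one per codec mode, as listed in clause 1.
BITS_PER_FRAME = (95, 103, 118, 134, 148, 159, 204, 244)


def frame_duration_ms(samples=FRAME_SAMPLES, rate_hz=SAMPLE_RATE_HZ):
    """Duration in milliseconds of a block of `samples` at `rate_hz`."""
    return 1000.0 * samples / rate_hz


def bit_rate_kbps(bits_per_frame):
    """Bits per frame divided by the frame duration in ms gives kbit/s."""
    return bits_per_frame / frame_duration_ms()


SUBFRAMES_PER_FRAME = FRAME_SAMPLES // SUBFRAME_SAMPLES
rates = [round(bit_rate_kbps(bits), 2) for bits in BITS_PER_FRAME]

if __name__ == "__main__":
    print("frame duration (ms):", frame_duration_ms())
    print("subframe duration (ms):", frame_duration_ms(SUBFRAME_SAMPLES))
    print("subframes per frame:", SUBFRAMES_PER_FRAME)
    for bits, rate in zip(BITS_PER_FRAME, rates):
        print(bits, "bits/frame ->", rate, "kbit/s")
```

Dividing each block size by the 20 ms frame duration reproduces the eight rates listed in the Scope clause (4.75 to 12.2 kbit/s), and 160 / 40 gives the four 5 ms subframes per frame implied by the definitions.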