INTERNATIONAL TELECOMMUNICATION UNION

ITU-T G.722.2
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(07/2003)

SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS
Digital terminal equipments - Coding of analogue signals by methods other than PCM

Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)

ITU-T Recommendation G.722.2

ITU-T G-SERIES RECOMMENDATIONS
TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS

INTERNATIONAL TELEPHONE CONNECTIONS AND CIRCUITS  G.100-G.199
GENERAL CHARACTERISTICS COMMON TO ALL ANALOGUE CARRIER-TRANSMISSION SYSTEMS  G.200-G.299
INDIVIDUAL CHARACTERISTICS OF INTERNATIONAL CARRIER TELEPHONE SYSTEMS ON METALLIC LINES  G.300-G.399
GENERAL CHARACTERISTICS OF INTERNATIONAL CARRIER TELEPHONE SYSTEMS ON RADIO-RELAY OR SATELLITE LINKS AND INTERCONNECTION WITH METALLIC LINES  G.400-G.449
COORDINATION OF RADIOTELEPHONY AND LINE TELEPHONY  G.450-G.499
TESTING EQUIPMENTS  G.500-G.599
TRANSMISSION MEDIA CHARACTERISTICS  G.600-G.699
DIGITAL TERMINAL EQUIPMENTS  G.700-G.799
  General  G.700-G.709
  Coding of analogue signals by pulse code modulation  G.710-G.719
  Coding of analogue signals by methods other than PCM  G.720-G.729
  Principal characteristics of primary multiplex equipment  G.730-G.739
  Principal characteristics of second order multiplex equipment  G.740-G.749
  Principal characteristics of higher order multiplex equipment  G.750-G.759
  Principal characteristics of transcoder and digital multiplication equipment  G.760-G.769
  Operations, administration and maintenance features of transmission equipment  G.770-G.779
  Principal characteristics of multiplexing equipment for the synchronous digital hierarchy  G.780-G.789
  Other terminal equipment  G.790-G.799
DIGITAL NETWORKS  G.800-G.899
DIGITAL SECTIONS AND DIGITAL LINE SYSTEM  G.900-G.999
QUALITY OF SERVICE AND PERFORMANCE - GENERIC AND USER-RELATED ASPECTS  G.1000-G.1999
TRANSMISSION MEDIA CHARACTERISTICS  G.6000-G.6999
DIGITAL TERMINAL EQUIPMENTS  G.7000-G.7999
DIGITAL NETWORKS  G.8000-G.8999

For further details, please refer to the list of ITU-T Recommendations.

ITU-T Rec. G.722.2 (07/2003)

ITU-T Recommendation G.722.2

Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)

Summary

This Recommendation describes the high quality Adaptive Multi-Rate Wideband (AMR-WB) encoder and decoder that is primarily intended for 7 kHz bandwidth speech signals. AMR-WB operates at a multitude of bit rates ranging from 6.6 kbit/s to 23.85 kbit/s. The bit rate may be changed at any 20-ms frame boundary.

Annex C includes an integrated C source code software package which contains the implementation of the G.722.2 encoder and decoder and its Annexes A and B and
Appendix I. A set of digital test vectors for developers is provided in Annex D. These test vectors are a verification tool providing an indication of success in implementing this codec.

G.722.2 AMR-WB is the same codec as the 3GPP AMR-WB. The corresponding 3GPP specifications are TS 26.190 for the speech codec and TS 26.194 for the Voice Activity Detector.

Source

ITU-T Recommendation G.722.2 was approved on 29 July 2003 by ITU-T Study Group 16 (2001-2004) under the ITU-T Recommendation A.8 procedure.

FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE

In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure e.g. interoperability or applicability) and compliance with the Recommendation is achieved when all of these mandatory provisions are met. The words “shall” or some other obligatory language such as “must” and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the Recommendation is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementors are cautioned that this may
not represent the latest information and are therefore strongly urged to consult the TSB patent database.

© ITU 2004

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

CONTENTS

1 Scope
2 Normative references
3 Definitions, symbols and abbreviations
  3.1 Definitions
  3.2 Symbols
  3.3 Abbreviations
4 Outline description
  4.1 Functional description of audio parts
  4.2 Preparation of speech samples
  4.3 Principles of the adaptive multi-rate wideband speech encoder
  4.4 Principles of the adaptive multi-rate speech decoder
  4.5 Sequence and subjective importance of encoded parameters
5 Functional description of the encoder
  5.1 Preprocessing
  5.2 Linear prediction analysis and quantization
  5.3 Perceptual weighting
  5.4 Open-loop pitch analysis
  5.5 Impulse response computation
  5.6 Target signal computation
  5.7 Adaptive codebook
  5.8 Algebraic codebook
  5.9 Quantization of the adaptive and fixed codebook gains
  5.10 Memory update
  5.11 High-band gain generation
6 Functional description of the decoder
  6.1 Decoding and speech synthesis
  6.2 High-pass filtering, up-scaling and interpolation
  6.3 High frequency band
7 Detailed bit allocation of the adaptive multi-rate wideband codec
8 Homing sequences
  8.1 Functional description
  8.2 Definitions
  8.3 Encoder homing
  8.4 Decoder homing
9 Voice Activity Detector (VAD)
  9.1 VAD symbols
  9.2 Functional description
10 Mandatory AMR-WB codec modes for the speech telephony service in 3GPP
BIBLIOGRAPHY

ITU-T Recommendation G.722.2

Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)

1 Scope

This Recommendation describes the detailed mapping from input blocks of 320 speech samples in 16-bit uniform PCM format to encoded blocks of 132, 177, 253, 285, 317, 365, 397, 461 and 477 bits and from encoded blocks
of 132, 177, 253, 285, 317, 365, 397, 461 and 477 bits to output blocks of 320 reconstructed speech samples. The sampling rate is 16 000 samples/s, leading to a bit rate for the encoded bit stream of 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 or 23.85 kbit/s. The coding scheme for the multi-rate coding modes is the so-called Algebraic Code Excited Linear Prediction Coder, hereafter referred to as ACELP. The multi-rate wideband ACELP coder is referred to as AMR-WB. The codec described in this Recommendation also utilizes an integrated Voice Activity Detector (VAD).

The foreseen applications for this Recommendation are the following: Voice over IP (VoIP) and Internet applications, Mobile Communications, PSTN applications, ISDN wideband telephony, ISDN videotelephony and video-conferencing.

In addition to the algorithm specified in the main body of this Recommendation, Annexes A and B and Appendix I provide supplemental functionalities allowing interoperability with GSM and 3GPP wireless systems. These functionalities were originally developed for these systems, but their use is not limited to mobile applications. Annexes D and E describe test vectors and frame structure
respectively. These annexes may be implemented independently of this main body specification according to the different requirements of systems deploying the AMR-WB algorithm:

- Annex A describes comfort noise aspects for use of the AMR-WB algorithm in source-controlled rate operation. The implementation of Annex A is essential for interoperability with GSM and 3GPP wireless systems.
- Annex B describes the source-controlled rate operation for the AMR-WB algorithm. The implementation of Annex B is essential for interoperability with GSM and 3GPP wireless systems.
- Annex D describes the digital test sequences, which are a verification tool providing an indication of success in implementing the AMR-WB codec.
- Annex E describes the recommended frame structure for use with the different modes of operation for the AMR-WB algorithm.
- Appendix I describes an example solution for error concealment of erroneous or lost AMR-WB frames.

For better usability, the ANSI-C code with the low-level description of all these functionalities has been grouped into a single annex, Annex C. Should there be any discrepancy between the descriptions in any of the different parts of this Recommendation and the implementation of such descriptions in Annex C, the descriptions in Annex C shall prevail.

In clause 8 a specific reset procedure, called codec homing, is described. This is a useful feature for bringing the codec into a known initial state (e.g., for testing purposes).

Clause 9 specifies the Voice Activity Detector (VAD) used in this codec as well as in the source-controlled rate operation (DTX) specified in Annex B.

Clause 10 provides information on minimum requirements for support of AMR-WB modes in the 3GPP speech telephony service.

2 Normative references

The following ITU-T Recommendations and other references contain provisions which, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated were valid. All Recommendations and other references are subject to revision; users of this Recommendation are therefore encouraged to investigate the possibility of applying the most recent edition of the Recommendations and other references listed below. A list of the currently valid ITU-T Recommendations is regularly published. The reference to a document within this Recommendation does not give it, as a stand-alone document, the status of a Recommendation.

[1] ITU-T Recommendation G.722 (1988), 7 kHz audio-coding within 64 kbit/s.
[2] RFC 3267 (2002), Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs.

3 Definitions, symbols and abbreviations

3.1 Definitions

This Recommendation defines the following terms:

3.1.1 adaptive codebook: The adaptive codebook contains excitation vectors that are adapted for every subframe. The adaptive codebook is derived from the long-term filter state. The lag value can be viewed as an index into the adaptive codebook.

3.1.2 algebraic codebook: A fixed codebook where algebraic code is used to populate the excitation vectors (innovation vectors). The excitation contains a small number of non-zero pulses with predefined interlaced sets of potential positions. The amplitudes and positions of the pulses of the kth excitation codevector can be derived from its index k through a rule requiring no or minimal physical storage, in contrast with stochastic codebooks, whereby the path from the index to the associated codevector involves look-up tables.

3.1.3 anti-sparseness processing: An adaptive post-processing procedure applied to the fixed codebook vector in order to reduce perceptual artifacts from a sparse fixed codebook vector.

3.1.4 closed-loop pitch analysis: This is the adaptive codebook search, i.e. a process of estimating
the pitch (lag) value from the weighted input speech and the long-term filter state. In the closed-loop search, the lag is searched using an error minimization loop (analysis-by-synthesis). In the adaptive multi-rate wideband codec, closed-loop pitch search is performed for every subframe.

3.1.5 direct form coefficients: One of the formats for storing the short-term filter parameters. In the adaptive multi-rate wideband codec, all filters which are used to modify speech samples use direct form coefficients.

3.1.6 fixed codebook: The fixed codebook contains excitation vectors for speech synthesis filters. The contents of the codebook are non-adaptive (i.e., fixed). In the adaptive multi-rate wideband codec, the fixed codebook is implemented using an algebraic codebook.

3.1.7 fractional lags: A set of lag values having subsample resolution. In the adaptive multi-rate wideband codec a subsample resolution of 1/4 or 1/2 of a sample is used.

3.1.8 frame: A time interval equal to 20 ms (320 samples at a 16-kHz sampling rate).

3.1.9 immittance spectral frequencies: (see Immittance Spectral Pair)

3.1.10 immittance spectral pair: Transformation of LPC parameters. Immittance Spectral Pairs are obtained by decomposing the inverse filter transfer function A(z) into a set of two transfer functions, one having even symmetry and the other having odd symmetry. The Immittance Spectral Pairs (also called Immittance Spectral Frequencies) are the roots of these polynomials on the z-unit circle.

3.1.11 integer lags: A set of lag values having whole sample resolution.

3.1.12 interpolating filter: An FIR filter used to produce an estimate of subsample resolution samples, given an input sampled with integer sample resolution. In this implementation, the interpolating filter has low-pass filter characteristics. Thus the adaptive codebook consists of the low-pass filtered interpolated past excitation.

3.1.13 inverse filter: This filter removes the short-term correlation from the speech signal. The filter models an inverse frequency response of the vocal tract.

3.1.14 lag: The long-term filter delay. This is typically the true pitch period, or its multiple or submultiple.

3.1.15 LP analysis window: For each frame, the short-term filter coefficients are computed using the high-pass filtered speech samples within
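
As a worked check on the figures quoted in clause 1 and in definition 3.1.8, the frame duration and the nine bit rates follow from simple arithmetic: 320 samples at 16 000 samples/s give a 20 ms frame, hence 50 frames per second, and each encoded block size in bits multiplied by 50 gives the bit rate in bit/s. The sketch below is illustrative only. The function and constant names are hypothetical and are not taken from the Annex C source code; only the numeric constants come from the text of this Recommendation.

```python
# Illustrative arithmetic only. Names are hypothetical, not from Annex C.
# The numeric constants are the ones quoted in clause 1 (Scope) and 3.1.8.

SAMPLE_RATE_HZ = 16_000          # sampling rate, samples per second
SAMPLES_PER_FRAME = 320          # input block size in samples
BITS_PER_FRAME = (132, 177, 253, 285, 317, 365, 397, 461, 477)


def frame_duration_ms(samples=SAMPLES_PER_FRAME, rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000 * samples / rate_hz


def frames_per_second(samples=SAMPLES_PER_FRAME, rate_hz=SAMPLE_RATE_HZ):
    """Number of whole frames per second (the rate divides evenly here)."""
    return rate_hz // samples


def bit_rate_bps(bits_per_frame):
    """Encoded bit rate in bit/s for a given encoded block size in bits."""
    return bits_per_frame * frames_per_second()


if __name__ == "__main__":
    print(f"frame duration: {frame_duration_ms()} ms")
    for bits in BITS_PER_FRAME:
        # Divide by 1000 only for display, to match the kbit/s figures in the text.
        print(f"{bits:3d} bits/frame = {bit_rate_bps(bits) / 1000:.2f} kbit/s")
```

Keeping the rates as integers in bit/s avoids floating-point comparisons. The conversion to kbit/s is done only when printing.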