International Telecommunication Union

ITU-T G.723.1
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(05/2006)

SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS
Digital terminal equipments - Coding of analogue signals by methods other than PCM

Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s

ITU-T Recommendation G.723.1

ITU-T G-SERIES RECOMMENDATIONS
TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS

INTERNATIONAL TELEPHONE CONNECTIONS AND CIRCUITS  G.100-G.199
GENERAL CHARACTERISTICS COMMON TO ALL ANALOGUE CARRIER-TRANSMISSION SYSTEMS  G.200-G.299
INDIVIDUAL CHARACTERISTICS OF INTERNATIONAL CARRIER TELEPHONE SYSTEMS ON METALLIC LINES  G.300-G.399
GENERAL CHARACTERISTICS OF INTERNATIONAL CARRIER TELEPHONE SYSTEMS ON RADIO-RELAY OR SATELLITE LINKS AND INTERCONNECTION WITH METALLIC LINES  G.400-G.449
COORDINATION OF RADIOTELEPHONY AND LINE TELEPHONY  G.450-G.499
TRANSMISSION MEDIA CHARACTERISTICS  G.600-G.699
DIGITAL TERMINAL EQUIPMENTS  G.700-G.799
  General  G.700-G.709
  Coding of analogue signals by pulse code modulation  G.710-G.719
  Coding of analogue signals by methods other than PCM  G.720-G.729
  Principal characteristics of primary multiplex equipment  G.730-G.739
  Principal characteristics of second order multiplex equipment  G.740-G.749
  Principal characteristics of higher order multiplex equipment  G.750-G.759
  Principal characteristics of transcoder and digital multiplication equipment  G.760-G.769
  Operations, administration and maintenance features of transmission equipment  G.770-G.779
  Principal characteristics of multiplexing equipment for the synchronous digital hierarchy  G.780-G.789
  Other terminal equipment  G.790-G.799
DIGITAL NETWORKS  G.800-G.899
DIGITAL SECTIONS AND DIGITAL LINE SYSTEM  G.900-G.999
QUALITY OF SERVICE AND PERFORMANCE - GENERIC AND USER-RELATED ASPECTS  G.1000-G.1999
TRANSMISSION MEDIA CHARACTERISTICS  G.6000-G.6999
DATA OVER TRANSPORT - GENERIC ASPECTS  G.7000-G.7999
ETHERNET OVER TRANSPORT ASPECTS  G.8000-G.8999
ACCESS NETWORKS  G.9000-G.9999

For further details, please refer to the list of ITU-T Recommendations.

ITU-T Rec. G.723.1 (05/2006)

Summary

This Recommendation specifies a coded representation that can be used for compressing the speech or other audio signal component of multimedia services at a very low bit rate as part of the overall H.324 family of standards. This coder has two bit rates associated with it: 5.3 and 6.3 kbit/s. The higher bit rate has greater quality. The lower bit rate gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the encoder and decoder. It is possible to switch between the two rates at any frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during non-speech intervals is also possible.

This coder was optimized to represent speech with high quality at the above rates using a limited amount of complexity. It encodes speech or other audio signals in frames using linear predictive analysis-by-synthesis coding. The excitation signal for the high rate coder is Multipulse Maximum Likelihood Quantization (MP-MLQ) and for the low rate coder is Algebraic-Code-Excited Linear-Prediction (ACELP). The frame size is 30 ms and there is an additional look-ahead of 7.5 ms, resulting in a total algorithmic delay of 37.5 ms. All additional delays in this coder are due to processing delays of the implementation, transmission delays in the communication link and buffering delays of the multiplexing protocol.

The description of this Recommendation is made in terms of bit-exact, fixed-point mathematical operations. The ANSI C code indicated in clause 5 constitutes an integral part of this Recommendation and shall take precedence over the mathematical descriptions in this text if discrepancies are found. A non-exhaustive set of test sequences which can be used in conjunction with the C code is available in the electronic attachment.

This revision integrates the correction of defects identified either in Implementors' Guides or in SG 16 meeting reports.

Source

ITU-T Recommendation G.723.1 was approved on 29 May 2006 by ITU-T Study Group 16 (2005-2008) under the ITU-T Recommendation A.8 procedure.

FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE - In this Recommendation, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure e.g. interoperability or applicability) and compliance with the Recommendation is achieved when all of these mandatory provisions are met. The words "shall" or some other obligatory language such as "must" and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the Recommendation is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that
the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementors are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database.

© ITU 2007

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

CONTENTS

1 Introduction
  1.1 Scope
  1.2 Bit rates
  1.3 Possible input signals
  1.4 Delay
  1.5 Speech coder description
2 Encoder principles
  2.1 General description
  2.2 Framer
  2.3 High pass filter
  2.4 LPC analysis
  2.5 LSP quantizer
  2.6 LSP decoder
  2.7 LSP interpolation
  2.8 Formant perceptual weighting filter
  2.9 Pitch estimation
  2.10 Subframe processing
  2.11 Harmonic noise shaping
  2.12 Impulse response calculator
  2.13 Zero input response and ringing subtraction
  2.14 Pitch predictor
  2.15 High rate excitation (MP-MLQ)
  2.16 Low rate excitation (ACELP)
  2.17 Excitation decoder
  2.18 Decoding of the pitch information
  2.19 Memory update
  2.20 Bit allocation
  2.21 Coder initialization
3 Decoder principles
  3.1 General description
  3.2 LSP decoder
  3.3 LSP interpolator
  3.4 Decoding of the pitch information
  3.5 Excitation decoder
  3.6 Pitch postfilter
  3.7 LPC synthesis filter
  3.8 Formant postfilter
  3.9 Gain scaling unit
  3.10 Frame interpolation handling
  3.11 Decoder initialization
4 Bitstream packing
5 ANSI C code
6 Glossary
Annex A Silence compression scheme
  A.1 Introduction
  A.2 Description of the VAD
  A.3 General description of the CNG
  A.4 Description of the CNG encoder part
  A.5 Description of the decoder part
  A.6 Bit stream packing
  A.7 Glossary
  A.8 Bit-exact, fixed-point C source code
Annex B Alternative specification based on floating point arithmetic
  B.1 Introduction
  B.2 Algorithm description
  B.3 ANSI C code
Annex C Scalable channel coding scheme for wireless applications
  C.1 Introduction
  C.2 Channel encoder
  C.3 Channel decoder
  C.4 Fixed point C source code
Electronic attachment

1 Introduction

1.1 Scope

This Recommendation specifies a coded representation that can be used for compressing the speech or other audio signal component of multimedia services at a very low bit rate. In the design of this coder, the principal application considered was very low bit rate visual telephony as part of the overall H.324 family of standards.

1.2 Bit rates

This coder has two bit rates associated with it: 5.3 and 6.3 kbit/s. The higher bit rate has
greater quality. The lower bit rate gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the encoder and decoder. It is possible to switch between the two rates at any 30-ms frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during non-speech intervals is also possible.

1.3 Possible input signals

This coder was optimized to represent speech with high quality at the above rates using a limited amount of complexity. Music and other audio signals are not represented as faithfully as speech, but can be compressed and decompressed using this coder.

1.4 Delay

This coder encodes speech or other audio signals in 30-ms frames. In addition, there is a look-ahead of 7.5 ms, resulting in a total algorithmic delay of 37.5 ms. All additional delays in the implementation and operation of this coder are due to:
i) actual time spent processing the data in the encoder and decoder;
ii) transmission time on the communication link;
iii) additional buffering delay for the multiplexing protocol.

1.5 Speech coder description

The description of the speech coding algorithm of this Recommendation is made in terms of bit-exact, fixed-point mathematical operations. The ANSI C code indicated in clause 5, which constitutes an integral part of this Recommendation, reflects this bit-exact, fixed-point description approach. The mathematical descriptions of the encoder and decoder, given respectively in clauses 2 and 3, can be implemented in several other fashions, possibly leading to a codec implementation not complying with this Recommendation. Therefore, the algorithm description of the C code of clause 5 shall take precedence over the mathematical descriptions of clauses 2 and 3 whenever discrepancies are found. A non-exhaustive set of test sequences which can be used in conjunction with the C code is available from the ITU.

2 Encoder principles

2.1 General description

This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (ITU-T Rec. G.712) of the analogue input, then sampling at 8000 Hz and then converting to 16-bit linear PCM for the input to the encoder. The output of the decoder should be converted back to analogue by similar means. Other input/output characteristics, such as those specified by ITU-T Rec. G.711 for 64 kbit/s PCM data, should be converted to 16-bit linear PCM before encoding or from 16-bit linear PCM to the appropriate format after decoding. The bitstream from the encoder to the decoder is defined within this Recommendation.

The coder is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. The encoder operates on blocks (frames) of 240 samples each, which is equal to 30 ms at an 8-kHz sampling rate. Each block is first high pass filtered to remove the DC component and then divided into four subframes of 60 samples each. For every subframe, a 10th order Linear Prediction Coder (LPC) filter is computed using the unprocessed input signal. The LPC filter for the last subframe is quantized using a Predictive Split Vector Quantizer (PSVQ). The unquantized LPC coefficients are used to construct the short-term perceptual weighting filter, which is used to filter the entire frame and to obtain the perceptually weighted speech signal.

For every two subframes (120 samples), the open-loop pitch period, $L_{OL}$, is computed using the weighted speech signal. This pitch estimation is performed on blocks of 120 samples. The pitch period is searched in the range from 18 to 142 samples. From this point the speech is processed on a 60-sample subframe basis.

Using the estimated pitch period computed previously, a harmonic noise shaping filter is constructed. The combination of the LPC synthesis filter, the formant perceptual weighting filter and the harmonic noise shaping filter is used to create an impulse response. The impulse response is then used for further computations.

Using the pitch period estimation, $L_{OL}$, and the impulse response, a closed-loop pitch predictor is computed. A fifth order pitch predictor is used. The pitch period is computed as a small differential value around the open-loop pitch estimate. The contribution of the pitch predictor is then subtracted from the initial target vector. Both the pitch period and the differential value are transmitted to the decoder.

Finally, the non-periodic component of the excitation is approximated. For the high bit rate, Multipulse Maximum Likelihood Quantization (MP-MLQ) excitation is used, and for the low bit rate, Algebraic-Code-Excited Linear-Prediction (ACELP) excitation is used. The block diagram of the encoder is shown in Figure 1.

[Figure 1/G.723.1: Block diagram of the speech coder]

2.2 Framer

File: LBCCODEC.C  Procedure: main()  Reads 240-sample input frames
File: CODER.C  Procedure: Coder()  Performs subframe division

The coder processes the speech by buffering consecutive speech samples, $y[n]$, into frames of 240 samples, $s[n]$. Each frame is divided into two parts of 120 samples for pitch estimation computation. Each part is divided in two again, so that each frame is finally divided into four subframes of 60 samples each.

2.3 High pass filter

File: UTIL_LBC.C  Procedure: Rem_Dc()  Performs high pass filtering

This block removes the DC element from the input speech, $s[n]$. The filter transfer function is:

$$H(z) = \frac{1 - z^{-1}}{1 - \frac{127}{128} z^{-1}} \qquad (1)$$

The output of this filter is $x[n]$, $n = 0, \ldots, 239$.

2.4 LPC analysis

File: L
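Clauses 2.1 and 2.3 describe the first analysis steps applied to each 240-sample frame: DC removal with the filter of equation (1), followed by a 10th-order LPC analysis. Because the text of clause 2.4 is cut off here, the following is only an illustrative floating-point sketch under stated assumptions. It is not the bit-exact fixed-point Rem_Dc() routine of clause 5, and it uses a plain autocorrelation followed by a Levinson-Durbin recursion as one common way of computing LPC coefficients. The Recommendation's own analysis window and fixed-point steps are not reproduced. The names HpfState, hpf_frame(), autocorr() and levinson_durbin() are hypothetical, and the predictor sign convention (x[n] approximated by a[1]x[n-1] + ... + a[10]x[n-10]) is an assumption.

```c
#include <assert.h>

#define FRAME_LEN 240 /* 30 ms at 8 kHz (clause 2.1) */
#define LPC_ORDER 10  /* 10th-order LPC (clause 2.1) */

/* Hypothetical filter state, carried from one frame to the next. */
typedef struct {
    double prev_in;  /* s[n-1] */
    double prev_out; /* x[n-1] */
} HpfState;

/* Floating-point sketch of equation (1), H(z) = (1 - z^-1) / (1 - (127/128) z^-1):
 *   x[n] = s[n] - s[n-1] + (127/128) * x[n-1]
 * Not bit-exact with Rem_Dc() in the normative C code. */
void hpf_frame(HpfState *st, const double *s, double *x, int n)
{
    for (int i = 0; i < n; i++) {
        x[i] = s[i] - st->prev_in + (127.0 / 128.0) * st->prev_out;
        st->prev_in = s[i];
        st->prev_out = x[i];
    }
}

/* Autocorrelation r[0..order] of x[0..n-1]; no analysis window in this sketch. */
void autocorr(const double *x, int n, double *r, int order)
{
    for (int k = 0; k <= order; k++) {
        double acc = 0.0;
        for (int i = k; i < n; i++)
            acc += x[i] * x[i - k];
        r[k] = acc;
    }
}

/* Generic Levinson-Durbin recursion (assumed technique, not taken from the text):
 * finds a[1..order] such that x[n] ~ a[1]x[n-1] + ... + a[order]x[n-order].
 * Returns the residual prediction-error energy, or -1.0 on failure. */
double levinson_durbin(const double *r, double *a, int order)
{
    double tmp[LPC_ORDER + 1];
    double err = r[0];

    assert(order >= 1 && order <= LPC_ORDER);
    for (int i = 0; i <= order; i++)
        a[i] = 0.0;
    if (err <= 0.0)
        return -1.0; /* all-zero input: no predictor can be computed */

    for (int i = 1; i <= order; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / err; /* i-th reflection coefficient */
        if (k >= 1.0 || k <= -1.0)
            return -1.0; /* numerically unstable: stop here */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= 1.0 - k * k;
    }
    return err;
}
```

The filter state lives in HpfState so that consecutive calls to hpf_frame() act as one continuous filter across 240-sample frame boundaries; resetting the state for every frame would produce a transient at the start of each frame. For a constant input, the output decays by a factor of 127/128 per sample, which is how the DC component is removed.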