COVERING NOTE

GENERAL SECRETARIAT
INTERNATIONAL TELECOMMUNICATION UNION

Geneva, 29 July 2003

ITU - TELECOMMUNICATION STANDARDIZATION SECTOR

Subject: Amendment 1 (07/2003) to ITU-T Recommendation G.722.2 Appendix I (01/2002), Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB) - Appendix I: Error concealment of erroneous or lost frames

1) In clause I.5.2.3.4.2, correct the last but one line as follows (change “second” into “third”):

TmaP2 is the third largest value in Thfir

2) Add a new clause I.5.2.5 as follows:

I.5.2.5 High-band gain (for 23.85 kbit/s mode)

When
RX_FRAMETYPE = SPEECH_BAD or RX_FRAMETYPE = SPEECH_LOST, the received high-band energy parameter of the frame is not used and the estimation for the high-band gain is used instead. This means that in case of bad/lost speech frames, the high-band reconstruction operates in the same way for all the modes.

Union internationale des télécommunications, Place des Nations, 1211 GENÈVE 20, Suisse - Switzerland - Suiza

INTERNATIONAL TELECOMMUNICATION UNION

ITU-T
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU

G.722.2 Appendix I (01/2002)

SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS
Digital terminal equipments - Coding of analogue signals by methods other than PCM

Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)
Appendix I: Error concealment of erroneous or lost frames

ITU-T Recommendation G.722.2 - Appendix I
ITU-T Recommendation G.722.2

Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)

Appendix I

Error concealment of erroneous or lost frames

Summary
This appendix specifies a non-normative example solution for concealment of erroneous or lost frames for the G.722.2 AMR-WB codec. The concealment operations described here were also adopted by 3GPP in 3GPP specification TS 26.191.

Source
Appendix I to ITU-T Recommendation G.722.2 was prepared by ITU-T Study Group 16 (2001-2004) and was approved under the WTSA Resolution No. 1 procedure on 13 January 2002.

ITU-T Rec. G.722.2/Appendix I (01/2002)

FOREWORD
The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE
In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

INTELLECTUAL PROPERTY RIGHTS
ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementors are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database.

© ITU 2002
All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

CONTENTS
I.1 Scope
I.2 Definitions and abbreviations
    I.2.1 Definitions
    I.2.2 Abbreviations
I.3 General
I.4 Requirements
    I.4.1 Error detection
    I.4.2 Erroneous or lost speech frames
    I.4.3 First lost SID frame
    I.4.4 Subsequent lost SID frames
I.5 Example ECU/BFH Solution
    I.5.1 State machine
    I.5.2 Substitution and muting of erroneous/lost speech frames
        I.5.2.1 BFI = 0, prevBFI = 0, State = 0 or 1
        I.5.2.2 BFI = 0, prevBFI = 1, State = 0 to 3
        I.5.2.3 BFI = 1, prevBFI = 0 or 1, State = 1...6
        I.5.2.4 Innovation sequence
    I.5.3 Substitution and muting of lost SID frames

ITU-T Recommendation G.722.2

Wideband coding of
speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)

Appendix I

Error concealment of erroneous or lost frames

I.1 Scope
This specification defines an example procedure for error concealment, also termed frame substitution and muting procedure, for use by the AMR-WB speech codec receiving end when one or more erroneous/lost speech or lost Silence Insertion Descriptor (SID) frames are received.

The algorithm specified in this appendix is available as part of the ANSI-C code in Annex C/G.722.2. In case of discrepancy between the specification in this appendix and the fixed point
computational description of this algorithm contained in Annex C/G.722.2, the description in Annex C/G.722.2 will prevail.

I.2 Definitions and abbreviations

I.2.1 Definitions
This appendix defines the following term:

N-point median operation: Consists of sorting the N elements belonging to the set for which the median operation is to be performed in an ascending order according to their values, and selecting the (int (N/2) + 1)th largest value of the sorted set as the median value.

I.2.2 Abbreviations
This appendix uses the following abbreviations:

AMR-WB      Adaptive Multi-Rate WideBand
AN          Access Network
BFH         Bad Frame Handling
BFI         Bad Frame Indication from AN
BSI_netw    Bad Sub-block Indication obtained from AN interface CRC checks
CRC         Cyclic Redundancy Check
ECU         Error Concealment Unit
medianN     N-point median operation
prevBFI     Bad Frame Indication of previous frame
Rx          Receive
SCR         Source Controlled Rate (operation)
SID         Silence Insertion Descriptor (Background noise)

I.3 General
The purpose of the error concealment procedure is to conceal the effect of erroneous/lost AMR-WB speech frames. The purpose of muting the output in the case of several erroneous/lost frames is to indicate the breakdown of the channel to the user and to avoid generating possibly annoying sounds as a result of the error concealment procedure.

The network shall indicate erroneous/lost speech or lost SID frames by setting the RX_TYPE values (Annex B/G.722.2) to SPEECH_BAD, SID_BAD or SPEECH_LOST. If these flags are set, the speech decoder shall perform parameter substitution to conceal errors.

The example solution provided in I.5 applies only to bad frame handling on a complete speech frame basis. Sub-frame based error concealment may be derived using similar methods.

I.4 Requirements

I.4.1 Error detection
If the most sensitive bits of the AMR-WB speech data are received in error, the network shall indicate RX_TYPE = SPEECH_BAD, in which case the BFI flag is set. When the frame is not received, the network shall indicate RX_TYPE = RX_SPEECH_LOST, in which case the BFI
flag is set as well. If a SID frame is received in error, the network shall indicate RX_TYPE = SID_BAD.

I.4.2 Erroneous or lost speech frames
Normal decoding of erroneous/lost speech frames would result in very unpleasant noise effects. In order to improve the subjective quality, erroneous/lost speech frames shall be substituted with either a repetition or an extrapolation of the previous good speech frame(s). This substitution is done so that it gradually will decrease the output level, resulting in silence at the output. Clause I.5 provides an example solution.

I.4.3 First lost SID frame
A lost SID frame shall be substituted by using the SID information from earlier received valid SID frames and the procedure for valid SID frames be applied as described in Annex B/G.722.2.

I.4.4 Subsequent lost SID frames
For many subsequent lost SID frames, a muting technique shall be applied to the comfort noise that will gradually decrease the output level. For subsequent lost SID frames, the muting of the output shall be maintained. Clause I.5 provides example solutions.

I.5 Example ECU/BFH Solution

I.5.1 State machine
This example solution for substitution and muting is based on a state machine with seven states (Figure I.1).

The system starts in state 0. Each time a bad frame is detected, the state counter is incremented by one and is saturated when it reaches 6. Each time a good speech frame is detected, the state counter is right-shifted by one. The state indicates the quality of the
channel: the larger the value of the state counter, the worse the channel quality is. The control flow of the state machine can be described by the following C code (BFI = bad frame indicator, State = state variable):

    if (BFI != 0)
        State = State + 1;
    else
        State = State >> 1;
    if (State > 6)
        State = 6;

In addition to this state machine, the Bad Frame Flag from the previous frame is checked (prevBFI). The processing depends on the value of the State variable. In states 0 and 6, the processing depends on the BFI flag. The state machine is summarized in Figure I.1.
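As an illustration of the control flow described for the state machine, the following is a minimal sketch in C that wraps the state update in a function. The names bfh_update_state, bfh_run and BFH_MAX_STATE are hypothetical names chosen for this sketch; only the increment on a bad frame, the right shift on a good frame and the saturation at 6 are taken from the text.

```c
#include <assert.h>

/* Hypothetical name: highest state of the example state machine. */
enum { BFH_MAX_STATE = 6 };

/* Advance the state counter for one received frame.
 * bfi != 0 marks a bad or lost frame, bfi == 0 marks a good frame. */
int bfh_update_state(int state, int bfi)
{
    assert(state >= 0 && state <= BFH_MAX_STATE);
    if (bfi != 0) {
        state = state + 1;              /* bad frame: count up */
        if (state > BFH_MAX_STATE)
            state = BFH_MAX_STATE;      /* saturate at 6 */
    } else {
        state = state >> 1;             /* good frame: right shift by one */
    }
    return state;
}

/* Run the state machine over n frame indicators, starting in state 0. */
int bfh_run(const int *bfi, int n)
{
    int state = 0;
    for (int i = 0; i < n; i++)
        state = bfh_update_state(state, bfi[i]);
    return state;
}
```

Under these assumptions, eight consecutive bad frames drive the state from 0 to 6 and hold it there, and three good frames then bring it back through 3 and 1 to 0.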
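[Figure: state diagram with states STATE = 0 to STATE = 6, each annotated with the BFI and prevBFI values that apply in that state; the two arrow types mark bad frames (BFI = 1) and good frames (BFI = 0).]

Figure I.1/G.722.2 - State machine for controlling the bad frame substitution

I.5.2 Substitution and muting of erroneous/lost speech frames

I.5.2.1 BFI = 0, prevBFI = 0, State = 0 or 1
No error is detected in the received or in the previous received speech frame. The received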
speech parameters are used normally in the speech synthesis. The current frame of speech parameters is saved.

I.5.2.2 BFI = 0, prevBFI = 1, State = 0 to 3
No error is detected in the received speech frame but the previous received speech frame was bad. The LTP gain is used normally in the speech synthesis and the fixed codebook gain is limited below the value used for the last received good subframe:

$$ g^{c}(n) = \begin{cases} g^{c}_{received}, & g^{c}_{received} \le g^{c}(n-1) \\ g^{c}(n-1), & g^{c}_{received} > g^{c}(n-1) \end{cases} \qquad \text{(I-1)} $$

where:
$g^{c}_{received}$ = current decoded fixed codebook gain
$g^{c}(n-1)$ = fixed codebook gain used for the last good subframe (BFI = 0)
$g^{c}(n)$ = fixed codebook
gain to be used for the current frame

The rest of the received speech parameters are used normally in the speech synthesis. The current frame of speech parameters is saved.

I.5.2.3 BFI = 1, prevBFI = 0 or 1, State = 1...6
An error is detected in the received speech frame and the substitution and muting procedure is started.

I.5.2.3.1 LTP gain & fixed codebook gain concealment when RX_FRAMETYPE = SPEECH_BAD
The LTP gain $g^{p}$ and fixed codebook gain $g^{c}$ are replaced by attenuated values from the previous subframes:

$$ g^{p} = P^{p}(state) \cdot \mathrm{median5}\big(g^{p}(n-1), \ldots, g^{p}(n-5)\big) \qquad \text{(I-2)} $$

$$ g^{c} = \begin{cases} P^{c}(state) \cdot \mathrm{median5}\big(g^{c}(n-1), \ldots, g^{c}(n-5)\big), & \mathrm{VAD\_HIST} \le 2 \\ \mathrm{median5}\big(g^{c}(n-1), \ldots, g^{c}(n-5)\big), & \mathrm{VAD\_HIST} > 2 \end{cases} \qquad \text{(I-3)} $$

where:
$g^{p}$ = current decoded LTP gain
$g^{c}$ = current decoded fixed codebook gain
$g^{p}(n-1), \ldots, g^{p}(n-5)$ = LTP gains used for the last 5 subframes
$g^{c}(n-1), \ldots, g^{c}(n-5)$ = fixed codebook gains used for the last 5 subframes
median5() = 5-point median operation
$P^{p}(state)$
= attenuation factor ($P^{p}(1)$ = 0.98, $P^{p}(2)$ = 0.96, $P^{p}(3)$ = 0.75, $P^{p}(4)$ = 0.23, $P^{p}(5)$ = 0.05, $P^{p}(6)$ = 0.01)
$P^{c}(state)$ = attenuation factor ($P^{c}(1)$ = 0.98, $P^{c}(2)$ = 0.98, $P^{c}(3)$ = 0.98, $P^{c}(4)$ = 0.98, $P^{c}(5)$ = 0.98, $P^{c}(6)$ = 0.70)
state = state number {0..6}
VAD_HIST is the number of consecutive VAD = 0 decisions

The higher the state value is, the more the gains are attenuated. Also the memory of the predictive fixed codebook gain is updated by using the average value of the past four values in the memory:

$$ ener(0) = \frac{1}{4} \sum_{i=1}^{4} ener(n-i) - 3 \qquad \text{(I-4)} $$

I.5.2.3.2 LTP gain & fixed codebook gain concealment when RX_FRAMETYPE = SPEECH_LOST
The LTP gain $g^{p}$ and fixed codebook gain $g^{c}$ are replaced by attenuated values from the previous subframes:

$$ g^{p} = P^{p}(state) \cdot \mathrm{median5}\big(g^{p}(n-1), \ldots, g^{p}(n-5)\big) \qquad \text{(I-5)} $$

$$ g^{c} = \begin{cases} P^{c}(state) \cdot \mathrm{median5}\big(g^{c}(n-1), \ldots, g^{c}(n-5)\big), & \mathrm{VAD\_HIST} \le 2 \\ \mathrm{median5}\big(g^{c}(n-1), \ldots, g^{c}(n-5)\big), & \mathrm{VAD\_HIST} > 2 \end{cases} \qquad \text{(I-6)} $$

where:
$g^{p}$ = current decoded LTP gain
$g^{c}$ = current decoded fixed codebook gain
$g^{p}(n-1), \ldots, g^{p}(n-5)$ = LTP gains used for the last 5 subframes
$g^{c}(n-1), \ldots, g^{c}(n-5)$ = fixed codebook gains used for the last 5 subframes
median5() = 5-point median operation
$P^{p}(state)$ = attenuation factor (P