ETSI EN 301 706 V7.1.1 (1999-12)
European Standard (Telecommunications series)

Digital cellular telecommunications system (Phase 2+);
Comfort noise aspects for Adaptive Multi-Rate (AMR) speech traffic channels
(GSM 06.92 version 7.1.1 Release 1998)

Reference
DEN/SMG-110692Q7

Keywords
Digital cellular telecommunications system, Global System for Mobile communications (GSM)

ETSI

Postal address
F-06921 Sophia Antipolis Cedex - FRANCE

Office address
650 Route des Lucioles - Sophia Antipolis
Valbonne - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Internet
secretariat@etsi.fr
Individual copies of this ETSI deliverable can be downloaded from http://www.etsi.org
If you find errors in the present document, send your comment to: editor@etsi.fr

Important notice
This ETSI deliverable may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat.

Copyright Notification
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 1999. All rights reserved.

Contents
Intellectual Property Rights
Foreword
1 Scope
2 References
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 General
5 Functions on the transmit (TX) side
5.1 LSF evaluation
5.2 Frame energy calculation
5.3 Modification of the speech encoding algorithm during SID frame generation
5.4 SID-frame encoding
6 Functions on the receive (RX) side
6.1 Averaging and decoding of the LP and energy parameters
6.2 Comfort noise generation and updating
7 Computational details and bit allocation
Annex A (informative): Document change history
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available

[1] GSM 01.04: "Digital cellular telecommunications system (Phase 2+); Abbreviations and acronyms".
[2] GSM 06.73: "Digital cellular telecommunications system (Phase 2+); ANSI-C code for the GSM Adaptive Multi-Rate speech codec".
[3] GSM 06.90: "Digital cellular telecommunications system (Phase 2+); Adaptive Multi-Rate speech transcoding".
[4] GSM 06.91: "Digital cellular telecommunications system (Phase 2+); Substitution and muting of lost frame for Adaptive Multi-Rate speech traffic channels".
[5] GSM 06.93: "Digital cellular telecommunications system (Phase 2+); Discontinuous transmission (DTX) for Adaptive Multi-Rate speech traffic channels".

3 Definitions, symbols and abbreviations

3.1 Definitions

For the purpose of the present document, the following terms and definitions apply.

Frame: time interval of 20 ms corresponding to the time segmentation of the adaptive multi-rate speech transcoder, also used as a short term for traffic frame.
SID frames: special SID (Silence Descriptor) frames. A SID frame may convey information on the acoustic background noise or inform the decoder that it should start generating background noise.
Speech frame: traffic frame that cannot be classified as a SID frame.
VAD flag: Voice Activity Detection flag.
TX_TYPE: one of SPEECH, SID_FIRST, SID_UPD, NO_DATA (defined in GSM 06.93).
RX_TYPE: classification of the received traffic frame (defined in GSM 06.93).

Other definitions of terms used in the present document can be found in GSM 06.90 [3] and GSM 06.93 [5]. The overall operation of DTX is described in GSM 06.93 [5].

3.2 Symbols

For the purpose of the present document, the following symbols apply. Boldface symbols are used for vector variables.

$\mathbf{f} = [f_1\ f_2\ \ldots\ f_{10}]$    Unquantized LSF vector
$\hat{\mathbf{f}} = [\hat{f}_1\ \hat{f}_2\ \ldots\ \hat{f}_{10}]$    Quantized LSF vector
$\mathbf{f}^{(m)}$    Unquantized LSF vector of frame m
$\hat{\mathbf{f}}^{(m)}$    Quantized LSF vector of frame m
$\mathbf{f}^{\mathrm{mean}}$    Averaged LSF parameter vector
$en_{\log}$    Logarithmic frame energy
$en_{\log}^{\mathrm{mean}}$    Averaged logarithmic frame energy
$\hat{\mathbf{f}}^{\mathrm{ref}}$    Reference vector for LSF quantization
$\mathbf{e}$    Computed LSF parameter prediction residual
$\hat{\mathbf{e}}$    Quantized LSF parameter prediction residual

\[ \sum_{n=a}^{b} x(n) = x(a) + x(a+1) + \ldots + x(b-1) + x(b) \]

3.3 Abbreviations

For the purpose of the present document, the following abbreviations apply.

AMR    Adaptive Multi-Rate
BSS    Base Station Subsystem
DTX    Discontinuous Transmission
MS     Mobile Station
SID    SIlence Descriptor
LP     Linear Prediction
LSP    Line Spectral Pair
LSF    Line Spectral Frequency
RX     Receive
TX     Transmit
VAD    Voice Activity Detector

For abbreviations not given in this subclause, see GSM 01.04 [1].

4 General

A basic problem when
using DTX is that the background acoustic noise, which is transmitted together with the speech, would disappear when the radio transmission is cut, resulting in discontinuities of the background noise. Since the DTX switching can take place rapidly, it has been found that this effect can be very annoying for the listener, especially in a car environment with high background noise levels. In bad cases, the speech may be hardly intelligible.

The present document specifies the way to overcome this problem by generating on the receive (RX) side synthetic noise similar to the transmit (TX) side background noise. The comfort noise parameters are estimated on the TX side and transmitted to the RX side before the radio transmission is switched off and at a regular rate afterwards. This allows the comfort noise to adapt to the changes of the noise on the TX side.

5 Functions on the transmit (TX) side

The comfort noise evaluation algorithm uses the following parameters of the AMR speech encoder, defined in GSM 06.90 [3]:

- the unquantized Linear Prediction (LP) parameters, using the Line Spectral Pair (LSP) representation, where the unquantized Line Spectral Frequency (LSF) vector is given by $\mathbf{f} = [f_1\ f_2\ \ldots\ f_{10}]$;
- the unquantized LSF vector for the 12.2 kbit/s mode is given by the second set of LSF parameters in the frame.

The algorithm computes the following parameters to assist in comfort noise generation:

- the averaged LSF parameter vector $\mathbf{f}^{\mathrm{mean}}$ (average of the LSF parameters of the eight most recent frames);
- the averaged logarithmic frame energy $en_{\log}^{\mathrm{mean}}$ (average of the logarithmic energy of the eight most recent frames).

These parameters give information on the level ($en_{\log}^{\mathrm{mean}}$) and the spectrum ($\mathbf{f}^{\mathrm{mean}}$) of the background noise.

The evaluated comfort noise parameters ($\mathbf{f}^{\mathrm{mean}}$ and $en_{\log}^{\mathrm{mean}}$) are encoded into a special frame, called a Silence Descriptor (SID) frame, for transmission to the RX side.

A hangover logic is used to enhance the quality of the silence descriptor frames. A hangover of 7 frames is added to the VAD flag so that the coder waits with the switch from active to inactive mode for a period of 7 frames. During that time the decoder can compute a silence descriptor frame from the quantized LSFs and the logarithmic frame energy of the decoded speech signal. Therefore, no comfort noise description is transmitted in the first SID frame after active speech. If the background noise contains transients which cause the coder to switch to active mode and then back to inactive mode in a very short time period, no hangover is used. Instead, the previously used comfort noise frames are used for comfort noise generation.

The first SID frame also serves to initiate the comfort noise generation on the receive side, as a first SID frame is always sent at the end of a speech burst, i.e., before the radio transmission is terminated.

The scheduling of SID or speech frames on the radio path is described in GSM 06.93 [5].

5.1 LSF evaluation

The comfort noise parameters to be encoded into a SID frame are calculated over N = 8 consecutive frames marked with VAD=0, as follows:

The averaged LSF parameter vector $\mathbf{f}^{\mathrm{mean}}(i)$ of the frame i shall be computed according to the equation:

\[ \mathbf{f}^{\mathrm{mean}}(i) = \frac{1}{8} \sum_{n=0}^{7} \mathbf{f}(i-n) \qquad (1) \]

where $\mathbf{f}(i-n)$ is the (unquantized) LSF parameter vector of the current frame i (n = 0) and past frames (n = 1, ..., 7).

The averaged LSF parameter vector $\mathbf{f}^{\mathrm{mean}}(i)$ of the frame i is encoded using the same encoding tables that are also used by the 7.4 kbit/s mode for the encoding of the non-averaged LSF parameter vectors in ordinary speech encoding mode, but the quantization algorithm is modified in order to support the quantization of comfort noise. The LSF parameter prediction residual to be quantized for frame i is obtained according to the following equation:

\[ \mathbf{e}(i) = \mathbf{f}^{\mathrm{mean}}(i) - \hat{\mathbf{f}}^{\mathrm{ref}} \qquad (2) \]

where $\hat{\mathbf{f}}^{\mathrm{ref}}$ is a reference vector picked from a codebook.

The vector $\hat{\mathbf{f}}^{\mathrm{ref}}$ used in eq (2) is encoded for each SID frame. A lookup table containing 8 vectors typical for background noise is searched. The vector which yields the lowest prediction residual energy is selected.

After the above step the LSF parameter encoding procedure is performed. The 3-bit index for the reference vector and the 26 bits for the LSF parameters are transmitted in the SID frame (see bit allocation in table 1).

5.2 Frame energy calculation

The frame energy is computed for each frame marked with VAD=0 according
to the equation:

\[ en_{\log}(i) = \frac{1}{2} \log_2 \left( \frac{1}{N} \sum_{n=0}^{N-1} s^2(n) \right) \]

where s(n) is the HP-filtered input speech signal of the current frame i and N is the number of samples in the frame.

The averaged logarithmic energy is computed by:

\[ en_{\log}^{\mathrm{mean}}(i) = \frac{1}{8} \sum_{n=0}^{7} en_{\log}(i-n) \]

The averaged logarithmic energy is quantized by means of a 6 bit algorithmic quantizer. The 6 bits for the energy index are transmitted in the SID frame (see bit allocation in table 1).

5.3 Modification of the speech encoding algorithm during SID frame generation

When the TX_TYPE is not equal to SPEECH the speech encoding algorithm is modified in the following way:

- the non-averaged LP parameters which are used to derive the filter coefficients of the filters
  H(z) and W(z) of the speech encoder are not quantized;
- the open loop pitch lag search is performed, but the closed loop pitch lag search is inactivated. The adaptive codebook gain and memory are set to zero;
- no fixed codebook search is made;
- the memory of weighting filter W(z) is set to zero, i.e., the memory of W(z) is not updated;
- the ordinary LP parameter quantization algorithm is inactive. The averaged LSF parameter vector $\mathbf{f}^{\mathrm{mean}}$ is calculated each time a new SID frame is to be sent to the Radio Subsystem. This parameter vector is encoded into the SID frame as defined in subclause 5.1;
- the ordinary gain quantization algorithm is inactive;
- the predictor memories of the ordinary LP parameter quantization and fixed codebook gain quantization algorithms are initialized when TX_TYPE is not SPEECH, so that the quantizers start from known initial states when the speech activity begins again.

5.4 SID-frame encoding

The encoding of the 35 comfort noise bits in a SID frame is described in GSM 05.03, where the encoding of the first SID frame is also described. The bit allocation and sequence of the bits from comfort noise encoding is shown in table 1.

6 Functions on the receive (RX) side

The situations in which comfort noise shall be generated on the receive side are defined in GSM 06.93 [5]. Generally speaking, the comfort noise generation is started or updated whenever a valid SID frame is received.

6.1 Averaging and decoding of the LP and energy parameters

When speech frames are received by the decoder the LP and the energy parameters of the last seven speech frames shall be kept in memory. The decoder counts the number of frames elapsed since the last SID frame was updated and passed to the RSS by the encoder. Based on this count, the decoder determines whether or not there is a hangover period at the end of the speech burst (defined in GSM 06.93). The interpolation factor is also adapted to the SID update rate.

As soon as a SID frame
is received comfort noise is generated at the decoder end. The first SID frame parameters are not received but computed from the parameters stored during the hangover period. If no hangover period is detected, the parameters from the previous SID update are used.

The averaging procedure for obtaining the comfort noise parameters for the first SID frame is as follows:

- when a speech frame is received, the LSF vector is decoded and stored in memory; the logarithmic frame energy of the decoded signal is also stored in memory;
- the averaged values of the quantized LSF vectors and the averaged logarithmic frame energy of the decoded frames are computed and used for comfort noise generation.

The averaged value of the LSF vector for the first SID frame is given by:

\[ \mathbf{f}^{\mathrm{mean}}(i) = \frac{1}{8} \sum_{n=0}^{7} \hat{\mathbf{f}}(i-n) \]

where $\hat{\mathbf{f}}(i-n)$, $n > 0$, is the quantized LSF vector of one of the frames of the hangover period and where $\hat{\mathbf{f}}(i) = \hat{\mathbf{f}}(i-1)$.

The averaged logarithmic frame energy for the first SID frame is given by:

\[ en_{\log}^{\mathrm{mean}}(i) = \frac{1}{8} \sum_{n=0}^{7} en_{\log}(i-n) \]

where $en_{\log}(i-n)$, $n > 0$, is the logarithmic frame energy of one of the frames of the hangover period computed for the decoded frames and where $en_{\log}(i) = en_{\log}(i-1)$.
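
The two first-SID averaging equations above can be illustrated with a minimal sketch. It assumes the decoder has stored the quantized LSF vectors and logarithmic frame energies of the seven hangover frames, most recent first. The function name `first_sid_average` and the plain floating-point lists are hypothetical choices made for readability; the sketch does not attempt to be bit-exact with the ANSI-C code of GSM 06.73 [2].

```python
from typing import List, Sequence, Tuple

HANGOVER_FRAMES = 7  # stored frames, corresponding to n = 1..7
AVG_LEN = 8          # number of terms in each averaging sum (n = 0..7)


def first_sid_average(
    lsf_hist: Sequence[Sequence[float]],
    en_hist: Sequence[float],
) -> Tuple[List[float], float]:
    """Average the stored hangover parameters for the first SID frame.

    lsf_hist[0] and en_hist[0] belong to the most recent stored frame (n = 1).
    This layout is an assumption of the sketch, not something the text fixes.
    """
    if len(lsf_hist) != HANGOVER_FRAMES or len(en_hist) != HANGOVER_FRAMES:
        raise ValueError("expected exactly 7 stored hangover frames")

    # The n = 0 term is not received with the first SID frame, so the most
    # recent stored frame is reused: f(i) = f(i-1), en_log(i) = en_log(i-1).
    lsf_terms = [lsf_hist[0], *lsf_hist]
    en_terms = [en_hist[0], *en_hist]

    order = len(lsf_hist[0])
    lsf_mean = [sum(v[k] for v in lsf_terms) / AVG_LEN for k in range(order)]
    en_mean = sum(en_terms) / AVG_LEN
    return lsf_mean, en_mean
```

Because the n = 0 term duplicates the n = 1 term, the most recent hangover frame carries twice the weight of each older frame in both averages.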

For ordinary SID frames, the LSF vector and logarithmic frame energy are computed by table lookup. The LSF vector is given by the sum of the decoded reference vector and the decoded LSF prediction residual.

During comfort noise generation the spectrum and energy of the comfort noise are determined by interpolation between old and new SID frames.

In order to achieve a comfort noise that is less static in appearance, the LSF vector is slightly perturbed for each frame by adding a small component based on parameter variations computed in the hangover period. The perturbation is computed by taking the mean LSF vector from the matrix f; this mean vector is then subtracted from each of the elements of f, forming a new matrix f. For every frame a mean removed LSF vector is r