Released: July 1, 1993

GSM 06.32
Version 4.0.1
Date: January 1993

Work Item No:
Key words:

European digital cellular telecommunication system (phase 2);
Voice Activity Detection

ETSI
European Telecommunications Standards Institute
ETSI Secretariat: F - 06921 Sophia Antipolis Cedex . France
TP. + 33 92 94 42 00  F. + 33 93 5 47 16  Tx. 47 00 40 F

This is an unpublished work the copyright in which vests in the European Telecommunications Standards Institute. All rights reserved.

The information contained herein is the property of ETSI and no part may be reproduced or used except as authorised by contract or other written permission. The copyright and the foregoing restriction on reproduction and use extend to all media in which the information may be embodied.

GSM 06.32 (4.0.1): January 1993

CONTENTS

0. SCOPE
1. GENERAL
2. FUNCTIONAL DESCRIPTION
   2.1. OVERVIEW AND PRINCIPLES OF OPERATION
   2.2. ALGORITHM DESCRIPTION
        2.2.1. Adaptive filtering and energy computation
        2.2.2. ACF averaging
        2.2.3. Predictor values computation
        2.2.4. Spectral comparison
        2.2.5. Periodicity detection
        2.2.6. Threshold adaptation
        2.2.7. VAD decision
        2.2.8. VAD hangover addition
3. COMPUTATIONAL DETAILS
   3.1. ADAPTIVE FILTERING AND ENERGY COMPUTATION
   3.2. ACF AVERAGING
   3.3. PREDICTOR VALUES COMPUTATION
        3.3.1. Schur recursion to compute reflection coefficients
        3.3.2. Step up procedure to obtain the aav1[0..8]
        3.3.3. Computation of the rav1[0..8]
   3.4. SPECTRAL COMPARISON
   3.5. PERIODICITY DETECTION
   3.6. THRESHOLD ADAPTATION
   3.7. VAD DECISION
   3.8. VAD HANGOVER ADDITION
   3.9. PERIODICITY UPDATING
4. DIGITAL TEST SEQUENCES
   4.1. TEST CONFIGURATION
   4.2. TEST SEQUENCES
ANNEX 1. SIMPLIFIED BLOCK FILTERING OPERATION
ANNEX 2. DESCRIPTION OF DIGITAL TEST SEQUENCES
   A2.1. TEST SEQUENCES
   A2.2. FILE FORMAT DESCRIPTION
ANNEX 3. VAD PERFORMANCE
0. SCOPE

This technical specification specifies the voice activity detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in GSM 06.31. It also specifies the test methods to be used to verify that a VAD complies with the technical specification.

The requirements are mandatory on any VAD to be used either in the GSM Mobile Stations or Base Station Systems.

1. GENERAL

The function of the VAD is to indicate whether each 20 ms frame produced by the speech encoder contains speech or not. The output is a binary flag which is used by the TX DTX handler defined in GSM 06.31.

The technical specification is organised as follows:

Section 2 describes the principles of operation of the VAD.

In section 3, the computational details necessary for the fixed point implementation of the VAD algorithm are given. This section uses the same notation as used for computational details in GSM 06.10.

The verification of the VAD is based on the use of digital test sequences. Section 4 defines the input and output signals and the test configuration, whereas the detailed description of the test sequences is contained in annex 2.

The performance of the VAD algorithm is characterised by the amount of audible speech clipping it introduces and the percentage activity it indicates. These characteristics for the VAD defined in this technical specification have been established by extensive testing under a wide range of operating conditions. The results are summarised in annex 3.

2. FUNCTIONAL DESCRIPTION

The purpose of this section is to give the reader an understanding of the principles of operation of the VAD, whereas the detailed description is given in section 3. In case of discrepancy between the two descriptions, the detailed description of section 3 shall prevail.

In the following subsections of section 2, a Pascal programming type of notation has been used to describe the algorithm.

2.1. OVERVIEW AND PRINCIPLES OF OPERATION

The function of the VAD is to distinguish between noise with speech present and noise without speech present.
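As described in section 1 and restated above, the VAD acts as a per-frame classifier: it delivers one Boolean flag for each 20 ms frame. The Python sketch below illustrates only that interface and the "percentage activity" figure mentioned in section 1. It is a minimal sketch under stated assumptions: the frame length of 160 samples assumes 8 kHz sampling, and `toy_decision` with its fixed threshold is a hypothetical placeholder, not the algorithm specified in this document.

```python
FRAME_LEN = 160  # assumption: one 20 ms frame at 8 kHz sampling


def split_frames(samples, frame_len=FRAME_LEN):
    """Cut a sample sequence into whole frames; a trailing partial frame is dropped."""
    count = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(count)]


def toy_decision(frame, threshold=1.0e6):
    """Hypothetical placeholder decision: frame energy against a fixed threshold."""
    return sum(s * s for s in frame) > threshold


def vad_flags(samples, decide=toy_decision):
    """Return one Boolean flag per frame, in frame order."""
    return [decide(frame) for frame in split_frames(samples)]


def percentage_activity(flags):
    """Share of frames flagged as speech, in percent (0.0 for an empty list)."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0
```

Any function that maps a frame to a Boolean can be passed as `decide`, so the same scaffolding could be used to compare the activity figures of different detectors on the same input.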
The biggest difficulty for detecting speech in a mobile environment is the very low speech/noise ratios which are often encountered. The accuracy of the VAD is improved by using filtering to increase the speech/noise ratio before the decision is made.

For a mobile environment, the worst speech/noise ratios are encountered in moving vehicles. It has been found that the noise is relatively stationary for quite long periods in a mobile environment. It is therefore possible to use an adaptive filter to remove much of the noise. The energy of the filtered signal is compared with a threshold; speech is indicated whenever the threshold is exceeded.

The noise encountered in mobile environments may be constantly changing in level. The spectrum of the noise can also change, and varies greatly over different vehicles. Because of these changes the VAD threshold and adaptive filter coefficients must be constantly adapted. To give reliable detection the threshold must be sufficiently above the noise level to avoid noise being identified as speech but not so far above it that low level parts of speech are identified as noise. The threshold and the adaptive filter coefficients are only updated when speech is not present. It is, of course, potentially dangerous for a VAD to update these values on the basis of its own decision. This adaptation therefore only occurs when the signal seems stationary in the frequency domain but does not have the pitch component inherent in voiced speech and information tones.

A further mechanism is used to ensure that low level noise (which is often not stationary over long periods) is not detected as speech. Here, an additional fixed threshold is used.

A VAD hangover period is used to eliminate mid-burst clipping of low level speech. Hangover is only added to speech-bursts which exceed a certain duration to avoid extending noise spikes.

2.2. ALGORITHM DESCRIPTION

The block diagram of the VAD algorithm is shown in Figure 2-1. The individual blocks are described in the following sections. ACF and N are calculated in the speech encoder.

[Figure 2-1. Functional block diagram of the VAD. Recoverable labels: adaptive filtering and energy computation, VAD decision, VAD hangover addition; signals pvad, stat, vad.]

The global variables shown in the block diagram are described as follows:

- ACF are autocorrelation coefficients which are calculated in the speech encoder defined in GSM 06.10 (section 3.1.4, see also Annex 1). The inputs to the speech encoder are 16 bit 2s complement numbers, as described in GSM 06.10, section 4.2.0.
- av0 and av1 are averaged ACF vectors.
- rav1 are autocorrelated predictor values obtained from av1.
- rvad are the autocorrelated predictor values of the adaptive filter.
- N is the long term predictor lag value which is obtained every subsegment in the speech coder defined in GSM 06.10.
- ptch indicates whether the signal has a [...]

[...] ; i = 0..8

av1[n][i] := av0[n-frames][i] ; i = 0..8

Where n represents the current frame, n-1 represents the previous frame etc. The values of
constants are given in table 2-1.

Table 2-1. Constants and variables for ACF averaging

2.2.3. Predictor values computation

The filter predictor values aav1 are obtained from the autocorrelation values av1 according to the equation:

a = R^-1 p

where:

     | av1[0], av1[1], av1[2], av1[3], av1[4], av1[5], av1[6], av1[7] |
     | av1[1], av1[0], av1[1], av1[2], av1[3], av1[4], av1[5], av1[6] |
     | av1[2], av1[1], av1[0], av1[1], av1[2], av1[3], av1[4], av1[5] |
R := | av1[3], av1[2], av1[1], av1[0], av1[1], av1[2], av1[3], av1[4] |
     | av1[4], av1[3], av1[2], av1[1], av1[0], av1[1], av1[2], av1[3] |
     | av1[5], av1[4], av1[3], av1[2], av1[1], av1[0], av1[1], av1[2] |
     | av1[6], av1[5], av1[4], av1[3], av1[2], av1[1], av1[0], av1[1] |
     | av1[7], av1[6], av1[5], av1[4], av1[3], av1[2], av1[1], av1[0] |

p := ( av1[1], av1[2], ..., av1[8] )
a := ( aav1[1], aav1[2], ..., aav1[8] )

and: aav1[0] := -1

av1 is used in preference to av0 as av0 may contain speech.

The autocorrelated predictor values rav1 are then obtained:

rav1[i] := SUM(k=0..8-i) aav1[k]*aav1[k+i] ; i = 0..8

2.2.4. Spectral comparison

The spectra represented by the autocorrelated predictor values rav1 and the averaged autocorrelation values av0 are compared using the distortion measure dm defined below. This measure is used to produce a Boolean value stat every 20 ms, as given by these equations:
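For orientation, the quantities of sections 2.2.3 and 2.2.4 can be sketched in floating point. The Python sketch below is not the fixed-point procedure of section 3. It forms rav1 from a set of predictor values, evaluates dm as rav1[0]*av0[0] plus twice the sum of the remaining products rav1[i]*av0[i], normalised by av0[0], and keeps the previous dm to form the difference. The class name, the zero initial value of `lastdm`, the caller-supplied threshold and the reading that a small change in dm means "stationary" are assumptions made for this example.

```python
def autocorrelate_predictor(aav1):
    """rav1[i] = sum over k = 0..order-i of aav1[k] * aav1[k+i]."""
    order = len(aav1) - 1
    return [
        sum(aav1[k] * aav1[k + i] for k in range(order - i + 1))
        for i in range(order + 1)
    ]


def distortion(rav1, av0):
    """dm = (rav1[0]*av0[0] + 2 * sum_{i>=1} rav1[i]*av0[i]) / av0[0]."""
    cross = sum(r * a for r, a in zip(rav1[1:], av0[1:]))
    return (rav1[0] * av0[0] + 2.0 * cross) / av0[0]


class SpectralComparison:
    """Tracks dm from frame to frame and reports a stationarity flag."""

    def __init__(self, thresh):
        self.thresh = thresh  # assumption: value supplied by the caller
        self.lastdm = 0.0     # assumption: initial value chosen for the sketch

    def update(self, rav1, av0):
        dm = distortion(rav1, av0)
        difference = abs(dm - self.lastdm)
        self.lastdm = dm
        # Assumed reading: a small change in dm marks the frame as stationary.
        return difference < self.thresh
```

Called once per frame with the current rav1 and av0, `update` returns False while dm is still moving and True once two consecutive frames give nearly the same dm.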
dm := ( rav1[0]*av0[0] + 2*SUM(i=1..8) rav1[i]*av0[i] ) / av0[0]

difference := |dm - lastdm|
lastdm := dm
stat := difference < thresh

[...]

The following operations are done after the VAD decision and when the current LTP lag values (N[0] .. N[3]) are available; this reduces the delay of the VAD decision. (N[-1] = N[3] of previous segment.)

lagcount := 0
for j := 0 to 3 do
begin
  smallag := maximum(N[j],N[j-1]) mod minimum(N[j],N[j-1])
  if minimum(smallag,minimum(N[j],N[j-1])-smallag) [...]

[...] thvad

2.2.8. VAD hangover addition

VAD hangover is only added to bursts of speech greater than or equal to burstconst blocks. The Boolean variable vad indicates the decision of the VAD with hangover included. The values of the constants are given in table 2-5.

The hangover algorithm is as follows:

if vvad then increment(burstcount) else burstcount := 0
if burstcount >= burstconst then
begin
  hangcount := hangconst;
  burstcount := burstconst
end
vad := vvad or (hangcount >= 0)
if hangcount >= 0 then decrement(hangcount)

Table 2-5. Constants and variables for VAD hangover addition

3. COMPUTATIONAL DETAILS

In the next paragraphs, the detailed description of the VAD algorithm follows the preceding high level description. This detailed description is divided in nine sections related to the blocks of figure 2-1 (except the last one) in the high level description of the VAD algorithm.

1. Adaptive filtering and energy computation
2. ACF averaging
3. Predictor values computation
4. Spectral comparison
5. Periodicity detection
6. Threshold adaptation
7. VAD decision
8. VAD hangover addition
9. Periodicity updating

The VAD algorithm takes as input the following variables of the RPE-LTP encoder (see the detailed description of the RPE-LTP encoder rec GSM 06.10):

- L_ACF[0..8], autocorrelation function (rec GSM 06.10/4.2.4);
- scalauto, scaling factor to compute the L_ACF[0..8] (rec GSM 06.10/4.2.4);
- Nc, LTP lag (one for each subsegment, rec GSM 06.10/4.2.11);

So four Nc values are needed for the VAD algorithm.

The VAD computation can start as soon as the L_ACF[0..8] and scalauto variables are known. This means that the VAD computation can take place [...]

[...]
|== NEXT i:

Computation of e_acf0 and m_acf0:

e_acf0 = add( 32, (scalvad << 1) );
IF ( L_temp [...]
[...] >> 16;

3.2. ACF AVERAGING

This section uses the L_ACF[0..8] and the scalvad variables to compute the array L_av0[0..8] and L_av1[0..8] used in section 3.3 and 3.4.

Computation of the scaling factor:

scal = sub( 10, (scalvad << 1) );

|== FOR i = 0 to 8:
|     L_temp = L_ACF[i] >> scal;
|     L_av0[i] = L_add( L_sacf[i], L_temp );
|     L_av0[i] = L_add( L_sacf[i+9], L_av0[i] );
|     L_av0[i] = L_add( L_sacf[i+18], L_av0[i] );
|     L_sacf[ pt_sacf + i ] = L_temp;
|     L_av1[i] = L_sav0[ pt_sav0 + i ];
|     L_sav0[ pt_sav0 + i ] = L_av0[i];
|== NEXT i:

Update of the array pointers:

IF ( pt_sacf == 18 ) THEN pt_sacf = 0;
ELSE pt_sacf = add( pt_sacf, 9 );

IF ( pt_sav0 == 27 ) THEN pt_sav0 = 0;
ELSE pt_sav0 = add( pt_sav0, 9 );

3.3. PREDICTOR VALUES COMPUTATION

This section computes the array rav1[0..8] needed for the spectral comparison and the threshold adaptation. It uses the L_av1[0..8] computed in section 3.2, and is divided in the three following sub-sections:

- Schur recursion to compute reflection coefficients.
- Step up procedure to obtain the aav1[0..8].
- Computation of the rav1[0..8].

3.3.1. Schur recursion to compute reflection coefficients

This sub-section is identical to the one used in
the RPE-LTP algorithm. The array vpar[1..8] is computed with the array L_av1[0..8] as an input.

Schur recursion with 16 bits arithmetic:

IF( L_av1[0] == 0 ) THEN
|== FOR i = 1 to 8:
|     vpar[i] = 0;
|== NEXT i:
|   EXIT; /continue with section 3.3.2/

temp = norm( L_av1[0] );
|== FOR k = 0 to 8:
|     sacf[k] = ( L_av1[k] << temp ) >> 16;
|== NEXT k:

Initialize array P[..] and K[..] for the recursion:

|== FOR i = 1 to 7:
|     K[9-i] = sacf[i];
|== NEXT i:

|== FOR i = 0 to 8:
|     P[i] = sacf[i];
|== NEXT i:

Compute reflection coefficients:

|== FOR n = 1 to 8:
|     IF( P[0] < abs( P[1] ) ) THEN
|          |== FOR i = n to 8:
|          |     vpar[i] = 0;
|          |== NEXT i:
|          |   EXIT; /continue with section 3.3.2/
|     vpar[n] = div( abs( P[1] ), P[0] );
|     IF ( P[1] > 0 ) THEN vpar[n] = sub( 0, vpar[n] );
|     IF ( n == 8 ) THEN EXIT; /continue with section 3.3.2/
|
|     Schur recursion:
|     P[0] = add( P[0], mult_r( P[1], vpar[n] ) );
|==== FOR m = 1 to 8-n:
|       P[m] = add( P[m+1], mult_r( K[9-m], vpar[n] ) );
|       K[9-m] = add( K[9-m], mult_r( P[m+1], vpar[n] ) );
|==== NEXT m:
|== NEXT n:

3.3.2. Step up procedure to obtain the aav1[0..8]

Initialization of the step-up recursion:

L_coef[0] = 16384 << [...]

[...] >> 16; /takes the msb/

[...]

Keep the aav1[0..8] on 13 bits for next section:

|== FOR i = 0 to 8:
|     aav1[i] = L_coef[i] >> 19;
|== NEXT i:

3.3.3. Computation of the rav1[0..8]

|== FOR i = 0 to 8:
|     L_work[i] = 0;
|==== FOR k = 0 to 8-i:
|       L_work[i] = L_add( L_work[i], L_mult( aav1[k], aav1[k+i] ) );
|==== NEXT k:
|== NEXT i:

IF ( L_work[0] == 0 ) THEN normrav1 = 0;
ELSE normrav1 = norm( L_work[0] );

|== FOR i = 0 to 8:
|     rav1[i] = ( L_work[i] << normrav1 ) >> 16;
|== NEXT i:

Keep the normrav1 for use in section 3.4 and 3.6.

3.4. SPECTRAL COMPARISON

This
section computes the variable stat needed for the threshold adaptation. It uses the array L_av0[0..8] computed in section 3.2 and the array rav1[0..8] computed in section 3.3.3.

Re-normalize L_av0[0..8]:

IF ( L_av0[0] == 0 ) THEN
     |== FOR i = 0 to 8:
     |     sav0[i] = 4095;
     |== NEXT i:
ELSE
     shift = norm( L_av0[0] );
     |== FOR i = 0 to 8:
     |     sav0[i] = ( L_av0[i] << [...] ) >> 16;
     |== NEXT i:

Compute partial sum of dm:

[...]

Compute the division of partial sum by sav0[0]:

IF ( L_sump < 0 ) THEN L_temp = L_sub( 0, L_sump );
ELSE L_temp = L_sump;

IF ( L_temp == 0 ) THEN
     L_dm = 0;
     shift = 0;
ELSE
     sav0[0] = sav0[0] << [...]
     [...] >> 16;
     IF ( sav0[0] >= temp ) THEN
          divshift = 0;
          temp = div( temp, sav0[0] );
     ELSE
          divshift = 1;
          temp = sub( temp, sav0[0] );
          temp = div( temp, sav0[0] );

     IF( divshift == 1 ) THEN L_dm = 32768;
     ELSE L_dm = 0;

     L_dm = L_add( L_dm, temp ) << [...]
     [...] >> shift;
     L_dm = L_add( L_dm, ( rav1[0] << [...]
     [...] >> normrav1;

Compute the difference and save L_dm:

[...]

Evaluation of the stat flag:

IF ( L_temp [...]

[...] >= 4 ) THEN ptch = 1;
ELSE ptch = 0;

3.6. THRESHOLD ADAPTATION

This section uses the variables e_pvad, m_pvad, e_acf0 and m_acf0 computed in section 3.1. It also uses the flags stat (see section 3.4) and ptch (see section 3.5). It follows the flowchart represented on figure 2.2. Some constants, represented by a floating point format, are