ETSI prI-ETS 300 040 (1991): European Digital Cellular Telecommunications System (Phase 1); Voice Activity Detection

INTERIM EUROPEAN TELECOMMUNICATION STANDARD
FINAL DRAFT prI-ETS 300 040
March 1991

UDC: 621.396.21
Key words: Groupe Spécial Mobile (GSM)

European digital cellular telecommunications system (phase 1);
Voice activity detection
(GSM 06.32)

ETSI
European Telecommunications Standards Institute
ETSI Secretariat: B.P. 152, F-06561 Valbonne Cedex, France
TP. +33 92 94 42 00, TF. +33 93 65 47 16, Tx. 47 00 40 F

© European Telecommunications Standards Institute 1991. All rights reserved.

No part may be reproduced or used except as authorised by contract or other written permission. The copyright and the foregoing restriction on reproduction and use extend to all media in which the information may be embodied.

Whilst every care has been taken in the preparation and publication of this document, errors in content, typographical or otherwise, may occur. If you have comments concerning its accuracy, please write to "ETSI Standards Management Dept." at the address shown on the title page.

Foreword

This Interim European Telecommunication Standard (I-ETS) has been produced by the Special Mobile Group (GSM) Technical Committee of the European Telecommunications Standards Institute (ETSI).

This I-ETS specifies the Voice Activity Detector (VAD) and the conformance tests for VADs used in any Base Station System (BSS) or Mobile Station (MS) within the European digital cellular telecommunications system (phase 1).

Reference is made within this I-ETS to the following ETSI Technical Specifications (NOTE 1):

GSM 06.10  Full-rate speech transcoding (Version 3.2.0).
GSM 06.12  Comfort noise aspects for full-rate speech traffic channels (Version 3.0.0).
GSM 06.31  Discontinuous transmission (DTX) for full-rate speech traffic channels (Version 3.1.0).

The above ETSI-GSM Technical Specifications are normative. Annexes 1, 2 and 3 of this standard are informative.

NOTE 1: The GSM Technical Committee of ETSI has constituted stable and consistent documents which give technical specifications for the implementation of the European digital cellular telecommunications system. Historically, these documents have been identified as "GSM recommendations". Some of these recommendations may subsequently become I-ETSs or ETSs, whilst some may continue with the status of ETSI-GSM Technical Specifications. These ETSI-GSM Technical Specifications are, for editorial reasons, still referred to as GSM recommendations in current GSM documents. The ETSI-GSM Technical Committee numbering and version control system remains unchanged for the ETSI-GSM Technical Specifications.

NOTE 2: Items in this draft indicated as not complete, or requiring further study or work, are not required for the phase 1 implementation of the European digital cellular telecommunications system.

ETSI/GSM
GSM 06.32, Version 3.0.0
Title: Voice Activity Detection
Date: January 1991

List of contents:

0. SCOPE
1. GENERAL
2. FUNCTIONAL DESCRIPTION
3. COMPUTATIONAL DETAILS
4. DIGITAL TEST SEQUENCES
ANNEX 1. SIMPLIFIED BLOCK FILTERING OPERATION
ANNEX 2. DESCRIPTION OF DIGITAL TEST SEQUENCES
ANNEX 3. VAD PERFORMANCE

Original language: English
Number of pages: 37

Detailed list of contents

0. SCOPE
1. GENERAL
2. FUNCTIONAL DESCRIPTION
2.1. OVERVIEW AND PRINCIPLES OF OPERATION
2.2. ALGORITHM DESCRIPTION
2.2.1. Adaptive filtering and energy computation
2.2.2. ACF averaging
2.2.3. Predictor values computation
2.2.4. Spectral comparison
2.2.5. Periodicity detection
2.2.6. Threshold adaptation
2.2.7. VAD decision
2.2.8. VAD hangover addition
3. COMPUTATIONAL DETAILS
3.1. ADAPTIVE FILTERING AND ENERGY COMPUTATION
3.2. ACF AVERAGING
3.3. PREDICTOR VALUES COMPUTATION
3.3.1. Schur recursion to compute reflection coefficients
3.3.2. Step-up procedure to obtain the aav1[0..8]
3.3.3. Computation of the rav1[0..8]
3.4. SPECTRAL COMPARISON
3.5. PERIODICITY DETECTION
3.6. THRESHOLD ADAPTATION
3.7. VAD DECISION
3.8. VAD HANGOVER ADDITION
3.9. PERIODICITY UPDATING
4. DIGITAL TEST SEQUENCES
4.1. TEST CONFIGURATION
4.2. TEST SEQUENCES
ANNEX 1. SIMPLIFIED BLOCK FILTERING OPERATION
ANNEX 2. DESCRIPTION OF DIGITAL TEST SEQUENCES
ANNEX 3. VAD PERFORMANCE

0. SCOPE

This recommendation specifies the voice activity detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in Recommendation GSM 06.31. It also specifies the test methods to be used to verify that a VAD complies with the recommendation. The requirements are mandatory on any VAD to be used either in the GSM Mobile Stations or Base Station Systems.

1. GENERAL

The function of the VAD is to indicate whether each 20 ms frame produced by the speech encoder contains speech or not. The output is a binary flag which is used by the TX DTX handler defined in recommendation GSM 06.31.

The recommendation is organised as follows. Section 2 describes the principles of operation of the VAD. In section 3, the computational details necessary for the fixed point implementation of the
VAD algorithm are given. This section uses the same notation as used for computational details in GSM 06.10.

The verification of the VAD is based on the use of digital test sequences. Section 4 defines the input and output signals and the test configuration, whereas the detailed description of the test sequences is contained in annex 2.

The performance of the VAD algorithm is characterised by the amount of audible speech clipping it introduces and the percentage activity it indicates. These characteristics for the VAD defined in this recommendation have been established by extensive testing
under a wide range of operating conditions. The results are summarised in annex 3.

2. FUNCTIONAL DESCRIPTION

The purpose of this section is to give the reader an understanding of the principles of operation of the VAD, whereas the detailed description is given in section 3. In case of discrepancy between the two descriptions, the detailed description of section 3 shall prevail.

In the following subsections of section 2, a Pascal programming type of notation has been used to describe the algorithm.

2.1. OVERVIEW AND PRINCIPLES OF OPERATION

The function of the VAD is to distinguish between noise with speech present and noise without speech present. The biggest difficulty for detecting speech in a mobile environment is the very low speech/noise ratios which are often encountered. The accuracy of the
VAD is improved by using filtering to increase the speech/noise ratio before the decision is made.

For a mobile environment, the worst speech/noise ratios are encountered in moving vehicles. It has been found that the noise is relatively stationary for quite long periods in a mobile environment.
It is therefore possible to use an adaptive filter with coefficients obtained during noise, to remove much of the vehicle noise.

The VAD is basically an energy detector. The energy of the filtered signal is compared with a threshold: speech is indicated whenever the threshold is exceeded.

The noise encountered in mobile environments may be constantly changing in level. The spectrum of the noise can also change, and varies greatly over different vehicles. Because of these changes the VAD threshold and adaptive filter coefficients must be constantly adapted. To give reliable detection the threshold must be sufficiently above the noise level to avoid noise being identified as speech but not so far above it that low level parts of speech are identified as noise. The threshold and the adaptive filter coefficients are only updated when speech is not present. It is, of course, potentially dangerous for a VAD to update these values on the basis of its own decision. This adaptation therefore only occurs when the signal seems stationary in the frequency domain but does not have the pitch component inherent in voiced speech and information tones.

A further mechanism is used to ensure that low level noise (which is often not stationary over long periods) is not detected as speech. Here, an additional fixed threshold is used.

A VAD hangover period is used to eliminate mid-burst clipping of low level speech. Hangover is only added to speech-bursts which exceed a certain duration to avoid extending noise spikes.

2.2. ALGORITHM DESCRIPTION

The block diagram of the VAD algorithm is shown in Figure 2-1. The individual blocks are described in the following sections. ACF and N are calculated in the speech encoder.

[Figure 2-1: Functional block diagram of the VAD. Blocks: adaptive filtering and energy computation; ACF averaging; predictor values computation; spectral comparison; periodicity detection; threshold adaptation; VAD decision; VAD hangover addition.]

The global variables shown in the block diagram are described as follows:

- ACF are autocorrelation coefficients which are calculated in the speech encoder defined in recommendation GSM 06.10 (section 3.1.4, see also Annex 1). The inputs to the speech encoder are 16 bit 2's complement numbers, as described in rec. GSM 06.10, section 4.2.0.
- av0 and av1 are averaged ACF vectors.
- rav1 are autocorrelated predictor values obtained from av1.
- rvad are the autocorrelated predictor values of the adaptive filter.
- N is the long term predictor lag value which is obtained every subsegment in the speech coder defined in recommendation GSM 06.10.
- ptch indicates whether the signal has a steady periodic component.
- pvad is the energy in the current frame of the input signal after filtering.
- thvad is an adaptive threshold.
- stat indicates spectral stationarity.
- vvad indicates the VAD decision before hangover is added.
- vad is the final VAD decision with hangover included.

2.2.1. Adaptive filtering and energy computation

pvad is computed as follows:

$$\mathrm{pvad} := \mathrm{rvad}[0]\,\mathrm{ACF}[0] + 2 \sum_{i=1}^{8} \mathrm{rvad}[i]\,\mathrm{ACF}[i]$$

This corresponds to performing an 8th order block filtering on the input samples to the speech encoder, after zero offset compensation and pre-emphasis. This is explained in annex 1.

2.2.2. ACF averaging

Spectral characteristics of the input signal have to be obtained using blocks that are larger than one 20 ms frame. This is done by averaging the autocorrelation values for several consecutive frames. This averaging is given by the following
equations:

$$\mathrm{av0}_{n}[i] := \sum_{j=0}^{\mathrm{frames}-1} \mathrm{ACF}_{n-j}[i], \qquad i = 0, \dots, 8$$

$$\mathrm{av1}_{n}[i] := \mathrm{av0}_{n-\mathrm{frames}}[i], \qquad i = 0, \dots, 8$$

where n represents the current frame, n-1 represents the previous frame, etc. The values of constants are given in table 2-1.

2.2.3. Predictor values computation

The filter predictor values aav1 are obtained from the autocorrelation values av1 according to the equation:

$$a := R^{-1} p$$

where:

$$R := \begin{pmatrix} \mathrm{av1}[0] & \mathrm{av1}[1] & \cdots & \mathrm{av1}[7] \\ \mathrm{av1}[1] & \mathrm{av1}[0] & \cdots & \mathrm{av1}[6] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{av1}[7] & \mathrm{av1}[6] & \cdots & \mathrm{av1}[0] \end{pmatrix}, \qquad p := \begin{pmatrix} \mathrm{av1}[1] \\ \mathrm{av1}[2] \\ \vdots \\ \mathrm{av1}[8] \end{pmatrix}, \qquad a := \begin{pmatrix} \mathrm{aav1}[1] \\ \mathrm{aav1}[2] \\ \vdots \\ \mathrm{aav1}[8] \end{pmatrix}$$

and:

$$\mathrm{aav1}[0] := -1$$

av1 is used in preference to av0 as av0 may contain speech.

The autocorrelated predictor values rav1 are then obtained:

$$\mathrm{rav1}[i] := \sum_{k=0}^{8-i} \mathrm{aav1}[k]\,\mathrm{aav1}[k+i], \qquad i = 0, \dots, 8$$

2.2.4. Spectral comparison

The spectra represented by the autocorrelated predictor values rav1 and the averaged autocorrelation values av0 are compared using the distortion measure dm defined below. This measure is used to produce a boolean value stat every 20 ms, as given by these equations:

$$\mathrm{dm} := \Bigl( \mathrm{rav1}[0]\,\mathrm{av0}[0] + 2 \sum_{i=1}^{8} \mathrm{rav1}[i]\,\mathrm{av0}[i] \Bigr) / \mathrm{av0}[0]$$

  difference := |dm - lastdm|
  lastdm := dm
  stat := difference < thresh

2.2.5. Periodicity detection

The following operations are done after the VAD decision and when the current LTP lag values (N[0] .. N[3]) are available; this reduces the delay of the VAD decision. (N[-1] = N[3] of the previous segment.)

  lagcount := 0
  for j := 0 to 3 do
  begin
    smallag := maximum(N[j], N[j-1]) mod minimum(N[j], N[j-1])
    if minimum(smallag, minimum(N[j], N[j-1]) - smallag) [...] thvad

2.2.8. VAD hangover addition

VAD hangover is only added to bursts of speech greater than or equal to burstconst blocks. The boolean variable vad indicates the decision of the VAD with hangover included. The values of the constants are given in table 2-5. The hangover algorithm is as follows:

  if vvad then increment(burstcount) else burstcount := 0
  if burstcount >= burstconst then
  begin
    hangcount := hangconst;
    burstcount := burstconst
  end
  vad := vvad or (hangcount >= 0)
  if hangcount >= 0 then decrement(hangcount)
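
The hangover rule of section 2.2.8 can be illustrated with a minimal Python sketch. It is not part of the recommendation. The sketch reads the comparisons in the pseudo-code as "greater than or equal to", in line with the prose ("bursts of speech greater than or equal to burstconst blocks"). The class name `VadHangover`, the helper `apply_hangover`, the initial values of `burstcount` and `hangcount`, and the constants used in the usage note are assumptions made for illustration; the normative constants are those of table 2-5.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VadHangover:
    """Per-frame hangover state, mirroring the section 2.2.8 pseudo-code."""

    burstconst: int       # burst length (in frames) required before hangover is granted
    hangconst: int        # number of hangover frames added after such a burst
    burstcount: int = 0   # assumed initial value: no speech seen yet
    hangcount: int = -1   # assumed initial value: no hangover pending

    def update(self, vvad: bool) -> bool:
        """Take the pre-hangover decision vvad for one frame, return vad."""
        # Count consecutive speech frames; any non-speech frame resets the burst.
        if vvad:
            self.burstcount += 1
        else:
            self.burstcount = 0
        # A long enough burst (re)arms the hangover counter.
        if self.burstcount >= self.burstconst:
            self.hangcount = self.hangconst
            self.burstcount = self.burstconst
        # Speech is flagged while vvad is set or hangover frames remain.
        vad = vvad or self.hangcount >= 0
        if self.hangcount >= 0:
            self.hangcount -= 1
        return vad


def apply_hangover(decisions: Iterable[bool], burstconst: int, hangconst: int) -> List[bool]:
    """Run the hangover logic over a sequence of vvad decisions."""
    state = VadHangover(burstconst=burstconst, hangconst=hangconst)
    return [state.update(v) for v in decisions]
```

With the illustrative values burstconst = 3 and hangconst = 5, three speech frames followed by silence keep vad set for five further frames, whereas a single isolated speech frame receives no hangover, which is the behaviour the overview describes as avoiding the extension of noise spikes.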

3. COMPUTATIONAL DETAILS

In the next paragraphs, the detailed description of the VAD algorithm follows the preceding high level description. This detailed description is divided into nine sections related to the blocks of figure 2-1 (except the last one) in the high level description of the VAD algorithm. Those sections are:

1. Adaptive filtering and energy computation
2. ACF averaging
3. Predictor values computation
4. Spectral comparison
5. Periodicity detection
6. Threshold adaptation
7. VAD decision
8. VAD hangover addition
9. Periodicity updating

The VAD algorithm takes as input the following variables of the RPE-LTP encoder (see the detailed description of the RPE-LTP encoder in rec. GSM 06.10):

- L_ACF[0..8], autocorrelation function (rec. GSM 06.10/4.2.4);
- scalauto, scaling factor to compute the L_ACF[0..8] (rec. GSM 06.10/4.2.4);
- Nc, LTP lag (one for each sub-segment, rec. GSM 06.10/4.2.11).

So four Nc values are needed for the VAD algorithm.

The VAD computation can start as soon as the L_ACF[0..8] and scalauto variables are known. This means that the VAD computation can take place after part 4.2.4 of rec. GSM 06.10 (Autocorrelation) of the LPC analysis section of the RPE-LTP encoder. This scheme will reduce the delay to yield the VAD information. The periodicity updating, which is included in section 2.2.5, is done after the processing of the current speech encoder frame.

All the arithmetic operations and names of the variables follow the RPE-LTP detailed description.

To increase the precision within the fixed point implementation, a pseudo-floating point representation of some variables is used. This applies to the following variables (and related constants) of the VAD algorithm:

- pvad: energy of filtered signal;
- thvad: threshold of the VAD decision;
- acf0: energy of input signal.

For the representation of these variables, two integers (16 bits) are needed:

- one for the exponent (e_pvad, e_thvad, e_acf0);
- one for the mantissa (m_pvad, m_thvad, m_acf0).

The value e_pvad represents the lowest power of 2 just greater or equal to the actual value of pvad and the m_pvad value represents an integer wh
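
The exponent/mantissa idea described above can be sketched in Python. This is only an illustration under stated assumptions, because the definition of the mantissa is cut off in the text: the sketch assumes a mantissa normalised to the range 16384..32767, so that the value equals mantissa * 2^(exponent - 15). For an exact power of two this choice selects the next exponent up, so that the mantissa still fits in 15 bits. The function names `to_pseudo_float` and `from_pseudo_float` are hypothetical and do not appear in the recommendation.

```python
from typing import Tuple


def to_pseudo_float(value: int) -> Tuple[int, int]:
    """Split a positive integer into (exponent, mantissa).

    Assumption (not taken from the text): the mantissa is normalised to
    16384..32767 and value == mantissa * 2 ** (exponent - 15), up to the
    low-order bits dropped when the value is wider than 15 bits.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    exponent = value.bit_length()      # smallest e such that 2 ** e > value
    shift = exponent - 15              # bring the leading bit to bit 14
    mantissa = value >> shift if shift >= 0 else value << -shift
    return exponent, mantissa


def from_pseudo_float(exponent: int, mantissa: int) -> float:
    """Rebuild the (approximate) value from its exponent and mantissa."""
    return mantissa * 2.0 ** (exponent - 15)
```

For example, under these assumptions the value 100000 is held as exponent 17 and mantissa 25000, and small values such as 1 keep full precision because the mantissa is scaled up rather than rounded.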
