ETSI GSM 06.32-1994: European Digital Cellular Telecommunications System (Phase 2); Voice Activity Detection (VAD) (ETS 300 580-6 Version 4.1.0)

Uploaded by: wealthynice100  Document ID: 733871  Upload date: 2019-01-08  Format: PDF  Pages: 32  Size: 1.06 MB

Released: 15 February 1994

GSM 06.32
Version: 4.0.3
Date: 21 January 1994

Source: ETSI TC-SMG
Reference: GSM 06.32
UDC: 621.396.21
Key words: European digital cellular telecommunications system, Global System for Mobile communications (GSM)

European digital cellular telecommunications system (Phase 2);
Voice Activity Detection (VAD)
(GSM 06.32)

ETSI
European Telecommunications Standards Institute
ETSI Secretariat
Postal address: 06921 Sophia Antipolis Cedex - FRANCE
Office address: Route des Lucioles - Sophia Antipolis - Valbonne - FRANCE
Tel.: + 33 92 94 42 00 - Fax: + 33 93 65 47 16

European Telecommunications Standards Institute 1994. All rights reserved.
No part may be reproduced except as authorised by written permission. The copyright and the foregoing restriction on reproduction extend to all media in which the information may be embodied.

GSM 06.32 version 4.0.3: January 1994

Whilst every care has been taken in the preparation and publication of this document, errors in content, typographical or otherwise, may occur. If you have comments concerning its accuracy, please write to "ETSI Editing and Standards Approval Dept." at the address shown on the title page.

Contents

Foreword
0.1 Scope
0.2 Normative references
0.3 Definitions and abbreviations
1 General
2 Functional description
  2.1 Overview and principles of operation
  2.2 Algorithm description
    2.2.1 Adaptive filtering and energy computation
    2.2.2 ACF averaging
    2.2.3 Predictor values computation
    2.2.4 Spectral comparison
    2.2.5 Periodicity detection
    2.2.6 Threshold adaptation
    2.2.7 VAD decision
    2.2.8 VAD hangover addition
3 Computational details
  3.1 Adaptive filtering and energy computation
  3.2 ACF averaging
  3.3 Predictor values computation
    3.3.1 Schur recursion to compute reflection coefficients
    3.3.2 Step-up procedure to obtain the aav1[0..8]
    3.3.3 Computation of the rav1[0..8]
  3.4 Spectral comparison
  3.5 Periodicity detection
  3.6 Threshold adaptation
  3.7 VAD decision
  3.8 VAD hangover addition
  3.9 Periodicity updating
4 Digital test sequences
  4.1 Test configuration
  4.2 Test sequences
Annex 1 (informative): Simplified block filtering operation
Annex 2 (informative): Description of digital test sequences
  A2.1 Test sequences
  A2.2 File format description
Annex 3 (informative): VAD performance

Foreword

This space is reserved for the foreword of future versions of this document.

0.1 Scope

This technical specification specifies the voice activity detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in GSM 06.31. It also specifies the test methods to be used to verify that a VAD complies with the technical specification.

The requirements are mandatory on any VAD to be used either in the GSM Mobile Stations or Base Station Systems.

0.2 Normative references

This ETS incorporates by dated and undated reference, provisions from other publications. These normative references are cited at the appropriate places in the text and the publications are listed hereafter. For dated references, subsequent amendments to or revisions of any of these publications apply to this ETS only when incorporated in it by amendment or revision. For undated references, the latest edition of the publication referred to applies.

[1] GSM 01.04: "European digital cellular telecommunication system (Phase 2); Definitions, abbreviations and acronyms".
[2] GSM 06.10: "European digital cellular telecommunication system (Phase 2); Full rate speech transcoding".
[3] GSM 06.12: "European digital cellular telecommunication system (Phase 2); Comfort noise aspect for full rate speech traffic channels".
[4] GSM 06.31: "European digital cellular telecommunication system (Phase 2); Discontinuous Transmission (DTX) for full rate speech traffic channel".

0.3 Definitions and abbreviations

Definitions and abbreviations used in this specification are listed in GSM 01.04.

1 General

The function of the VAD is to indicate whether each 20ms frame produced by the speech encoder contains speech or not. The output is a binary flag which is used by the TX DTX handler defined in GSM 06.31.

The technical specification is organised as follows:

Section 2 describes the principles of operation of the VAD.

In section 3, the computational details necessary for the fixed point implementation of the VAD algorithm are given. This section uses the same notation as used for computational details in GSM 06.10.

The verification of the VAD is based on the use of digital test sequences. Section 4 defines the input and output signals and the test configuration, whereas the detailed description of the test sequences is contained in annex 2.

The performance of the VAD algorithm is characterised by the amount of audible speech clipping it introduces and the percentage activity it indicates. These characteristics for the VAD defined in this technical specification have been established by extensive testing under a wide range of operating conditions. The results are summarised in annex 3.

2 Functional description

The purpose of this section is to give the reader an understanding of the principles of operation of the VAD, whereas the detailed description is given in section 3. In case of discrepancy between the two descriptions, the detailed description of section 3 shall prevail.

In the following subsections of section 2, a Pascal programming type of notation has been used to describe the algorithm.

2.1 Overview and principles of operation

The function of the VAD is to distinguish between noise with speech present and noise without speech present. The biggest difficulty for detecting speech in a mobile environment is the very low speech/noise ratios which are often encountered. The accuracy of the VAD is improved by using filtering to increase the speech/noise ratio before the decision is made.

For a mobile environment, the worst speech/noise ratios are encountered in moving vehicles. It has been found that the noise is relatively stationary for quite long periods in a mobile environment. It is therefore possible to use an adaptive filter with coefficients obtained during noise, to remove much of the vehicle noise.

The VAD is basically an energy detector. The energy of the filtered signal is compared with a threshold; speech is indicated whenever the threshold is exceeded.

The noise encountered in mobile environments may be constantly changing in level. The spectrum of the noise can also change, and varies greatly over different vehicles. Because of these changes the VAD threshold and adaptive filter coefficients must be constantly adapted. To give reliable detection the threshold must be sufficiently above the noise level to avoid noise being identified as speech but not so far above it that low level parts of speech are identified as noise. The threshold and the adaptive filter coefficients are only updated when speech is not present. It is, of course, potentially dangerous for a VAD to update these values on the basis of its own decision. This adaptation therefore only occurs when the signal seems stationary in the frequency domain but does not have the pitch component inherent in voiced speech and information tones.

A further mechanism is used to ensure that low level noise (which is often not stationary over long periods) is not detected as speech. Here, an additional fixed threshold is used.

A VAD hangover period is used to eliminate mid-burst clipping of low level speech. Hangover is only added to speech-bursts which exceed a certain duration to avoid extending noise spikes.

2.2 Algorithm description

The block diagram of the VAD algorithm is shown in Figure 2-1. The individual blocks are described in the following sections. ACF and N are calculated in the speech encoder.

[Figure 2-1. Functional block diagram of the VAD]

The global variables shown in the block diagram are described as follows:

- ACF are autocorrelation coefficients which are calculated in the speech encoder defined in GSM 06.10 (section 3.1.4, see also annex 1). The inputs to the speech encoder are 16 bit 2's complement numbers, as described in GSM 06.10, section 4.2.0.
- av0 and av1 are averaged ACF vectors.
- rav1 are autocorrelated predictor values obtained from av1.
- rvad are the autocorrelated predictor values of the adaptive filter.
- N is the long term predictor lag value which is obtained every sub-segment in the speech coder defined in GSM 06.10.
- ptch indicates whether the signal has a steady periodic component.
- pvad is the energy in the current frame of the input signal after filtering.
- thvad is an adaptive threshold.
- stat indicates spectral stationarity.
- vvad indicates the VAD decision before hangover is added.
- vad is the final VAD decision with hangover included.

2.2.1 Adaptive filtering and energy computation

pvad is computed as follows:

    pvad := rvad[0]*ACF[0] + 2 * SUM(i=1..8) rvad[i]*ACF[i]

This corresponds to performing an 8th order block filtering on the input samples to the speech encoder, after zero offset compensation and pre-emphasis. This is explained in annex 1.

2.2.2 ACF averaging

Spectral characteristics of the input signal have to be obtained using blocks that are larger than one 20ms frame. This is done by averaging the autocorrelation values for several consecutive frames. This averaging is given by the following equations:

    av0{n}[i] := SUM(j=0..frames-1) ACF{n-j}[i]    ; i = 0..8
    av1{n}[i] := av0{n-frames}[i]                  ; i = 0..8

Where n represents the current frame, n-1 represents the previous frame etc. The values of constants are given in table 2-1.

Table 2-1. Constants and variables for ACF averaging

2.2.3 Predictor values computation

The filter predictor values aav1 are obtained from the autocorrelation values av1 according to the equation:

    [...]

where:

    [...]

and:

    aav1[0] := -1

av1 is used in preference to av0 as av0 may contain speech.

The autocorrelated predictor values rav1 are then obtained:

    rav1[i] := SUM(k=0..8-i) aav1[k]*aav1[k+i]     ; i = 0..8

2.2.4 Spectral comparison

The spectra represented by the autocorrelated predictor values rav1 and the averaged autocorrelation values av0 are compared using the distortion measure dm defined below. This measure is used to produce a boolean value stat every 20ms, as given by these equations:

    dm := (rav1[0]*av0[0] + 2 * SUM(i=1..8) rav1[i]*av0[i]) / av0[0]
    difference := |dm - lastdm|
    lastdm := dm
    stat := difference [...]

2.2.5 Periodicity detection

    [...] = nthresh

The following operations are done after the VAD decision and when the current LTP lag values (N[0] .. N[3]) are available, this reduces the delay of the VAD decision. (N[-1] = N[3] of previous segment.)

    lagcount := 0
    for j := 0 to 3 do
    begin
        smallag := maximum(N[j],N[j-1]) mod minimum(N[j],N[j-1])
        if minimum(smallag, minimum(N[j],N[j-1]) - smallag) < lthresh then
            increment(lagcount)
    end
    veryoldlagcount := oldlagcount
    oldlagcount := lagcount
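The per-frame quantities of sections 2.2.1, 2.2.4 and 2.2.5 can be sketched in floating point as below. This is only an illustrative sketch, not the fixed point procedure of section 3. The function names (filtered_energy, distortion_measure, lag_count) and the parameter previous_lag are assumptions introduced here, and av0[0] is assumed to be non-zero.

```python
from typing import Sequence


def filtered_energy(rvad: Sequence[float], acf: Sequence[float]) -> float:
    """pvad := rvad[0]*ACF[0] + 2 * SUM(i=1..8) rvad[i]*ACF[i] (section 2.2.1)."""
    return rvad[0] * acf[0] + 2.0 * sum(rvad[i] * acf[i] for i in range(1, 9))


def distortion_measure(rav1: Sequence[float], av0: Sequence[float]) -> float:
    """dm := (rav1[0]*av0[0] + 2 * SUM(i=1..8) rav1[i]*av0[i]) / av0[0] (section 2.2.4).

    Assumes av0[0] != 0; a real implementation needs to guard this case.
    """
    total = rav1[0] * av0[0] + 2.0 * sum(rav1[i] * av0[i] for i in range(1, 9))
    return total / av0[0]


def lag_count(lags: Sequence[int], previous_lag: int, lthresh: int) -> int:
    """Count the sub-segments whose LTP lag is close to a multiple of its neighbour.

    lags holds N[0]..N[3] of the current frame; previous_lag plays the role of
    N[-1], i.e. N[3] of the previous frame. Lags are assumed to be positive.
    """
    count = 0
    prev = previous_lag
    for lag in lags:
        smaller = min(lag, prev)
        smallag = max(lag, prev) % smaller
        # Distance to the nearest multiple of the smaller lag.
        if min(smallag, smaller - smallag) < lthresh:
            count += 1
        prev = lag
    return count
```

For example, lag_count([40, 40, 80, 81], previous_lag=40, lthresh=2) counts all four sub-segments, since each lag is within one sample of a multiple of its neighbour; the value 2 for lthresh is chosen here only for illustration.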

The values of constants and initial values are given in table 2-3.

Table 2-3. Constants and variables for periodicity detection

2.2.6 Threshold adaptation

A check is made every 20ms to determine whether the VAD decision threshold (thvad) should be changed. This adaptation is carried out according to the flowchart shown in fig 2-2. The constants used are given in table 2-4.

Adaptation takes place in two different situations: firstly whenever ACF[0] is very low and secondly whenever there is a very high probability that speech is not present.

In the first case, the threshold is adapted if the energy of the input signal is less than pth. The threshold is set to plev without carrying out any further tests because at these very low levels the effect of the signal quantization makes it impossible to obtain reliable results from these tests.

In the second case, the decision threshold (thvad) and the adaptive filter coefficients (rvad) are only updated with the rav1 values when the signal is stationary and has no periodic component. In this situation there is a very high probability that speech is not present. The stationarity is detected in the frequency domain, by calculating the spectral difference using consecutive averaged ACF values. If this spectral difference changes very little over a certain number of frames (adp), and the signal does not have a periodic component inherent in voiced speech and information tones, then adaptation occurs.

The stepsize by which the threshold is adapted is not constant but a proportion of the current value (determined by constants dec and inc). The adaptation begins by experimentally multiplying the threshold by a factor of (1-1/dec). If the new threshold is now higher than or equal to pvad times fac then the threshold needed to be decreased and it is left at this new lower level. If, on the other hand, the new threshold level is less than pvad times fac then the threshold either needed to be increased or kept constant. In this case it is set to pvad times fac unless this would mean multiplying it by more than a factor of (1+1/inc) (in which case it is multiplied by a factor of (1+1/inc)). The threshold is never allowed to be greater than pvad+margin.

Table 2-4. Constants and variables for threshold adaptation
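The threshold adaptation rules of section 2.2.6 can be sketched as a single per-frame update. This is a floating point illustration under stated assumptions, not the fixed point procedure of section 3. The names AdaptationConstants, adapt_threshold and adaptcount are introduced here for illustration, the way the frame counter is reset and saturated is one reading of the text, and the increase step is applied to the already decreased threshold.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AdaptationConstants:
    pth: float     # input energy below which the threshold is simply reset
    plev: float    # threshold value used at very low input levels
    fac: float     # target ratio between thvad and pvad
    dec: float     # decrease factor is (1 - 1/dec)
    inc: float     # largest increase factor is (1 + 1/inc)
    margin: float  # thvad may never exceed pvad + margin
    adp: int       # number of stationary, non-periodic frames required


def adapt_threshold(thvad: float, adaptcount: int, acf0: float, pvad: float,
                    stat: bool, ptch: bool,
                    c: AdaptationConstants) -> Tuple[float, int, bool]:
    """Return (new thvad, new adaptcount, adapted) for one 20ms frame.

    adapted is True when rvad should also be replaced by the rav1 values.
    """
    # First case: very low input energy, set the threshold without further tests.
    if acf0 < c.pth:
        return c.plev, adaptcount, False  # assumption: counter left untouched
    # Second case needs a stationary signal with no periodic component.
    if not stat or ptch:
        return thvad, 0, False
    adaptcount += 1
    if adaptcount <= c.adp:
        return thvad, adaptcount, False
    # Experimentally decrease the threshold by a factor of (1 - 1/dec).
    thvad -= thvad / c.dec
    # If it fell below pvad*fac, raise it, limited to a factor of (1 + 1/inc).
    if thvad < pvad * c.fac:
        thvad = min(thvad + thvad / c.inc, pvad * c.fac)
    # The threshold is never allowed to exceed pvad + margin.
    thvad = min(thvad, pvad + c.margin)
    return thvad, c.adp + 1, True
```

Returning the new state instead of mutating globals keeps the sketch easy to test; the constant values must be supplied by the caller, since the contents of table 2-4 are not reproduced in this text.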

[Fig 2-2. Flow diagram for threshold adaptation]

2.2.7 VAD decision

Prior to hangover the VAD decision condition is:

    vvad := pvad > thvad

2.2.8 VAD hangover addition

VAD hangover is only added to bursts of speech greater than or equal to burstconst blocks. The boolean variable vad indicates the decision of the VAD with hangover included. The values of the constants are given in table 2-5.

The hangover algorithm is as follows:

    if vvad then increment(burstcount) else burstcount := 0
    if burstcount >= burstconst then
    begin
        hangcount := hangconst;
        burstcount := burstconst
    end
    vad := vvad or (hangcount
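The decision of section 2.2.7 and the burst counting of section 2.2.8 can be sketched as below. The text above breaks off inside the last line of the hangover algorithm, so the countdown at the end of update() is an assumed completion and is not taken from this text. The class name Hangover, the function name vad_decision and the initial hangcount of -1 are likewise assumptions made for this sketch.

```python
def vad_decision(pvad: float, thvad: float) -> bool:
    """vvad := pvad > thvad (section 2.2.7)."""
    return pvad > thvad


class Hangover:
    """Keeps vad raised for a while after a sufficiently long speech burst."""

    def __init__(self, burstconst: int, hangconst: int) -> None:
        self.burstconst = burstconst
        self.hangconst = hangconst
        self.burstcount = 0
        self.hangcount = -1  # assumption: no hangover pending at start

    def update(self, vvad: bool) -> bool:
        """Process one frame and return vad, the decision with hangover."""
        if vvad:
            self.burstcount += 1
        else:
            self.burstcount = 0
        # Only bursts of at least burstconst frames arm the hangover.
        if self.burstcount >= self.burstconst:
            self.hangcount = self.hangconst
            self.burstcount = self.burstconst
        # Assumed completion: vad stays set while the hangover counter
        # has not run out, and the counter is decremented every frame.
        vad = vvad or self.hangcount >= 0
        if self.hangcount >= 0:
            self.hangcount -= 1
        return vad
```

With this completion, a single-frame spike of vvad does not arm the hangover, whereas a burst of burstconst frames keeps vad set for hangconst further frames after vvad drops.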
