ETSI TS 146 082 V13.0.0 (2016-01)

Digital cellular telecommunications system (Phase 2+);
Voice Activity Detector (VAD) for Enhanced Full Rate (EFR) speech traffic channels
(3GPP TS 46.082 version 13.0.0 Release 13)

TECHNICAL SPECIFICATION
GLOBAL SYSTEM FOR MOBILE COMMUNICATIONS

Reference: RTS/TSGS-0446082vd00
Keywords: GSM

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from: http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network
drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2016. All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the
present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 References
3 Definitions, symbols and abbreviations
  3.1 Definitions
  3.2 Symbols
    3.2.1 Variables
    3.2.2 Constants
    3.2.3 Functions
  3.3 Abbreviations
4 General
5 Functional description
  5.1 Overview and principles of operation
  5.2 Algorithm description
    5.2.1 Adaptive filtering and energy computation
    5.2.2 ACF averaging
    5.2.3 Predictor values computation
    5.2.4 Spectral comparison
    5.2.5 Information tone detection
    5.2.6 Threshold adaptation
    5.2.7 VAD decision
    5.2.8 VAD hangover addition
    5.2.9 Periodicity detection
6 Computational description overview
  6.1 VAD modules
  6.2 Pseudo-floating point arithmetic
Annex A (informative): Simplified block filtering operation
Annex B (informative): Pole frequency calculation
Annex C (informative): Change history
History

Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The present document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX) for Enhanced Full Rate (EFR) speech traffic channels within the digital cellular telecommunications system.

The contents of the present document are subject to continuing work within the
TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x  the first digit:
   1  presented to TSG for information;
   2  presented to TSG for approval;
   3  or greater indicates TSG approved document under change control.
y  the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z  the third digit is incremented when editorial only changes have been
   incorporated in the document.

1 Scope

The present document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in GSM 06.81 [5] Discontinuous transmission (DTX) for Enhanced Full Rate (EFR) speech traffic channels. The requirements are mandatory on any VAD to be used either in GSM Mobile Stations (MS)s or Base Station Systems (BSS)s that utilize the enhanced full-rate speech traffic channel.

2 References

The following documents contain provisions which, through
reference in this text, constitute provisions of the present document.

References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.

For a specific reference, subsequent revisions do not apply.

For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] GSM 01.04: “Digital cellular telecommunications system (Phase 2+); Abbreviations and acronyms”.
[2] GSM 06.53: “Digital cellular telecommunications system (Phase 2+); ANSI-C code for the GSM Enhanced Full Rate (EFR) speech codec”.
[3] GSM 06.54: “Digital cellular telecommunications system (Phase 2+); Test vectors for the GSM Enhanced Full Rate (EFR) speech codec”.
[4] GSM 06.60: “Digital
cellular telecommunications system (Phase 2+); Enhanced Full Rate (EFR) speech transcoding”.
[5] GSM 06.81: “Digital cellular telecommunications system (Phase 2+); Discontinuous transmission (DTX) for Enhanced Full Rate (EFR) speech traffic channels”.

3 Definitions, symbols and abbreviations

3.1 Definitions

For the purposes of the present document, the following terms and definitions apply:

noise: signal component resulting from acoustic environmental noise.

mobile environment: any environment in which mobile stations may be used.

3.2 Symbols

For the purposes of the present document, the following symbols apply:

3.2.1 Variables

aav1             filter predictor values, see clause 5.2.3
acf              the ACF vector which is calculated in the speech encoder (GSM 06.60 [4])
adaptcount       secondary hangover counter, see clause 5.2.6
av0              averaged ACF vector, see clause 5.2.2
av1              a previous value of av0, see clause 5.2.2
burstcount       speech burst length counter, see clause 5.2.8
den              denominator of left hand side of equation 8 in annex B, see clause 5.2.5
difference       difference between consecutive values of dm, see clause 5.2.4
dm               spectral distortion measure, see clause 5.2.4
hangcount        primary hangover counter, see clause 5.2.8
lagcount         number of subframes in current frame meeting periodicity criterion, see clause 5.2.9
lastdm           previous value of dm, see clause 5.2.4
lags             the open loop long term predictor lags for the two halves of the speech encoder frame (GSM 06.60 [4])
num              numerator of left hand side of equation 8 in annex B, see clause 5.2.5
oldlagcount      previous value of lagcount, see clause 5.2.9
prederr          fourth order short term prediction error, see clause 5.2.5
ptch             Boolean flag indicating the presence of a periodic signal component, see clause 5.2.9
pvad             energy in the current filtered signal frame, see clause 5.2.1
rav1             autocorrelation vector obtained from av1, see clause 5.2.3
rc               the first four unquantized reflection coefficients calculated in the speech encoder (GSM 06.60 [4])
rvad             autocorrelation vector of the adaptive filter predictor values, see clause 5.2.6
smallag          difference between consecutive lag values, see clause 5.2.9
stat             Boolean flag indicating that the frequency spectrum of the input signal is stationary, see clause 5.2.4
thvad            adaptive primary VAD threshold, see clause 5.2.6
tone             Boolean flag indicating the presence of an information tone, see clause 5.2.5
vadflag          Boolean VAD decision with hangover included, see clause 5.2.8
veryoldlagcount  previous value of oldlagcount, see clause 5.2.9
vvad             Boolean VAD decision before hangover, see clause 5.2.7

3.2.2 Constants

adp              number of frames of hangover for secondary VAD, see clause 5.2.6
burstconst       minimum length of speech burst to which hangover is added, see clause 5.2.8
dec              determines rate of decrease in adaptive threshold, see clause 5.2.6
fac              determines steady state adaptive threshold, see clause 5.2.6
frames           number of frames over which av0 and av1 are calculated, see clause 5.2.2
freqth           threshold for pole frequency decision, see clause 5.2.5
hangconst        number of frames of hangover for primary VAD, see clause 5.2.8
inc              determines rate of increase in adaptive threshold, see clause 5.2.6
lthresh          lag difference threshold for periodicity decision, see clause 5.2.9
margin           determines upper limit for adaptive threshold, see clause 5.2.6
nthresh          frame count threshold for periodicity decision, see clause 5.2.9
plev             lower limit for adaptive threshold, see clause 5.2.6
predth           threshold for short term prediction error, see clause 5.2.5
pth              energy threshold, see clause 5.2.6
thresh           decision threshold for evaluation of stat flag, see clause 5.2.4

3.2.3 Functions

+                addition
-                subtraction
*                multiplication
/                division
| x |            absolute value of x
AND              Boolean AND
OR               Boolean OR

  b
MULT(x(i))       the product of the series x(i) for i=a to b
 i=a

  b
SUM(x(i))        the sum of the series x(i) for i=a to b
 i=a

3.3 Abbreviations

For the purposes of the present document, the following abbreviations apply:

ACF              Autocorrelation function
ANSI             American National Standards Institute
DTX              Discontinuous Transmission
LTP              Long Term Predictor
TX               Transmission
VAD              Voice Activity Detector

For abbreviations not given in this clause, see GSM 01.04 [1].

4 General

The function of the VAD is to indicate whether each 20 ms frame produced by the speech encoder contains speech or not. The output is a Boolean
flag (vadflag) which is used by the Transmit (TX) DTX handler defined in GSM 06.81 [5].

The present document is organized as follows.

Clause 5 describes the principles of operation of the VAD.

Clause 6 provides an overview of the computational description of the VAD. The computational details necessary for the fixed point implementation of the VAD algorithm are given in the form of an ANSI C program contained in GSM 06.53 [2].

The verification of the VAD is based on the use of digital test sequences which are described in GSM 06.54 [3].

5 Functional description

The purpose of this clause is to give the reader an understanding of the principles of operation of the VAD, whereas GSM 06.53 [2] contains the fixed point computational description of the VAD. In the case of discrepancy between the two descriptions, the description in GSM 06.53 [2] will prevail.

5.1 Overview and principles of operation

The function of the VAD is to distinguish between noise with speech present and noise without speech present. This is achieved by comparing the energy of a filtered version of the input signal with a threshold. The presence of speech is indicated whenever the threshold is exceeded.

The detection of speech in a mobile environment is difficult due to the low speech/noise ratios which are encountered, particularly in moving vehicles. To increase the probability of detecting speech, the input signal is adaptively filtered (see clause 5.2.1) to reduce its noise content before the voice activity decision
is made (see clause 5.2.7).

The frequency spectrum and level of the noise may vary within a given environment as well as between different environments. It is therefore necessary to adapt the input filter coefficients and energy threshold at regular intervals as described in clause 5.2.6.

5.2 Algorithm description

The block diagram of the VAD algorithm is shown in figure 1. The individual blocks are described in the following clauses. The variables shown in the block diagram are described in table 1.

Table 1: Description of variables in figure 1

Var              Description
acf              The ACF vector which is calculated in the speech encoder (GSM 06.60 [4]).
av0              Averaged ACF vector.
av1              A previous value of av0.
lags             The open loop long term predictor lags for the two halves of the speech encoder frame (GSM 06.60 [4]).
ptch             Boolean flag indicating the presence of a periodic signal component.
pvad             Energy in the current filtered signal frame.
rav1             Autocorrelation vector obtained from av1.
rc               The first four reflection coefficients calculated in the speech encoder (GSM 06.60