ETSI TS 126 194 V14.0.0 (2017-04)

Digital cellular telecommunications system (Phase 2+) (GSM);
Universal Mobile Telecommunications System (UMTS);
LTE;
Speech codec speech processing functions;
Adaptive Multi-Rate - Wideband (AMR-WB) speech codec;
Voice Activity Detector (VAD)
(3GPP TS 26.194 version 14.0.0 Release 14)

TECHNICAL SPECIFICATION

ETSI TS 126 194 V14.0.0 (2017-04)
3GPP TS 26.194 version 14.0.0 Release 14

Reference
RTS/TSGS-0426194ve00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sub-Prefecture of Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from:
http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2017.
All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
GSM and the GSM
logo are Trade Marks registered and owned by the GSM Association.

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS
identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 Normative References
3 Technical Description
3.1 Definitions, symbols and abbreviations
3.1.1 Definitions
3.1.2 Symbols
3.1.2.1 Variables
3.1.2.2 Constants
3.1.2.3 Functions
3.1.3 Abbreviations
3.2 General
3.3 Functional description
3.3.1 Filter bank and computation of sub-band levels
3.3.2 Tone detection
3.3.3 VAD decision
3.3.3.1 Hangover addition
3.3.3.2 Background noise estimation
3.3.3.3 Speech level estimation
4 Computational details
Annex A (informative): Change history
History

Foreword

This Technical Specification has been produced by the 3GPP.

This document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in [3].

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
1 presented to TSG for information;
2 presented to TSG for approval;
3 indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.

z the third digit is incremented when editorial only changes have been incorporated in the specification.

1 Scope

This document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in [3]. The requirements are mandatory on any VAD to be used either in User Equipment (UE) or Base Station Systems (BSSs) that utilize the AMR wideband speech codec.

2 Normative References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

- References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] 3GPP TS 26.173: “ANSI-C code for the Adaptive Multi-Rate Wideband speech codec”.
[2] 3GPP TS 26.190: “Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions”.
[3] 3GPP TS 26.193: “Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Source controlled rate operation”.
[4] ITU, The International Telecommunication Union, Blue Book, Vol. III, Telephone Transmission Quality, IXth Plenary Assembly, Melbourne, 14-25 November 1988, Recommendation G.711: “Pulse code modulation (PCM) of voice frequencies”.
[5] 3GPP TR 21.905: “Vocabulary for 3GPP Specifications”.

3 Technical Description

3.1 Definitions, symbols and abbreviations

3.1.1 Definitions

For the purposes of the present document, the terms and definitions given in TR 21.905 [5] and the following apply. A term defined in
the present document takes precedence over the definition of the same term, if any, in TR 21.905 [5].

frame: Time interval of 20 ms corresponding to the time segmentation of the speech transcoder.

3.1.2 Symbols

For the purposes of this TS, the following symbols apply.

3.1.2.1 Variables

bckr_est[n]     background noise estimate in frequency band n
burst_count     counts the length of a speech burst; used by VAD hangover addition
hang_count      hangover counter; used by VAD hangover addition
level[n]        signal level in frequency band n
new_speech      pointer of the speech encoder; points to a buffer containing the last received samples of a speech frame [2]
noise_level     estimated noise level
pow_sum         input power
s(i)            samples of the input frame
snr_sum         SNR measure between the input frame and the noise estimate
speech_level    estimated speech level
stat_count      stationarity counter
stat_rat        measure indicating the stationarity of the input frame
tone_flag       flag indicating the presence of a tone
vad_thr         VAD threshold
VAD_flag        Boolean VAD flag
vadreg          intermediate VAD decision

3.1.2.2 Constants

ALPHA_UP1           constant for updating noise estimate (see subclause 3.3.3.2)
ALPHA_DOWN1         constant for updating noise estimate (see subclause 3.3.3.2)
ALPHA_UP2           constant for updating noise estimate (see subclause 3.3.3.2)
ALPHA_DOWN2         constant for updating noise estimate (see subclause 3.3.3.2)
ALPHA3              constant for updating noise estimate (see subclause 3.3.3.2)
ALPHA4              constant for updating average signal level (see subclause 3.3.3.2)
ALPHA5              constant for updating average signal level (see subclause 3.3.3.2)
BURST_HIGH          constant for controlling VAD hangover addition (see subclause 3.3.3.1)
BURST_P1            constant for controlling VAD hangover addition (see subclause 3.3.3.1)
BURST_SLOPE         constant for controlling VAD hangover addition (see subclause 3.3.3.1)
COEFF3              coefficient for the filter bank (see subclause 3.3.1)
COEFF5_1            coefficient for the filter bank (see subclause 3.3.1)
COEFF5_2            coefficient for the filter bank (see subclause 3.3.1)
HANG_HIGH           constant for controlling VAD hangover addition (see subclause 3.3.3.1)
HANG_LOW            constant for controlling VAD hangover addition (see subclause 3.3.3.1)
HANG_P1             constant for controlling VAD hangover addition (see subclause 3.3.3.1)
HANG_SLOPE          constant for controlling VAD hangover addition (see subclause 3.3.3.1)
FRAME_LEN           size of a speech frame, 256 samples (20 ms)
MIN_SPEECH_LEVEL1   constant for speech estimation (see subclause 3.3.3.3)
MIN_SPEECH_LEVEL2   constant for speech estimation (see subclause 3.3.3.3)
MIN_SPEECH_SNR      constant for VAD threshold adaptation (see subclause 3.3.3)
NO_P1               constant for VAD threshold adaptation (see subclause 3.3.3)
NO_SLOPE            constant for VAD threshold adaptation (see subclause 3.3.3)
NOISE_MAX           maximum value for noise estimate (see subclause 3.3.3.2)
NOISE_MIN           minimum value for noise estimate (see subclause 3.3.3.2)
POW_TONE_THR        threshold for tone detection (see subclause 3.3.3)
SP_ACTIVITY_COUNT   constant for speech estimation (see subclause 3.3.3.3)
SP_ALPHA_DOWN       constant for speech estimation (see subclause 3.3.3.3)
SP_ALPHA_UP         constant for speech estimation (see subclause 3.3.3.3)
SP_CH_MAX           constant for VAD threshold adaptation (see subclause 3.3.3)
SP_CH_MIN           constant for VAD threshold adaptation (see subclause 3.3.3)
SP_EST_COUNT        constant for speech estimation (see subclause 3.3.3.3)
SP_P1               constant for VAD threshold adaptation (see subclause 3.3.3)
SP_SLOPE            constant for VAD threshold adaptation (see subclause 3.3.3)
STAT_COUNT          threshold for stationarity detection (see subclause 3.3.3.2)
STAT_THR            threshold for stationarity detection (see subclause 3.3.3.2)
STAT_THR_LEVEL      threshold for stationarity detection (see subclause 3.3.3.2)
THR_HIGH            constant for VAD threshold adaptation (see subclause 3.3.3)
TONE_THR            threshold for tone detection (see subclause 3.3.2)
VAD_POW_LOW         constant for controlling VAD hangover addition (see subclause 3.3.3.1)

3.1.2.3 Functions

+       Addition
-       Subtraction
*       Multiplication
/       Division
|x|     absolute value of x
AND     Boolean AND
OR      Boolean OR

$\sum_{n=a}^{b} x(n) = x(a) + x(a+1) + \cdots + x(b-1) + x(b)$

$\mathrm{MIN}(x,y) = \begin{cases} x, & x \le y \\ y, & x > y \end{cases}$

3.1.3 Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 [5] and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905 [5].

ANSI    American National Standards Institute
DTX     Discontinuous Transmission
VAD     Voice Activity Detector
CNG     Comfort Noise Generation

3.2 General

The function of the VAD algorithm is to indicate whether each 20 ms frame contains signals that should be transmitted, e.g. speech, music or information tones. The
output of the VAD algorithm is a Boolean flag (VAD_flag) indicating the presence of such signals.

3.3 Functional description

The block diagram of the VAD algorithm is depicted in Figure 1. The VAD algorithm uses parameters of the speech encoder to compute the Boolean VAD flag (VAD_flag). The input frame for the VAD is sampled at 12.8 kHz, so a 20 ms frame contains 256 samples. The samples of the input frame (s(i)) are divided into sub-bands, and the signal level (level[n]) in each band is calculated. The inputs to the tone detection function are the normalized open-loop pitch gains (ol_gain), which are calculated by the open-loop pitch analysis of the speech encoder. The tone detection function computes a flag (tone_flag) which indicates the presence of a signalling tone, voiced speech or another strongly periodic signal. The background noise level (bckr_est[n]) is estimated in each band based on the VAD decision, the signal stationarity and tone_flag. The intermediate VAD decision (vadreg) is calculated by comparing the input SNR (level[n]/bckr_est[n]) to an adaptive threshold (vad_thr). The threshold is adapted based on the noise and long-term speech estimates. Finally, the VAD flag is calculated by adding hangover to the intermediate VAD decision.
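To make the decision chain above concrete, the following is a minimal floating-point sketch in C; the reference implementation [1] is fixed-point ANSI-C, and this sketch is not the normative algorithm. Several details are assumptions not specified in this subclause: snr_sum is taken as the mean over the 12 bands of (level[n]/bckr_est[n])^2; vad_thr is supplied by the caller instead of being adapted from the noise and speech estimates; and the hangover is a fixed, hypothetical length (SKETCH_HANG_FRAMES) rather than the burst-dependent scheme governed by the BURST_* and HANG_* constants. The type vad_sketch_state and the functions snr_sum_sketch and vad_frame_sketch are hypothetical names used only for illustration.

```c
#include <assert.h>

#define NUM_BANDS 12          /* 12-band filter bank, see subclause 3.3.1 */
#define SKETCH_HANG_FRAMES 4  /* hypothetical fixed hangover length, NOT the HANG_* scheme */

/* Hypothetical state; field names follow the variables in subclause 3.1.2.1. */
typedef struct {
    double bckr_est[NUM_BANDS]; /* background noise estimate per band (kept > 0) */
    int hang_count;             /* hangover frames still to be added */
} vad_sketch_state;

/* Assumed SNR measure: mean over all bands of (level[n] / bckr_est[n])^2. */
static double snr_sum_sketch(const double level[NUM_BANDS],
                             const double bckr_est[NUM_BANDS])
{
    double sum = 0.0;
    for (int n = 0; n < NUM_BANDS; n++) {
        assert(bckr_est[n] > 0.0); /* a positive noise estimate is needed to divide */
        double snr = level[n] / bckr_est[n];
        sum += snr * snr;
    }
    return sum / NUM_BANDS;
}

/* One frame: intermediate decision (vadreg), then simplified hangover -> VAD_flag. */
static int vad_frame_sketch(vad_sketch_state *st,
                            const double level[NUM_BANDS], double vad_thr)
{
    double snr_sum = snr_sum_sketch(level, st->bckr_est);
    int vadreg = snr_sum > vad_thr; /* intermediate VAD decision */

    if (vadreg) {
        st->hang_count = SKETCH_HANG_FRAMES; /* re-arm the hangover while active */
        return 1;
    }
    if (st->hang_count > 0) {
        st->hang_count--; /* hold the flag for a few frames after activity ends */
        return 1;
    }
    return 0;
}
```

For example, with every bckr_est[n] equal to 1.0 and vad_thr = 2.0, a frame whose band levels are all 4.0 gives snr_sum = 16.0 and sets VAD_flag; the next SKETCH_HANG_FRAMES frames with all levels at 1.0 (snr_sum = 1.0) still return 1 through the hangover, after which the flag drops to 0. Keeping hangover separate from vadreg mirrors the structure described above, where hangover is added to the intermediate decision rather than folded into the threshold test.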
[Figure 1: Simplified block diagram of the VAD algorithm. Blocks: "Filter bank and computation of sub-band levels", "Tone detection", "VAD decision". Signals: s(i), ol_gain, level[n], tone_flag, VAD_flag.]

3.3.1 Filter bank and computation of sub-band levels

The input signal is divided into frequency bands using a 12-band filter bank (Figure 2). The cut-off frequencies of the filter bank are shown in Table 1.

Table 1: Cut-off frequencies for the filter bank

Band number    Frequencies
1              0 - 200 Hz
2              200 - 400 Hz
3              400 - 600 Hz
4              600 - 800 Hz
5              800 - 1200 Hz
6              1200 - 1600 Hz
7              1600 - 2000 Hz
8              2000 - 2400 Hz
9              2400 - 3200 Hz
10             3200 - 4000 Hz
11             4000 - 4800 Hz
12             4800 - 6400 Hz

The input to the filter bank is a speech frame pointed to by the new_speech pointer of the speech encoder [1]. Input values for the filter bank are scaled down by one bit. This ensures safe scaling, i.e. saturation cannot occur during calculation of the filter bank.

[Figure 2: structure of the 12-band filter bank (5th order and 3rd order filter blocks)]