ETSI TS 126 194 V15.0.0 (2018-07)

Digital cellular telecommunications system (Phase 2+) (GSM);
Universal Mobile Telecommunications System (UMTS);
LTE;
Speech codec speech processing functions;
Adaptive Multi-Rate - Wideband (AMR-WB) speech codec;
Voice Activity Detector (VAD)
(3GPP TS 26.194 version 15.0.0 Release 15)

TECHNICAL SPECIFICATION

Reference
RTS/TSGS-0426194vf00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
The present document can be downloaded from: http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.
© ETSI 2018. All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. oneM2M logo is protected for the benefit of its Members. GSM and the GSM logo are trademarks registered and owned by the GSM Association.

Intellectual Property Rights

Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI.
The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners. ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.

Foreword
This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).
The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.
The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and "cannot" are to
be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.

Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 Normative References
3 Technical Description
3.1 Definitions, symbols and abbreviations
3.1.1 Definitions
3.1.2 Symbols
3.1.2.1 Variables
3.1.2.2 Constants
3.1.2.3 Functions
3.1.3 Abbreviations
3.2 General
3.3 Functional description
3.3.1 Filter bank and computation of sub-band levels
3.3.2 Tone detection
3.3.3 VAD decision
3.3.3.1 Hangover addition
3.3.3.2 Background noise estimation
3.3.3.3 Speech level estimation
4 Computational details
Annex A (informative): Change history
History

Foreword
This Technical Specification has been produced by the 3GPP.
This document specifies the Voice Activity Detector (VAD) to be used in
the Discontinuous Transmission (DTX) as described in [3].
The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:
x  the first digit:
   1  presented to TSG for information;
   2  presented to TSG for approval;
   3  indicates TSG approved document under change control.
y  the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z  the third digit is incremented when editorial only changes have been incorporated in the specification.

1 Scope
This document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in [3]. The requirements are mandatory on any VAD to be used either in User Equipment (UE) or Base Station Systems (BSSs) that utilize the AMR wideband speech codec.

2 Normative References
The following documents contain provisions which, through
reference in this text, constitute provisions of the present document.
- References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] 3GPP TS 26.173: "ANSI-C code for the Adaptive Multi-Rate Wideband speech codec".
[2] 3GPP TS 26.190: "Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions".
[3] 3GPP TS 26.193: "Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Source controlled rate operation".
[4] ITU, The International Telecommunications Union, Blue Book, Vol. III, Telephone Transmission Quality, IXth Plenary Assembly, Melbourne, 14-25 November, 1988, Recommendation G.711, Pulse code modulation (PCM) of voice frequencies.
[5] 3GPP TR 21.905: "Vocabulary for 3GPP Specifications".

3 Technical Description

3.1 Definitions, symbols and abbreviations

3.1.1 Definitions
For the purposes of the present document, the terms and definitions given in TR 21.905 [5] and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905 [5].

frame: Time interval of 20 ms corresponding to the time segmentation of the speech transcoder.

3.1.2 Symbols
For the purposes of this TS, the following symbols apply.

3.1.2.1 Variables
bckr_est[n]   background noise estimate at the frequency band "n"
burst_count   counts length of a speech burst, used by VAD hangover addition
hang_count    hangover counter, used by VAD hangover addition
level[n]      signal level at the frequency band "n"
new_speech    pointer of the speech encoder, points to a buffer containing the last received samples of a
              speech frame [2]
noise_level   estimated noise level
pow_sum       input power
s(i)          samples of the input frame
snr_sum       measure between input frame and noise estimate
speech_level  estimated speech level
stat_count    stationarity counter
stat_rat      measure indicating stationarity of the input frame
tone_flag     flag indicating the presence of a tone
vad_thr       VAD threshold
VAD_flag      Boolean VAD flag
vadreg        intermediate VAD decision

3.1.2.2 Constants
ALPHA_UP1          constant for updating noise estimate (see subclause 3.3.5.2)
ALPHA_DOWN1        constant for updating noise estimate (see subclause 3.3.5.2)
ALPHA_UP2          constant for updating noise estimate (see subclause 3.3.5.2)
ALPHA_DOWN2        constant for updating noise estimate (see subclause 3.3.5.2)
ALPHA3             constant for updating noise estimate (see subclause 3.3.5.2)
ALPHA4             constant for updating average signal level (see subclause 3.3.5.2)
ALPHA5             constant for updating average signal level (see subclause 3.3.5.2)
BURST_HIGH         constant for controlling VAD hangover addition (see subclause 3.3.5.1)
BURST_P1           constant for controlling VAD hangover addition (see subclause 3.3.5.1)
BURST_SLOPE        constant for controlling VAD hangover addition (see subclause 3.3.5.1)
COEFF3             coefficient for
                   the filter bank (see subclause 3.3.1)
COEFF5_1           coefficient for the filter bank (see subclause 3.3.1)
COEFF5_2           coefficient for the filter bank (see subclause 3.3.1)
HANG_HIGH          constant for controlling VAD hangover addition (see subclause 3.3.5.1)
HANG_LOW           constant for controlling VAD hangover addition (see subclause 3.3.5.1)
HANG_P1            constant for controlling VAD hangover addition (see subclause 3.3.5.1)
HANG_SLOPE         constant for controlling VAD hangover addition (see subclause 3.3.5.1)
FRAME_LEN          size of a speech frame, 256 samples (20 ms)
MIN_SPEECH_LEVEL1  constant for speech estimation (see subclause 3.3.5.3)
MIN_SPEECH_LEVEL2  constant for speech estimation (see subclause 3.3.5.3)
MIN_SPEECH_SNR     constant for VAD threshold adaptation (see subclause 3.3.5)
NO_P1              constant for VAD threshold adaptation (see subclause 3.3.5)
NO_SLOPE           constant for VAD threshold adaptation (see subclause 3.3.5)
NOISE_MAX          maximum value for noise estimate (see subclause 3.3.5.2)
NOISE_MIN          minimum value for noise estimate (see subclause 3.3.5.2)
POW_TONE_THR       threshold for tone detection (see subclause 3.3.5)
SP_ACTIVITY_COUNT  constant for speech estimation (see subclause 3.3.5.3)
SP_ALPHA_DOWN      constant for speech estimation (see subclause 3.3.5.3)
SP_ALPHA_UP        constant for speech estimation (see subclause 3.3.5.3)
SP_CH_MAX          constant for VAD threshold adaptation (see subclause 3.3.5)
SP_CH_MIN          constant for VAD threshold adaptation
                   (see subclause 3.3.5)
SP_EST_COUNT       constant for speech estimation (see subclause 3.3.5.3)
SP_P1              constant for VAD threshold adaptation (see subclause 3.3.5)
SP_SLOPE           constant for VAD threshold adaptation (see subclause 3.3.5)
STAT_COUNT         threshold for stationary detection (see subclause 3.3.5.2)
STAT_THR           threshold for stationary detection (see subclause 3.3.5.2)
STAT_THR_LEVEL     threshold for stationary detection (see subclause 3.3.5.2)
THR_HIGH           constant for VAD threshold adaptation (see subclause 3.3.5)
TONE_THR           threshold for tone detection (see subclause 3.3.3)
VAD_POW_LOW        constant for controlling VAD hangover addition (see subclause 3.3.5.1)

3.1.2.3 Functions
+      Addition
-      Subtraction
*      Multiplication
/      Division
| x |  absolute value of x
AND    Boolean AND
OR     Boolean OR

$$\sum_{n=a}^{b} x(n) = x(a) + x(a+1) + \dots + x(b-1) + x(b)$$

$$\mathrm{MIN}(x, y) = \begin{cases} x, & x \le y \\ y, & y < x \end{cases}$$

3.1.3 Abbreviations
For the purposes of the present document, the abbreviations given in TR 21.905 [5] and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905 [5].

ANSI  American National Standards Institute
DTX   Discontinuous Transmission
VAD   Voice Activity Detector
CNG   Comfort Noise Generation

3.2 General
The function of the VAD algorithm is to indicate whether each 20 ms frame contains signals that should be transmitted, e.g. speech, music or information tones. The output of the
VAD algorithm is a Boolean flag (VAD_flag) indicating presence of such signals.

3.3 Functional description
The block diagram of the VAD algorithm is depicted in Figure 1. The VAD algorithm uses parameters of the speech encoder to compute the Boolean VAD flag (VAD_flag). The input frame for the VAD is sampled at 12.8 kHz and thus contains 256 samples (20 ms x 12.8 kHz = 256). Samples of the input frame (s(i)) are divided into sub-bands and the level of the signal (level[n]) in each band is calculated. The inputs to the tone detection function are the normalized open-loop pitch gains, which are calculated by the open-loop pitch analysis of the speech encoder. The tone detection function computes a flag (tone_flag) which indicates the presence of a signalling tone, voiced speech, or other strongly periodic signal. The background noise level (bckr_est[n]) is estimated in each band based on the VAD decision, signal stationarity and tone_flag. The intermediate VAD decision is calculated by comparing the input SNR (level[n]/bckr_est[n]) to an adaptive threshold. The threshold is adapted based on noise and long-term speech estimates. Finally, the VAD flag is calculated by adding hangover to the intermediate VAD decision.

[Figure 1 blocks: filter bank and computation of sub-band levels; tone detection; VAD decision. Signals: s(i), ol_gain, level[n], tone_flag, VAD_flag]
Figure 1: Simplified block diagram of the VAD algorithm

3.3.1 Filter bank and computation of sub-band levels
The input signal is divided into frequency bands using a 12-band filter bank (Figure 2). Cut-off frequencies for the filter bank are shown in Table 1.

Table 1: Cut-off frequencies for the filter bank
Band number   Frequencies
1             0 - 200 Hz
2             200 - 400 Hz
3             400 - 6
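
As an illustration of the processing chain described in subclause 3.3 (sub-band levels compared against noise estimates, a threshold decision, then hangover), the following Python sketch may help. It is not the normative algorithm: the SNR measure used here (a sum of squared per-band ratios, each floored at 1.0), the fixed threshold and the fixed-length hangover rule are assumptions made for illustration only, and the names VadState, intermediate_decision, apply_hangover and hang_len are hypothetical. Only FRAME_LEN, the 20 ms frame, the 12 bands, level[n], bckr_est[n], vad_thr, vadreg, hang_count and VAD_flag are taken from the text.

```python
from dataclasses import dataclass

FRAME_LEN = 256                 # samples per frame (constants list)
FRAME_DURATION_S = 0.020        # 20 ms frame (definition of "frame")
NUM_BANDS = 12                  # 12-band filter bank (subclause 3.3.1)

# Sampling rate implied by 256 samples per 20 ms frame.
SAMPLE_RATE_HZ = round(FRAME_LEN / FRAME_DURATION_S)


@dataclass
class VadState:
    """Hypothetical container for state carried between frames."""
    hang_count: int = 0         # hangover frames still available


def intermediate_decision(level, bckr_est, vad_thr):
    """Return vadreg for one frame.

    ASSUMPTION: the SNR measure is the sum over bands of the squared
    ratio level[n] / bckr_est[n], with each ratio floored at 1.0.
    bckr_est values are assumed to be strictly positive.
    """
    if len(level) != NUM_BANDS or len(bckr_est) != NUM_BANDS:
        raise ValueError("expected one value per sub-band")
    snr_sum = sum(max(1.0, lv / bg) ** 2 for lv, bg in zip(level, bckr_est))
    return snr_sum > vad_thr


def apply_hangover(vadreg, state, hang_len=5):
    """Return VAD_flag for one frame.

    ASSUMPTION: a fixed-length hangover that keeps the flag set for
    hang_len frames after the last frame with vadreg set.
    """
    if vadreg:
        state.hang_count = hang_len
        return True
    if state.hang_count > 0:
        state.hang_count -= 1
        return True
    return False
```

Calling intermediate_decision once per frame and passing its result to apply_hangover with a single VadState instance yields one Boolean per 20 ms frame. The hangover step is kept separate from the threshold comparison to mirror the two stages named in the text (intermediate VAD decision, then hangover addition).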