ETSI ES 202 212 V1.1.2 (2005-11)
ETSI Standard

Speech Processing, Transmission and Quality Aspects (STQ);
Distributed speech recognition;
Extended advanced front-end feature extraction algorithm;
Compression algorithms;
Back-end speech reconstruction algorithm

Reference
RES/STQ-00084a

Keywords
performance, speech, transmission

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00
Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

Individual copies of the present document can be downloaded from:
http://www.etsi.org

The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services:
http://portal.etsi.org/chaircor/ETSI_support.asp

Copyright Notification

No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2005.
All rights reserved.

DECT™, PLUGTESTS™ and UMTS™ are Trade Marks of ETSI registered for the benefit of its Members.
TIPHON™ and the TIPHON logo are Trade Marks currently being registered by ETSI for the benefit of its Members.
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.

Contents

Intellectual Property Rights
Foreword
Introduction
1 Scope
2 References
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 System overview
5 Feature extraction description
5.1 Noise reduction
5.1.1 Two stage mel-warped Wiener filter approach
5.1.2 Buffering
5.1.3 Spectrum estimation
5.1.4 Power spectral density mean
5.1.5 Wiener filter design
5.1.6 VAD for noise estimation (VADNest)
5.1.7 Mel filter-bank
5.1.8 Gain factorization
5.1.9 Mel IDCT
5.1.10 Apply filter
5.1.11 Offset compensation
5.2 Waveform Processing
5.3 Cepstrum Calculation
5.3.1 Log energy calculation
5.3.2 Pre-emphasis (PE)
5.3.3 Windowing (W)
5.3.4 Fourier transform (FFT) and power spectrum estimation
5.3.5 Mel Filtering (MEL-FB)
5.3.6 Non-linear transformation (Log)
5.3.7 Cepstral coefficients (DCT)
5.3.8 Cepstrum calculation output
5.4 Blind equalization
5.5 Extension to 11 kHz and 16 kHz sampling frequencies
5.5.1 FFT-based spectrum estimation
5.5.2 Mel Filter-Bank
5.5.3 High-frequency band coding and decoding
5.5.4 VAD for noise estimation and spectral subtraction in high-frequency bands
5.5.5 Merging spectral subtraction bands with decoded bands
5.5.6 Log energy calculation for 16 kHz
5.6 Pitch and class estimation
5.6.1 Spectrum and energy computation
5.6.2 Voice Activity Detection for Voicing Classification (VADVC)
5.6.3 Low-band noise detection
5.6.4 Pre-Processing for pitch and class estimation
5.6.5 Pitch estimation
5.6.5.1 Dirichlet interpolation
5.6.5.2 Non-speech and low-energy frames
5.6.5.3 Search ranges specification and processing
5.6.5.4 Spectral peaks determination
5.6.5.5 F0 Candidates generation
5.6.5.6 Computing correlation scores
5.6.5.7 Pitch estimate selection
5.6.5.8 History information update
5.6.5.9 Output pitch value
5.6.6 Classification
6 Feature compression
6.1 Introduction
6.2 Compression algorithm description
6.2.1 Input
6.2.2 Vector quantization
6.2.3 Pitch and class quantization
6.2.3.1 Class quantization
6.2.3.2 Pitch quantization
7 Framing, bit-stream formatting and error protection
7.1 Introduction
7.2 Algorithm description
7.2.1 Multiframe format
7.2.2 Synchronization sequence
7.2.3 Header field
7.2.4 Frame packet stream
8 Bit-stream decoding and error mitigation
8.1 Introduction
8.2 Algorithm description
8.2.1 Synchronization sequence detection
8.2.2 Header decoding
8.2.3 Feature decompression
8.2.4 Error mitigation
8.2.4.1 Detection of frames received with errors
8.2.4.2 Substitution of parameter values for frames received with errors
8.2.4.3 Modification of parameter values for frames received with errors
9 Server feature processing
9.1 lnE and c(0) combination
9.2 Derivatives calculation
9.3 Feature vector selection
10 Server side speech reconstruction
10.1 Introduction
10.2 Algorithm description
10.2.1 Speech reconstruction block diagram
10.2.2 Pitch Tracking and Smoothing
10.2.2.1 First stage - gross pitch error correction
10.2.2.2 Second stage - voiced/unvoiced decision and other corrections
10.2.2.3 Third stage - smoothing
10.2.2.4 Voicing class correction
10.2.3 Harmonic Structure Initialization
10.2.4 Unvoiced phase synthesis
10.2.5 Cepstra de-equalization
10.2.6 Transformation of features extracted at 16 kHz
10.2.7 Harmonic magnitudes reconstruction
10.2.7.1 High order cepstra recovery
10.2.7.2 Solving front-end equation
10.2.7.3 Cepstra to magnitudes transformation
10.2.7.4 Combined magnitudes estimate calculation
10.2.7.4.1 Combined magnitude estimate for unvoiced harmonics
10.2.7.4.2 Combined magnitude estimate for voiced harmonics
10.2.8 All-pole spectral envelope modelling
10.2.9 Postfiltering
10.2.10 Voiced phase synthesis
10.2.11 Line spectrum to time-domain transformation
10.2.11.1 Mixed-voiced frames processing
10.2.11.2 Filtering very high-frequency harmonics
10.2.11.3 Energy normalization
10.2.11.4 STFT spectrum synthesis
10.2.11.5 Inverse FFT
10.2.12 Overlap-Add
Annex A (informative): Voice Activity Detection (VAD)
A.1 Introduction
A.2 Stage 1 - Detection
A.3 Stage 2 - VAD Logic
Annex B (informative): Bibliography
History

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (http://webapp.etsi.org/IPR/home.asp).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This ETSI Standard (ES) has been produced by ETSI Technical Committee Speech Processing, Transmission and Quality Aspects (STQ).

Introduction

The performance of speech recognition systems receiving speech that has been transmitted over mobile channels can be significantly degraded compared to using an unmodified signal. The degradations result from both low bit rate speech coding and channel transmission errors. A Distributed Speech Recognition (DSR) system overcomes these problems by eliminating the speech channel and instead using an error protected data channel to send a parameterized representation of the speech that is suitable for recognition. The processing is distributed between the terminal and the network. The terminal performs the feature parameter extraction, which is the front-end of the speech recognition system. These features are transmitted over a data channel to a remote “back-end” recognizer. The end result is that the degradation in performance due to transcoding on the voice channel is removed and channel invariability is achieved.

The present document presents a standard for a front-end to ensure compatibility between the terminal and the remote recognizer. The first ETSI standard DSR front-end, ES 201 108 [1], was published in February 2000 and is based on the Mel-Cepstrum representation that has been used extensively in speech recognition systems. The second standard, ES 202 050 [2], is for an Advanced DSR front-end that provides substantially improved recognition performance in background noise. Evaluation of the performance during the selection of that advanced front-end showed an average 53 % reduction in speech recognition error rates in noise compared to ES 201 108 [1].

For some applications, it may be necessary to reconstruct the speech waveform at the back-end. Examples include:

- Interactive Voice Response (IVR) services based on the DSR of “sensitive” information, such as banking and brokerage transactions. DSR features may be stored for future human verification purposes or to satisfy procedural requirements.
- Human verification of utterances in a speech database collected from a deployed DSR system. This database can then be used to retrain and tune models in order to improve system performance.
- Applications where machine and human recognition are mixed (e.g. human assisted dictation).

In order to enable the reconstruction of the speech waveform at the back-end, additional parameters such as fundamental frequency (F0) and voicing class need to be extracted at the front-end, compressed, and transmitted. The availability of tonal parameters (F0 and voicing class) is also useful in enhancing the recognition accuracy of tonal languages, e.g. Mandarin, Cantonese, and Thai.

The present document specifies a proposed standard for an Extended Advanced Front-End (XAFE) that extends the noise-robust advanced front-end with additional parameters, viz., fundamental frequency F0 and voicing class. It also specifies the back-end speech reconstruction algorithm using the transmitted parameters.

1 Scope

The present document specifies algorithms for extended advanced front-end feature extraction, their transmission, back-end pitch tracking and smoothing, and back-end speech reconstruction, which form part of a system for distributed speech recognition. The specification covers the following components:

a) the algorithm for advanced front-end feature extraction to create Mel-Cepstrum parameters;
b) the algorithm for extraction of additional parameters, viz., fundamental frequency F0 and voicing class;
c) the algorithm to compress these features to provide a lower data transmission rate;
d) the formatting of these features with error protection into a bitstream for transmission;
e) the decoding of the bitstream to generate the advanced front-end features at a receiver, together with the associated algorithms for channel error mitigation;
f) the algorithm for pitch tracking and smoothing at the back-end to minimize pitch errors;
g) the algorithm for speech reconstruction at the back-end to synthesize intelligible speech.

NOTE: Components a), c), d) and e) are already covered by ES 202 050 [2]. In addition to these four components, the present document covers components b), f) and g) to provide back-end speech reconstruction and enhanced tonal language recognition capabilities. If these capabilities are not of interest, the reader is better served by the (un-extended) ES 202 050 [2].

The present document does not cover the “back-end” speech recognition algorithms that make use of the received DSR advanced front-end features.

The algorithms are defined in mathematical form, as pseudo-code, or as flow diagrams. Software written in the C programming language that implements these algorithms is contained in the ZIP file es_202212v010101p0.zip, which accompanies the present document. Conformance tests are not specified as part of the standard. The recognition performance of proprietary implementations of the standard can be compared with that obtained using the reference C code on appropriate speech databases.

It is anticipated that the DSR bitstream will be used as a payload in other higher level protocols when deployed in specific systems supporting DSR applications. In particular, for packet data transmission, it is anticipated that the IETF AVT RTP DSR payload definition (see bibliography) will be used to transport DSR features using the frame pair format described in clause 7.

The extended advanced DSR standard is designed for use with discontinuous transmission and to support the transmission of Voice Activity information. Annex A describes a VAD algorithm that is recommended for use in conjunction with the Advanced DSR standard; however, it is not part of the present document, and manufacturers may choose to use an alternative VAD algorithm.

The Extended Advanced Front-End (XAFE) incorporates tonal information, viz., fundamental frequency F0 and voicing class, as additional parameters. This information can be used to enhance the recognition accuracy of tonal languages, e.g. Mandarin, Cantonese, and Thai.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

References are either specific (identified by date of publication and/or edition number or version number) or non-specific. For a specific reference, subsequent revisions do not apply. For a non-specific reference, the latest version applies.

Referenced documents which are not found to be publicly available in the expected location might be found at http://docbox.etsi.org/Reference.

[1] ETSI ES 201 108: “Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; Compression algorithms”.
[2] ETSI ES 202 050: “Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms”.
[3] ETSI EN 300 903: “Digital cellular telecommunications system (Phase 2+) (GSM); Transmission planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN) system (GSM 03.50)”.

3 Definitions, symbols and abbreviations

3.1 Definitions

For the purposes of the present document, the following terms and definitions apply:

analog-to-digital conversion:
