ETSI TR 126 943 V15.0.0 (2018-07)

Digital cellular telecommunications system (Phase 2+) (GSM);
Universal Mobile Telecommunications System (UMTS);
LTE;
Recognition performance evaluations of codecs for Speech Enabled Services (SES)
(3GPP TR 26.943 version 15.0.0 Release 15)

TECHNICAL REPORT

Reference
RTR/TSGS-0426943vf00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from:
http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2018. All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. oneM2M logo is protected for the benefit of its Members. GSM® and the GSM logo are trademarks registered and owned
by the GSM Association.

Intellectual Property Rights

Essential patents

IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Trademarks

The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners. ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.

Foreword

This Technical Report (TR) has been produced by ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.

Modal verbs terminology

In the present document "should", "should not", "may", "need not", "will", "will not", "can" and "cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms
for the expression of provisions).

"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
Introduction
1 Scope
2 References
3 Abbreviations
4 General
4.1 Project History
4.2 Overview of the speech recognition framework for automated voice services work item
4.3 Presentation of the following sections
5 Recommendation criteria
5.1 Overview
5.2 Scoring on individual databases
5.3 Performance metric over all databases
5.4 Comparisons between codecs
5.4.1 Low data-rate codec comparison
5.4.2 High data-rate codec comparison
5.4.2.1 8 kHz sampling rate
5.4.2.2 16 kHz sampling rate
5.5 Detailed recommendation comparisons
6 Performance evaluation method
6.1 Introduction
6.2 Recognition engines
6.2.1 Recognizer for speech codecs based proposals
6.2.2 Training and testing
6.2.3 Recognizer for DSR
6.2.4 Training and testing
6.3 Usage of VAD for frame dropping
6.4 Codec evaluations
6.4.1 Recognition experiments under error-free channel
6.5 Recognition experiments under channel errors
7 Recognition Performance Evaluation Results
Annex A: Key selection phase documents
Annex B: Change history
History

Foreword

This Technical Report has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
  1 presented to TSG for information;
  2 presented to TSG for approval;
  3 or greater indicates TSG approved document
under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.

z the third digit is incremented when editorial only changes have been incorporated in the document.

Introduction

SA4 has been working on the selection of a codec to recommend for Speech Enabled Services since October 2002 under the WID for SES [9]. The usual process of agreeing "design constraints" [10], "test and processing plan" [7] and "recommendation criteria" [8] was followed and completed before evaluating the candidates. Two candidate codecs were proposed and evaluated:

1) ETSI Standard for the DSR Extended Advanced Front-end (ES 202 212)
2) AMR and AMR-WB audio codec

The performance evaluations were conducted by two leading companies in the area of speech recognition, IBM and Scansoft. Results from these evaluations were presented at SA4#30 in February 2004 and are summarised here. The "recommendation criteria" have been applied and SA4 recommends the DSR codec for Speech Enabled Services.

SES codecs are introduced in packet switched conversational services in Technical Specifications 26.235 ... Stage 1".

[2] 3GPP TR 22.977: "Feasibility study for
speech enabled services".
[3] ETSI ES 202 050: "Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression Algorithm".
[4] ETSI ES 202 212: "Distributed Speech Recognition; Extended Advanced Front-end Feature Extraction Algorithm; Compression Algorithm, Back-end Speech Reconstruction Algorithm".
[5] 3GPP TS 26.235: "Packet switched conversational multimedia applications; Default codecs".
[6] 3GPP TS 26.236: "Packet switched conversational multimedia applications; Transport Protocols".
[7] TD S4-030543 "Test and Processing plan for default codec evaluation for speech enabled services (SES)", SA4.
[8] TD SP-030440 "Recommendation Criteria for Default Codec for Speech Enabled Services (SES)", TSG SA.
[9] TD SP-020687 WID Codec Work to Support Speech Recognition Framework for Automated Voice Services (Rel-6), TSG SA.
[10] TD S4-030248 "Design Constraints for default codec for speech enabled services (SES)", SA4.

Note: Annex A lists all the key SA4 SES selection phase documents. Temporary Documents are attached to this specification in a separate .zip file.

3 Abbreviations

For the purposes of the present document, the following abbreviations apply:

AFE Advanced Front-end
AMR Adaptive Multi-Rate
AMR-NB AMR Narrowband
AMR-WB AMR Wideband
BLER Block Error Rate
DSR Distributed Speech Recognition
EDGE Enhanced Data for GSM Evolution
ETSI European Telecommunications Standards Institute
GSM Global System for Mobile communications
SES Speech Enabled Services
SNR Signal To Noise Ratio
VAD Voice Activity Detector
X-AFE eXtended Advanced Front-end

4 General

4.1 Project History

Table 1 below shows the progress and timeline of the project, in particular: the creation of permanent documents; identification of candidate codecs and test organisations; running of the performance evaluations by test organisations; selection at SA4; verification; and the approval of CRs and TS at SA. Key milestones are highlighted in bold.

Table 1: SES project timeline

Meeting / Status of progress in activities
SA4 #23 (30 Sept - 4 Oct 2002)
  - Draft WID and work plan

SA4 #24 (11-15 Nov 2002)
  Permanent documents
    o Design Constraints V1.0
    o Test & Processing Plan V0.8
    o Recommendation Criteria V0.1

Intermediate deadline on SA4 reflector 31.12.2002
  Submission of specification of additional databases as candidate for testing as part of test and processing plan.

Intermediate deadline on SA4 reflector 31.12.2002
  - Any company which would possibly like to submit a candidate will indicate before 31.12.2002. Later indications will not be considered.

SA4 #25 (20-24 Jan 2003)
  - List of testing organisations
  - Permanent documents
    o Design Constraints V1.1
    o Test Plan & Processing Plan V1.0
    o Recommendation Criteria V0.3

SA4 #25 bis (24-28 Feb 2003)
  - List of testing organisations (IBM & SpeechWorks)
  - List of candidate codecs (DSR X-AFE & AMR-NB/AMR-WB)
  - Permanent documents
    o Design Constraints V2.0
    o Test Plan & Processing Plan V1.3
    o Recommendation Criteria V0.3

SA4 SQ SES ad-hoc 1-2 April 2003, Basingstoke, UK
  - Permanent documents
    o Test & Processing Plan V1.4
    o Recommendation Criteria V0.3

SA4 #26 (5-9 May 2003)
  - Permanent documents
    o Test & Processing Plan V2.0
    o Recommendation Criteria V0.6

SA4 #27 (7-11 July 2003)
  Approval of permanent docs
    o Test & Processing Plan V2.2
    o Recommendation Criteria V2.0
  ASR vendor evaluations start.

Aug 2003
  - ASR vendors start tests.

Deliverables from candidates: (31 October 2003)
  - Fixed point complexity assessment
  - Drafts of new 3GPP TSs (for new codecs), or existing specifications for information (codecs already in standards)
  - Justification document of having met the Design Constraints

SA4 #29 (24-28 Nov 2003)
  - Preparation for verification
  - Agree verification plan by correspondence (19 Dec)
  - Complete any legal agreements (NDAs) that are needed (15 Feb)
  - Verification labs to obtain any databases needed (15 Feb)
  Informative speech quality listening tests
  - Nokia and Ericsson to supply listening test speech files to Motorola (5th Dec)
  - Motorola to process listening test speech files supplied by Nokia and Ericsson (15 Jan)
  - Nokia and Ericsson conduct listening tests

Completion of ASR vendor evaluations (31 Jan 2004)
  - Results from ASR vendor evaluations to ETSI representative

SA4 #30 (23-27 Feb 2004) SES Selection meeting
  - Results from evaluator tests available
  - Make recommendation
  - Prepare TSs for approval SA#23
  - Prepare CRs for approval SA#23

SES Verification (1 March)
  - Verification of selected codec (ST-Micro).
  - Discussion of results of verification conference call March.

SA #23 (15-17 March 2004)
  - TSs for information
  - CRs for information

SA4 #31 (17-21 May 2004)
  Verification report

SA #24 (7-10 June 2004)
  - TSs approval (TS 26.243)
  - CRs approval (TS 26.235 & TS 26.236)

4.2 Overview of the speech recognition framework for automated voice services work item

The work item covered the evaluation of candidate codecs for use in a speech recognition framework for automated voice services. The 3GPP speech recognition framework enables the use of conventional codecs (
e.g. AMR) or DSR optimised codecs to distribute in the network the speech engines that process speech input or generate speech output. The aim of the work item is, through objective evaluation, to recommend a single codec for speech enabled services based on a speech recognition framework.

4.3 Presentation of the following sections

The following sections provide a summary of the Selection Phase test results, including the results of the objective performance measurements, and a record of other relevant information for the selected candidate algorithm.

- Section 5 describes the Recommendation Criteria defined for the Selection Phase
- Section 6 defines the means used to measure the performance of each of the candidates
- Section 7 summarises the recognition evaluation results

5 Recommendation criteria

5.1 Overview

The set of databases used for the evaluations is defined in the Test and Processing Plan [7]. Each of these databases contains different types of speech material covering a variety of tasks, environments and languages. Recommendation was based on a score obtained from the recognition performance measured on each of these different databases. Section 5.3 describes how the scores from all the individual databases are combined using a weighting table.

5.2 Scoring on individual databases

For each database the reference performance is measured as the word error rate obtained from the ASR vendor's system. This is the performance obtained from a state-of-the-art system from the ASR vendor assuming a transparent channel. The performance (word error rate) on a given database is also measured with the ASR vendor's system for a codec under test as described in the test and processing plan [7]. Scoring for tests pe