ETSI TS 103 281-2017 Speech and multimedia Transmission Quality (STQ) Speech quality in the presence of background noise Objective test methods for super-wideband and fullband term.pdf

Resource description

ETSI TS 103 281 V1.1.1 (2017-04)

Speech and multimedia Transmission Quality (STQ); Speech quality in the presence of background noise: Objective test methods for super-wideband and fullband terminals

TECHNICAL SPECIFICATION

Reference
DTS/STQ-232

Keywords
noise, quality, speech, testing, transmission

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88

Important notice

The present document can be downloaded from:
http://www.etsi.org/standards-search

The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx

Copyright Notification

No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2017.
All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered
for the benefit of its Members.
3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
oneM2M logo is protected for the benefit of its Members.
GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Contents

Intellectual Property Rights
Foreword
Modal verbs terminology
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Abbreviations
4 Introduction
5 Underlying speech databases and preparations
6 Model descriptions
6.1 Introduction
6.2 Common definitions
6.3 Model A
6.3.1 Introduction
6.3.2 Pre-Processing
6.3.3 Spectral transformation
6.3.4 Non-linear loudness transformation
6.3.5 Instrumental assessment of N-MOS
6.3.5.1 Introduction
6.3.5.2 Loudness-based features
6.3.5.3 Sharpness-based feature
6.3.6 Reference optimization and asymmetry
6.3.6.1 Introduction
6.3.6.2 Reference optimization
6.3.6.3 Masking of inaudible differences
6.3.6.4 Asymmetry
6.3.7 Instrumental assessment of S-MOS
6.3.7.1 Introduction
6.3.7.2 Modulation-based features
6.3.7.3 Spectral difference features
6.3.7.4 Control parameters
6.3.7.5 Combination of features
6.3.8 Instrumental assessment of G-MOS
6.4 Model B
6.4.1 Overview
6.4.2 Operational Modes
6.4.3 Temporal Alignment
6.4.4 Voice Activity Detection (VAD) and segment classification
6.4.5 Auditory Model
6.4.5.1 Introduction
6.4.5.2 Ear Canal model
6.4.5.3 Middle Ear model
6.4.5.4 Hydro-mechanical cochlear model
6.4.5.5 Hair Cell transduction model
6.4.5.6 Outer Hair motility model
6.4.6 Feature Extraction
6.4.6.1 Introduction
6.4.6.2 Salient Formant Points (SFP) feature extraction
6.4.6.3 COSM (Cochlear Output Statistic Metric) feature extraction
6.4.7 Training and mapping
6.5 Mapping of model outputs
7 Comparison of objective and subjective results after the training process
7.1 Introduction
7.2 Results for Model A
7.3 Results for Cochlear Prediction Model (Model B)
8 Validation results
8.1 Introduction
8.2 Validation database 1 (DES-17)
8.2.1 Database description
8.2.2 Validation database 1: Results for model B
8.3 Validation database 2 (DES-20)
8.3.1 Database description
8.3.2 Validation database 2: Results for model A
8.4 Validation database 3 (DES-25)
8.4.1 Database description
8.4.2 Validation database 3: Results for model A
8.4.3 Validation database 3: Results for model B
8.5 Validation database 4 (DES-26)
8.5.1 Database description
8.5.2 Validation database 4: Results for model A
8.5.3 Validation database 4: Results for model B
8.6 Validation database 5 (DES-27)
8.6.1 Database description
8.6.2 Validation database 5: Results for model A
8.6.3 Validation database 5: Results for model B
9 Application of the models
9.1 Introduction
9.2 Speech material
9.3 Positioning of the device under test
9.4 Background noise playback
9.5 Recording and calibration procedure
9.6 Running the prediction models
Annex A (normative): Model configuration files
A.1 Introduction
A.2 Model A
A.3 Model B
Annex B (normative): Summary of Training Databases
Annex C (normative): Test vectors for model verification
Annex D (informative): Subjective testing framework
D.1 Introduction
D.2 Subjective test plan
D.2.1 Traceability
D.2.2 Speech database requirements
D.2.3 Reference Conditions
D.2.4 Test Conditions
D.2.5 Post-processing of test conditions
D.2.6 Calibration and equalization of headphones for presentation
D.2.7 Requirements on the listening laboratory
D.2.8 Experimental design
D.2.9 Training session
D.3 Set-up for acquisition of test conditions
D.3.1 Terminal positioning and HATS calibration
D.3.2 Background Noise reproduction
D.3.3 Noise and speech playback synchronization
D.3.4 Convergence sequence
D.3.5 Example of noise and speech playback sequence including convergence period
D.3.6 Recordings at the network simulator electrical reference point
D.3.7 Recordings at the MRP and terminal's primary microphone location
Annex E (normative): Speech material to be used for objective testing
History

Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (https://ipr.etsi.org/).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword

This Technical Specification (TS) has been produced by ETSI Technical Committee Speech and multimedia Transmission Quality (STQ).

The present document is to be used
in conjunction with:
• ETSI ES 202 396-1 [i.1]: “Background noise simulation technique and background noise database”; and
• ETSI TS 103 224 [i.19] series: “A sound field reproduction method for terminal testing including a background noise database”.

The present document describes an objective test method for super-wideband and fullband terminals. The method is intended to provide a good prediction of the uplink speech quality of modern mobile terminals in the presence of background noise, in hand-held and hands-free modes.

Modal verbs terminology

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

1 Scope

The present document describes testing methodologies which can be used to objectively evaluate the performance of super-wideband and fullband mobile terminals for speech communication in the presence of background noise. Background noise is a problem in almost all situations and conditions and needs to be taken into account in terminal design. The present document provides information about the testing methods applicable to objectively evaluate the speech quality of mobile terminals (including any state-of-the-art codecs) employing background noise suppression in the presence of background noise.

The present document includes:
• The method which is applicable to objectively determine the different parameters influencing the speech quality in the presence of background noise, taking into account:
  - the speech quality;
  - the background noise transmission quality;
  - the overall quality.
• The model results in comparison with the underlying subjective tests used for the training of the objective model. The underlying languages are: American English, German, Chinese (Mandarin).
• The model validation results.

The present document is to be used in conjunction with:
• ETSI ES 202 396-1 [i.1], which describes a recording and reproduction setup for realistic simulation of background noise scenarios in lab-type environments for the performance evaluation of terminals and communication systems.
• ETSI TS 103 224 [i.19], which describes a sound field reproduction method for terminal testing including a background noise database with background noise scenarios to be used in lab-type environments for the performance evaluation of terminals and communication systems.
• American English speech sentences as enclosed in the present document.

2 References

2.1 Normative references

References are
either specific (identified by date of publication and/or edition number or version number) or non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the referenced document (including any amendments) applies.

Referenced documents which are not found to be publicly available in the expected location might be found at https://docbox.etsi.org/Reference/.

NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee their long term validity.

The following referenced documents are necessary for the application of the present document.

Not applicable.

2.2 Informative references

References are either specific (identified by date of publication and/or edition number or version number) or non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the referenced document (including any amendments) applies.

NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee their long term validity.

The following referenced documents
are not necessary for the application of the present document but they assist the user with regard to a particular subject area.

[i.1] ETSI ES 202 396-1: “Speech and multimedia Transmission Quality (STQ); Speech quality performance in the presence of background noise; Part 1: Background noise simulation technique and background noise database”.
[i.2] ETSI EG 202 396-3: “Speech and multimedia Transmission Quality (STQ); Speech Quality performance in the presence of background noise; Part 3: Background noise transmission - Objective test methods”.
[i.3] ETSI TS 103 106: “Speech and multimedia Transmission Quality (STQ); Speech quality performance in the presence of background noise: Background noise transmission for mobile terminals-objective test methods”.
[i.4] ETSI TS 126 441: “Universal Mobile Telecommunications System (UMTS); LTE; Codec for Enhanced Voice Services (EVS); General overview (3GPP TS 26.441)”.
[i.5] Recommendation ITU-T P.835: “Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm”.
[i.6] Internet Engineering Task Force, Request for Comments 6716: “Definition of the Opus Audio Codec”, 09/2012.
[i.7] Recommendation ITU-T P.56: “Objective measurement of active speech level”.
[i.8] Recommendation ITU-T P.1401: “Methods, metrics and procedures for statistical evaluation, qualifying and comparison of objective quality prediction models”.
[i.9] Recommendation ITU-T G.160 Appendix II, Amendment 2: “Voice enhancement devices: Revised Appendix II - Objective measures for the characterization of the basic functioning of noise reduction algorithms”.
[i.10] Recommendation ITU-T P.501: “Test Signals for Use in Telephonometry”.
[i.11] Recommendation ITU-T P.58: “Head and Torso simulator for telephonometry”.
[i.12] Recommendation ITU-T P.57: “Artificial ears”.
[i.13] Recommendation ITU-T P.800: “Methods for subjective determination of transmission quality”.
[i.14] ETSI TS 126 132: “Universal Mobile Telecommunications System (UMTS); LTE; Speech and video telephony terminal acoustic test specification (3GPP TS 26.132)”.
[i.15] Recommendation ITU-T TD 477 (GEN/12): “Handbook of subjective test practical procedures” (temporary document) - Geneva, 18-27 January 2011.
[i.16] AH-11-029: “Better Reference System for the P.835 SIG Rating Scale”, Q7/12 Rapporteurs meeting, 20-21 June 2011, Geneva, Switzerland.
[i.17] 3GPP, Tdoc S4(16)0397: “DESUDAPS-1: Common subjective testing framework for training and validation of SWB and FB P.835 test predictors”.
[i.18] Recommendation ITU-T P.64: “Determination of sensitivity/frequency characteristics of local telephone systems”.
[i.19] ETSI TS 103 224: “Speech and multimedia Transmission Quality (STQ); A sound field reproduction method for terminal testing including a background noise database”.
[i.20] Sottek, R.: “Modelle zur Signalverarbeitung im menschlichen Gehör”, PhD thesis RWTH Aachen, 1993.
[i.21] Sottek, R.: “A Hearing Model Approach to Time-Varying Loudness”, Acta Acustica united with Acustica, vol. 102(4), pp. 725-744, 2016.
[i.22] Byrne, D. et al.: “An international comparison of long-term average speech spectra”, The Journal of the Acoustical Society of Ameri
