ETSI TR 102 949 V1.1.1 (2014-09)
Speech and multimedia Transmission Quality (STQ);
Wideband and Superwideband speech terminals;
Perceptually motivated parameters
TECHNICAL REPORT

Reference
DTR/STQ-183
Keywords
loudness, speech, superwideband, terminal, wideband

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
The present document can be downloaded from: http://www.etsi.org
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp
If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp

Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. The content of the PDF version shall not be modified without the written authorization of ETSI. The copyright and the foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 2014. All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 Sound levels and loudness
4.1 Loudness
4.2 Impact of signal level and spectrum (including pitch and frequency adjustment and balance)
5 Speech/Sound Quality and Intelligibility
5.1 Speech intelligibility assessment
5.2 Impacts of impairments on speech intelligibility
5.3 Other quality parameters
5.3.1 Audio clarity
5.3.2 Naturalness
Annex A: Considerations about loudness assessment
Annex B: Objective and subjective tests: Influence of frequency bandwidth on loudness
B.1 Loudness depending on bandwidth and codec
B.1.1 Simulation process
B.1.2 Results presentation
B.1.2.1 Level depending on bandwidth
B.1.2.2 Level depending on codec
B.1.2.3 Loudness depending on bandwidth
B.1.2.4 Loudness depending on codec
B.2 Subjective Test results
B.2.1 Introduction
B.2.2 Selection and preparation of test signals
B.2.3 Description of the subjective test
B.2.3.1 Description of the response scale
B.2.3.2 Calibration of the sound reproduction chain
B.2.4 First stage of the subjective test: Measurement of individual loudness function
B.2.4.1 Dynamic range determination
B.2.4.2 Measurement of individual loudness function
B.2.4.3 Results for individual loudness functions
B.2.5 Second stage of the subjective test: Assessment of test signal loudness
B.2.5.1 Assessment of test signal loudness
B.2.5.2 Conversion from points to phons
B.2.6 Results for test signal loudness
B.2.6.1 Results averaged over all samples
B.2.6.2 Detailed results per sample
B.2.6.3 Results averaged over all samples, except Sample 4
Annex C: Bibliography
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these
essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: “Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards”, which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (http://ipr.etsi.org).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword
This Technical Report (TR) has been produced by ETSI Technical Committee Speech and multimedia Transmission Quality (STQ).

Modal verbs terminology
In the present document “shall”, “shall not”, “should”, “should not”, “may”, “may not”, “need”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).
“must” and “must not” are NOT allowed in ETSI deliverables except when used in direct citation.

Introduction
In practice, many factors may affect the quality and usability of terminals in real use, including the user's behaviour: the actual positioning of the terminal relative to the ear(s), the influence of distance and of the environment (noise, reverberation), the actual voice level of the distant speaker, etc.
The present document is intended to provide initial answers to questions raised:
- on the potential impact of speech spectrum and speech level on loudness;
- about differences perceived by the distant user when the local user alternates between different pick-up systems.
Technical reports on accessibility have shown that speech quality degradation may affect people with hearing impairments more strongly. Hence criteria other than overall quality (e.g. intelligibility or clarity) need to be considered, together with the potential impact of loudness.

1 Scope
The present document investigates new perceptually motivated parameters that define audio quality more closely, such as loudness, fidelity and intelligibility of speech as perceived by the user, for wideband and superwideband speech terminals. The annexes detail studies of the loudness of received signals as a function of the transmission bandwidth, the codec and the type of transmitted signal, and compare results from different computation models.
The intention of the present document is to provide alternative or new quality parameters and test methods to be implemented in the relevant standards and specifications.

2 References
References are either specific (identified by date of publication and/or edition number or version number) or non-specific. For specific references, only the cited version applies. For non-specific
references, the latest version of the reference document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at http://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee their long term validity.

2.1 Normative references
The following referenced documents are necessary for the application of the present document.
Not applicable.

2.2 Informative references
The following referenced documents are not necessary for the application of the present document but they assist the user with regard to a particular subject area.
[i.1] ETSI ES 202 739: “Speech and multimedia Transmission Quality (STQ); Transmission requirements for wideband VoIP terminals (handset and headset) from a QoS perspective as perceived by the user”.
[i.2] ETSI ES 202 740: “Speech and multimedia Transmission Quality (STQ); Transmission requirements for wideband VoIP loudspeaking and handsfree terminals from a QoS perspective as perceived by the user”.
[i.3] ETSI TS 103 739: “Speech and multimedia Transmission Quality (STQ); Transmission requirements for wideband wireless terminals (handset and headset) from a QoS perspective as perceived by the user”.
[i.4] ETSI TS 103 740: “Speech and multimedia Transmission Quality (STQ); Transmission requirements for wideband wireless terminals (handsfree) from a QoS perspective as perceived by the user”.
[i.5] ETSI ETS 300 807: “Integrated Services Digital Network (ISDN); Audio characteristics of terminals designed to support conference services in the ISDN”.
[i.6] Recommendation ITU-T P.79: “Calculation of loudness ratings for telephone sets”.
[i.7] Recommendation ITU-T P.58: “Head and torso simulator for telephonometry”.
[i.8] Recommendation ITU-T P.581: “Use of head and torso simulator (HATS) for hands-free and handset terminal testing”.
[i.9] Recommendation ITU-T P.501: “Test signals for use in telephonometry”.
[i.10] Recommendation ITU-T P.863: “Perceptual objective listening quality assessment”.
[i.11] Recommendation ITU-T P.10/G.100: “Vocabulary for performance and quality of service”.
[i.12] ANSI S3.4-2007: “American National Standard procedure for the computation of loudness of steady sound”.
[i.13] DIN 45631, 1991: “Procedures for calculating loudness level [...]
[...] Transmission requirements for Superwideband/Fullband headset terminals from a QoS perspective as perceived by the user”.
[i.17] ETSI TS 102 925: “Speech and multimedia Transmission Quality (STQ); Transmission requirements for Superwideband/Fullband handsfree and conferencing terminals from a QoS perspective as perceived by the user”.
[i.18] ISO TR 22411: “Ergonomics data and guidelines for the application of ISO/IEC Guide 71 to products and services to address the needs of older persons and persons with disabilities”.
[i.19] Recommendation ITU-T G.711: “Pulse Code Modulation (PCM) of Voice Frequencies”.
[i.20] Recommendation ITU-T G.722: “7 kHz audio-coding within 64 kbit/s”.
[i.21] ETSI ES 203 038: “Speech and multimedia Transmission Quality (STQ); Requirements and tests methods for terminal equipment incorporating a handset when connected to the analogue interface of the PSTN”.
[i.22] Recommendation ITU-T P.50: “Artificial voices”.
[i.23] ANSI/ASA S3.5-1997 (R 2012) American National Standard: “Methods for Calculation of the Speech Intelligibility Index”.
[i.24] Recommendation ITU-T P.862: “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs”.
[i.25] Meunier S. et al.: “Calcul des indicateurs de sonie : revue des algorithmes et implémentation”, 10ème Congrès Français d'Acoustique (2010).
[i.26] Zwicker E. and Fastl H.: “Psychoacoustics: Facts and models”, 2nd Edition, Springer-Verlag, Berlin (1999).
[i.27] Glasberg B. R. and Moore B. C. J.: “A model of loudness applicable to time-varying sounds”, J. Audio Eng. Soc., Vol. 50, n° 5, 331-342 (2002).
[i.28] Sridhar Kalluri, Starkey Hearing Research Center (Berkeley, USA): “High frequency sound for the hearing impaired”, ITU-T Workshop on “From Speech to Audio: bandwidth extension, binaural perception”, Lannion, France, 10-12 September 2008.
[i.29] Ute Jekosch, TU Dresden: “Test on overall quality as perceived by high frequency hearing impaired subscribers”, ITU-T SG12 - C101 - September 2007.
[i.30] Cyril Plapous, Jean-Yves Le Saout, Jean-Yves Monfort: “Loudness depending on bandwidth and Codec”. ETSI STQ(13)42-029r1.
[i.31] John Beerends, Ronald Van Buuren, Jeroen Van Vugt and Jan Verhave: “Objective Speech Intelligibility Measurement on the basis of natural speech in combination with perceptual modeling”. JAES, Vol. 57, N° 5, 2009 May.
[i.32] Søren Jørgensen and Torsten Dau: “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing”, J. Acoust. Soc. Am. Volume 130, Issue 3, pp. 1475-1487 (2011); (13 pages).
[i.33] Jianfen Ma, Yi Hu and Philipos C. Loizou: “Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions”, J. Acoust. Soc. Am. 2009 May; 125(5): pp. 3387-3405.
[i.34] Jean-Yves Monfort, JYMC.I.S.: “Status of Speech intelligibility studies and models for hearing impaired people. Plans for standards”.
NOTE: Available at: http://docbox.etsi.org/Workshop/2014/201406_HFWORKSHOP/S02_Speech_Intelligibility/S02_Monfort_JYMLCIS.pdf
[i.35] Ewert and Dau: “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing”, J. Acoust. Soc. Am. 108, pp. 1181-1196 (2000).
[i.36] ANSI S3.2-1989: “American National Standard Method for Measuring the Intelligibility of Speech over Communication Systems”.
[i.37] Recommendation ITU-T G.729.1 (Annex E): “G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729”.
[i.38] Recommendation ITU-T G.722.1 (Annex C): “Low-complexity coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss”.
[i.39] Recommendation ITU-T G.719: “Low-complexity, full-band audio coding for high-quality, conversational applications”.
[i.40] Recommendation ITU-T P.56: “Objective measurement of active speech level”.

3 Definitions, symbols and abbreviations
3.1 Definitions
For the purposes of the present document, the following terms and definitions given in Recommendation ITU-T P.10/G.100 [i.11] apply:
Definitions “generally used in psychoacoustics”
articulation index: A measure of the intelligibility of voice signals, expressed as a percentage of speech units that are understood by the listener when heard out of context. The articulation index is based on partially empirical, partially theoretical principles to predict the speech intelligibility under known signal-to-noise conditions.
loudness: Loudness belongs to a category of intensity sensations. Loudness is that attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud. Loudness takes into account the spectral and temporal sensitivity of the human ear. Masking effects in time and frequency are generally taken into account. The loudness level measure according to Zwicker [i.26] was create