International Telecommunication Union
ITU-T P.804
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(10/2017)

SERIES P: TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS
Methods for objective and subjective assessment of speech and video quality

Subjective diagnostic test method for conversational speech quality analysis

Recommendation ITU-T P.804

ITU-T P-SERIES RECOMMENDATIONS
TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS

Vocabulary and effects of transmission parameters on customer opinion of transmission quality: Series P.10
Voice terminal characteristics: Series P.30, P.300
Reference systems: Series P.40
Objective measuring apparatus: Series P.50, P.500
Objective electro-acoustical measurements: Series P.60
Measurements related to speech loudness: Series P.70
Methods for objective and subjective assessment of speech quality: Series P.80
Methods for objective and subjective assessment of speech and video quality: Series P.800
Audiovisual quality in multimedia services: Series P.900
Transmission performance and QoS aspects of IP end-points: Series P.1000
Communications involving vehicles: Series P.1100
Models and tools for quality assessment of streamed media: Series P.1200
Telemeeting assessment: Series P.1300
Statistical analysis, evaluation and reporting guidelines of quality measurements: Series P.1400
Methods for objective and subjective assessment of quality of services other than speech and video: Series P.1500

For further details, please refer to the list of ITU-T Recommendations.

Rec. ITU-T P.804 (10/2017)

Summary

Recommendation ITU-T P.804 describes a subjective methodology for assessing and diagnosing the quality of transmitted speech in a telephone conversation. In addition to a score for the overall conversation quality, the methodology yields overall quality scores for the three perceivable phases of a telephone conversation (listening, speaking and interaction), as well as scores for their seven corresponding perceptual dimensions. Four of the perceptual dimension scores represent degradation associated with the listening phase, two are associated with the speaking phase, and one is associated with the interaction phase. Each perceptual dimension score is based on ratings of the amount of degradation present in one system condition. The method is designed to be used with naïve subjects. The dimension scores can be used to provide diagnostic information on the causes of system degradations. The method is meant as a complement to standard conversation tests.
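The structure summarized above (three phases, seven perceptual dimensions split four/two/one) can be written down as a small lookup table, for example when organizing test data. The following Python sketch is illustrative only: the names `PHASES`, `all_dimensions` and `phase_of` are hypothetical and are not defined by this Recommendation; only the phase and dimension names are taken from the text.

```python
"""Illustrative lookup table of the conversational phases and their
perceptual dimensions (names taken from the Recommendation; the Python
identifiers are hypothetical)."""

# Conversational phase -> perceptual dimensions belonging to that phase.
PHASES: dict[str, tuple[str, ...]] = {
    "listening": ("noisiness", "discontinuity", "coloration", "loudness"),
    "speaking": (
        "impact of one's own voice on speaking",
        "degradation of one's own voice",
    ),
    "interaction": ("interactivity",),
}


def all_dimensions() -> list[str]:
    """Return every perceptual dimension, ordered by phase."""
    return [dim for dims in PHASES.values() for dim in dims]


def phase_of(dimension: str) -> str:
    """Return the conversational phase that a perceptual dimension belongs to."""
    for phase, dims in PHASES.items():
        if dimension in dims:
            return phase
    raise KeyError(f"unknown perceptual dimension: {dimension!r}")


if __name__ == "__main__":
    for phase, dims in PHASES.items():
        print(f"{phase}: {len(dims)} dimension(s)")
    print("total:", len(all_dimensions()))
```

Keeping the mapping in one place makes it straightforward to check that a test design covers all seven dimensions and to group dimension scores by phase.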
History

| Edition | Recommendation | Approval | Study Group | Unique ID* |
|---|---|---|---|---|
| 1.0 | ITU-T P.804 | 2017-10-29 | 12 | 11.1002/1000/13397 |

Keywords

Conversational speech quality evaluation, diagnostic evaluation of conversational speech quality, multi-dimensional quality assessment, subjective testing.

* To access the Recommendation, type the URL http://handle.itu.int/ in the address field of your web browser, followed by the Recommendation's unique ID. For example, http://handle.itu.int/11.1002/1000/11830-en.

FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications, information and communication technologies (ICTs). The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE – In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure, e.g., interoperability or applicability) and compliance with the Recommendation is achieved when all of these mandatory provisions are met. The words “shall” or some other obligatory language such as “must” and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the Recommendation is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had not received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementers are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database at http://www.itu.int/ITU-T/ipr/.

© ITU 2017

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written
permission of ITU.

Table of Contents

1 Scope
2 References
3 Definitions
  3.1 Terms defined elsewhere
  3.2 Terms defined in this Recommendation
4 Abbreviations and acronyms
5 Conventions
6 Introduction to conversational speech quality analysis
7 Test methodology
  7.1 Dimension rating scales
  7.2 Test design
  7.3 Dimension rating scheme
  7.4 Instruction and training
Annex A – Test instructions translated from German: Analysis of speech quality in a conversational situation
Appendix I – Examples for the speaking part of the test methodology translated from German
Appendix II – Results from an initial pilot test and second retest
Bibliography

1 Scope

This Recommendation describes a subjective test methodology that can assess and diagnose the quality of speech in a “telephone conversation” scenario. Common conversation tests, as described in [ITU-T P.800] and [ITU-T P.805], provide valid methods for assessing overall conversational quality, but do not give insight into the reasons for
possible quality losses. In addition, common conversation tests lack analytic power, since naïve participants concentrate on the flow of the conversation. To circumvent these problems, this Recommendation describes a test methodology that specifically allows participants to perceive each phase of a conversation separately, in addition to a natural conversation, and yields overall conversational quality scores as well as quality scores for each phase (listening, speaking, interaction). In addition, scores for seven underlying perceptual dimensions of conversational speech quality are provided. These scores enable the analysis of conversational speech quality for diagnosis and optimization.

2 References

The following ITU-T Recommendations and other references contain provisions which, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated were valid. All Recommendations and other references are subject to revision; users of this Recommendation are therefore encouraged to investigate the possibility of applying the most recent edition of the Recommendations and other references listed below. A list of the currently valid ITU-T Recommendations is regularly published. The reference to a document within this Recommendation does not give it, as a stand-alone document, the status of a Recommendation.

[ITU-T P.800] Recommendation ITU-T P.800 (1996), Methods for subjective determination of transmission quality.
[ITU-T P.805] Recommendation ITU-T P.805 (2007), Subjective evaluation of conversational quality.
[ITU-T P.806] Recommendation ITU-T P.806 (2014), A subjective quality test methodology using multiple rating scales.
[ITU-T P.835] Recommendation ITU-T P.835 (2003), Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm.

3 Definitions

3.1 Terms defined elsewhere

None.

3.2 Terms defined in this Recommendation

None.

4 Abbreviations and acronyms

This Recommendation uses the following abbreviations and acronyms:

ACR    Absolute Category Rating
MDS    Multidimensional Scaling
MOS    Mean Opinion Score
PCA    Principal Component Analysis
RNVT   Random Number Verification Task
SCT    Short Conversation Test
SD     Semantic Differential

5 Conventions

None.

6 Introduction to conversational speech quality analysis

To provide diagnostic information
about the quality of transmitted speech, [ITU-T P.806] recommends using multiple rating scales in subjective experiments. The approach targets perceptual dimensions to give deeper insight into possible quality loss. The recommended method refers to the passive listening-only situation and complements overall-quality mean opinion score (MOS) experiments using absolute category rating (ACR), as recommended in [ITU-T P.800]. However, the listening-only situation only partly reflects reality in telecommunications. Thus, conversation tests that assess quality in an interactive situation have been designed and are described in [ITU-T P.805]. These methods, however, do not provide diagnostic information. For this purpose, a set of seven perceptual dimensions for a conversational situation is proposed in [b-Köster2014]. The proposed dimensions cover the three possible phases (situations) of a conversation: listening, speaking and interaction [b-Guéguin2008].

To identify the relevant dimensions, each phase has been analysed in detail by applying (I) a pairwise similarity experiment followed by multidimensional scaling (MDS), and (II) a semantic differential (SD) experiment followed by principal component analysis (PCA). Applying both methods in separate experiments [b-Köster2014], [b-Wältermann2010] resulted in the following set of perceptual dimensions for a conversational situation (see Table 1): the listening phase comprises four dimensions (noisiness, discontinuity, coloration and loudness); the speaking phase comprises two dimensions (impact of one's own voice on speaking, and degradation of one's own voice); the interaction phase comprises only one dimension (interactivity).

Table 1 – Overview of the seven identified and proposed perceptual quality dimensions for a conversational situation

| Conversational phase | Perceptual dimension | Description | Possible source |
|---|---|---|---|
| Listening phase | Noisiness | Background noise, circuit noise, coding noise | Coding, circuit or background noise |
| | Discontinuity | Isolated and non-stationary distortions | Packet loss |
| | Coloration | Frequency response distortions | Bandwidth limitations |
| | Loudness | Important for the overall quality and intelligibility | Attenuation |
| Speaking phase | Impact of one's own voice | How is the back-coupling of one's own voice perceived? | Sidetone and echo |
| | Degradation of one's own voice | How is the back-coupling of one's own voice degraded? | Frequency distortions of the sidetone and echo path |
| Interaction phase | Interactivity | Delayed and disrupted interaction | Delay |

Since the proposed dimensions were identified in separate listening, speaking and conversation tests, the
dimensions were validated in an elaborate conversational experiment in which each of the conversational phases, as well as each of the proposed perceptual dimensions, was addressed [b-Köster2015a]. The outcome of the experiment showed that, in traditional conversation scenarios, the proposed dimensions are difficult to identify. Thus, this Recommendation describes a test method that specifically allows participants to perceive each phase separately, in addition to a natural conversation paradigm. Moreover, the described method allows the seven proposed dimensions to be quantified directly within one single experiment. The method supports the analysis of conversational speech quality for diagnosis and optimization, and makes it possible to increase the number of conditions that can be assessed.

7 Test methodology

7.1 Dimension rating scales

The subjective method provides a means for quantifying seven quality-relevant perceptual dimensions in a conversational situation (noisiness, discontinuity, coloration, loudness, impact of one's own voice on speaking, degradation of one's own voice, and interactivity) directly by means of seven descriptive scales. In addition, the overall conversational quality and the overall quality of each individual phase are gathered.

Each dimension scale is dedicated to one particular dimension. The scales are labelled with the antonym pairs describing the corresponding dimension. This enables separate scores to be quantified directly for each perceptual dimension present in a conversational situation. The overall rating scales and the graphical scale layout for the dimensions are shown in Figure 1. Continuous scales were chosen over traditional ACR scales because they proved to be more sensitive [b-Köster2015b]. The labels on the left of the scales describe no impairment in the corresponding dimension, while the labels on the right describe the maximum impairment. Thus, the scales are considered to be unipolar. A detailed description of the usage and definition of the scales, as given to test participants, can be found in Annex A.

[Figure 1 – Dimension scale design: overall rating scales (e.g., “Overall Quality”) and the graphical layout of the dimension scales]

7.2 Test design

The method follows common paradigms for subjective conversational tests as described in [ITU-T P.805]. For each condition (i.e., each set of transmission system properties under test), two participants, located in two separate rooms set up according to [ITU-T P.800], are required.
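A minimal sketch of how ratings collected on the unipolar scales of 7.1 might be pooled per condition, assuming that both participants of each pair rate every dimension. It further assumes, purely for illustration, that each slider position is stored as a number between 0.0 (left label, no impairment) and 1.0 (right label, maximum impairment), and that a dimension score is the arithmetic mean of these positions over all participants, in the spirit of a MOS. This clause does not prescribe a numeric coding or a scoring formula; the `Rating` record and the `dimension_scores` function are hypothetical.

```python
"""Hypothetical pooling of dimension ratings per condition.

Assumption: each rating is a slider position in [0.0, 1.0], where 0.0 is the
left (no impairment) end and 1.0 the right (maximum impairment) end of a
unipolar scale. Names and data layout are illustrative only.
"""
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    condition: str    # transmission condition under test
    participant: str  # conversation partner who gave the rating
    dimension: str    # e.g. "noisiness" or "interactivity"
    position: float   # slider position: 0.0 = no impairment, 1.0 = maximum


def dimension_scores(ratings: list[Rating]) -> dict[tuple[str, str], float]:
    """Mean slider position per (condition, dimension) pair.

    Because the scales are unipolar, a higher value means more perceived
    impairment in that dimension.
    """
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for r in ratings:
        if not 0.0 <= r.position <= 1.0:
            raise ValueError(f"position out of range: {r.position}")
        buckets[(r.condition, r.dimension)].append(r.position)
    return {key: mean(values) for key, values in buckets.items()}


if __name__ == "__main__":
    data = [
        Rating("C1", "A", "noisiness", 0.2),
        Rating("C1", "B", "noisiness", 0.4),
        Rating("C1", "A", "interactivity", 0.7),
        Rating("C1", "B", "interactivity", 0.9),
    ]
    for (cond, dim), score in sorted(dimension_scores(data).items()):
        print(cond, dim, round(score, 2))
```

Averaging the raw positions, rather than first mapping them onto ACR categories, keeps the finer resolution that motivated the choice of continuous scales.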
The basic test setup is shown in Figure 2.

[Figure 2 – Test method set-up]

The test method specifically allows participants to perceive each phase separately, in addition to a natural conversation paradigm. Therefore, the test method for assessing one condition is composed of three sessions:
1 In the first session, the task of the two participants is to conduct a short conversation test (SCT) scenario according to [ITU-T P.805]. The SCTs were used because their tasks represent everyday-