ITU-T P.805 (2007) Subjective evaluation of conversational quality (Study Group 12)

International Telecommunication Union

ITU-T P.805
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(04/2007)

SERIES P: TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS
Methods for objective and subjective assessment of quality

Subjective evaluation of conversational quality

ITU-T Recommendation P.805

ITU-T P-SERIES RECOMMENDATIONS
TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS

Vocabulary and effects of transmission parameters on customer opinion of transmission quality: Series P.10
Subscribers' lines and sets: Series P.30, P.300
Transmission standards: Series P.40
Objective measuring apparatus: Series P.50, P.500
Objective electro-acoustical measurements: Series P.60
Measurements related to speech loudness: Series P.70
Methods for objective and subjective assessment of quality: Series P.80, P.800
Audiovisual quality in multimedia services: Series P.900
Transmission performance and QoS aspects of IP end-points: Series P.1000

For further details, please refer to the list of ITU-T Recommendations.

ITU-T Rec. P.805 (04/2007)

ITU-T Recommendation P.805
Subjective evaluation of conversational quality

Summary

ITU-T Recommendation P.805 describes methods and procedures for conducting conversation tests to evaluate communication quality. The methodology uses examples of scenarios, rating scales and analysis procedures to estimate the subjective quality of telecommunication services. Conversation tests allow the simulation of more realistic situations close to the actual service conditions experienced by telephone customers. In addition, conversation tests are designed to assess the effects of impairments that can cause difficulty while conversing (such as delay, packet loss, echo, interruptions, noise, clipping, etc.), and may be used to study overall system effects or specific degradations as well.

Source

ITU-T Recommendation P.805 was approved on 22 April 2007 by ITU-T Study Group 12 (2005-2008) under the ITU-T Recommendation A.8 procedure.

FOREWORD

The International Telecommunication Union (ITU)
is the United Nations specialized agency in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE - In this Recommendation, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure e.g. interoperability or applicability) and compliance with the Recommendation is achieved when all of these mandatory provisions are met. The words "shall" or some other obligatory language such as "must" and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the Recommendation is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had not received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementers are cautioned that
this may not represent the latest information and are therefore strongly urged to consult the TSB patent database at http://www.itu.int/ITU-T/ipr/.

© ITU 2007

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

CONTENTS

1 Scope
2 References
3 Definitions
4 Abbreviations and acronyms
5 Conventions
6 Conversation test process
6.1 Purpose
6.2 Test facilities
6.3 Test design
6.4 Test conditions
6.5 Subjects
6.6 Tasks
6.7 Questions
6.8 Data analysis and report
Appendix I Relationship between MSD and number of votes per condition
Appendix II Data analysis and presentation of results
II.1 Calculation and presentation of basic statistical measures
II.2 Thorough analysis
Appendix III Example of instructions for the conversation test
Appendix IV Example scenarios in German language
Appendix V Example scenarios in English language
Appendix VI Example scenarios in French language
Appendix VII Example of a Richards task
Appendix VIII Example scenarios for random number verification tasks
Appendix IX Example scenarios for interactive short conversation tests
Bibliography

ITU-T Recommendation P.805
Subjective evaluation of conversational quality

1 Scope

This Recommendation describes the method and procedures for generic conversational testing. The protocol described below is intended to evaluate the effects of degradations such as delay, echo, voice clipping and dropped packets on the quality of voice communications. The methodology described in this Recommendation corresponds to the conversation-opinion tests recommended in ITU-T P.800. Contrary to
listening tests, conversation-opinion tests are designed to assess the effects of impairments that may influence the quality of speech while conversing (such as delay). Procedures adapted to specific equipment can be found in ITU-T P.831 for echo cancellers, ITU-T P.832 for hands-free terminals and ITU-T Rec. P.840 for circuit multiplication equipment.

2 References

The following ITU-T Recommendations and other references contain provisions which, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated were valid. All Recommendations and other references are subject to revision; users of this Recommendation are therefore encouraged to investigate the possibility of applying the most recent edition of the Recommendations and other references listed below. A list of the currently valid ITU-T Recommendations is regularly published. The reference to a document within this Recommendation does not give it, as a stand-alone document, the status of a Recommendation.

[ITU-T P.800] ITU-T Recommendation P.800 (1996), Methods for subjective determination of transmission quality.
[ITU-T P.800.1] ITU-T Recommendation P.800.1 (2006), Mean Opinion Scores (MOS) terminology.
[ITU-T P.831] ITU-T Recommendation P.831 (1998), Subjective performance evaluation of network echo cancellers.
[ITU-T P.832] ITU-T Recommendation P.832 (2000), Subjective performance evaluation of hands-free terminals.

3 Definitions

This Recommendation defines
the following term:

3.1 conversation test: A subjective test in which two participants have a real-time conversation, as described in Annex A to ITU-T P.800 and in [b-Telephonometry].

4 Abbreviations and acronyms

This Recommendation uses the following abbreviations and acronyms:

ANOVA   ANalysis Of VAriance
MANOVA  Multivariate ANalysis Of VAriance
MOS     Mean Opinion Score
MSD     Minimum Significant Difference
POTS    Plain Old Telephone Service

5 Conventions

None.

6 Conversation test process

6.1 Purpose

Conversation-opinion tests allow the subjects involved to be in a more realistic situation simulating the actual service conditions experienced by telephone customers. In addition, conversation-opinion tests are designed to assess the effects of impairments that can cause difficulty while conversing (such as delay, packet loss, echo, interruptions, noise, clipping, etc.). They can be used to study overall system effects or specific degradations, such as delay.

Subjects participate in the test as paired sets of communicators. They are seated in separate sound-proof rooms and asked to hold a conversation through the transmission chain and then to give their opinion of the quality on different quality scales. Acoustic noise environments may be simulated in one or both of the rooms.

Depending on the purpose of the test, either expert, experienced or untrained (naïve) subjects may participate. Such tests can be useful to manufacturers, operators and customers, and
are an important assessment tool because they provide the closest simulation of real telephony interactions between subscribers.

Untrained (naïve) subjects are used when it is important to get an indication of how the general telephone-using population would rate the overall quality and difficulty in using the connection with the system under test. This can be used to give a "global" evaluation of the performance in a range of conditions. However, untrained subjects are unable to describe and identify accurately the types of degradation associated with the system under test.

The main characteristics of a conversation-opinion test are:
- To be very close to a real conversation where people are required to interact and may adapt their behaviour to accommodate the system under test.
- The use of a task to stimulate a conversation with equal participation of both parties.
- Different subjects may have variable behaviour in a conversation (due to culture, personality, etc.), which could create greater variability in subjects' responses in the assessment of speech quality.
- Since subjects have to concentrate on participating in the conversation, and are not specifically involved in assessing the quality performance during the conversation, their final measures may be less sensitive than in listening-only tests.
- Conversation tests are the most valid method for measuring the effect on acceptability of certain system impairments, such as delay.
- Devices under test and simulation tools must be available at the testing lab and must run in real time.

This conversation test methodology can be adapted to field testing; however, it is foreseen that the control of some experimental variables (e.g., delay, packet loss, acoustic noise) would be limited.

6.2 Test facilities

A conversational test has to provide as realistic a communication environment as possible. All processes in the communication link are required to be real time. Switching between conditions that involve different coders and/or different network parameters must be transparent to the subjects. This may require specialized instrumentation and procedures.

Asymmetry between two subjects in a communication is typical of many actual speech communication scenarios; an asymmetric scenario may be defined by different acoustic noise environments or different transmission conditions.

Special consideration may be needed to ensure accurate simulation of acoustic noise environments. For example, significant low frequency power is required for the simulation of automobile environments. Typical test facilities are illustrated in Figure 6-1.

[Figure 6-1 - Example of test facilities. Labels: Room A, access network, network simulator or real network, access network, Room B]

Each subject sits in a separate sound-proof room where a variety of acoustic noise environments can be simulated. The environment in both rooms can be the same or different. Examples of different environments are quiet
room, office, car, railway station, train and cafeteria. A quiet room might be simulated by the introduction of a suitable level of Hoth noise. Certain chambers also allow reverberation to be considered as an experimental variable. A description of sound-proof rooms can be found in ITU-T P.800.

In addition, the send and receive sensors used by the subjects may be the same or different. For example, a handset, a headset with microphone, or a microphone and loudspeaker may be used; the choice of equipment depends on the use case.

6.3 Test design

Most of the test design issues relevant to listening-only tests are also relevant to conversation tests, for example, reference conditions and presentation order effects.

A major limitation to conversational test design is the duration of each individual task, or trial, required to exercise each experimental condition. Properly exercising a communication system requires conversations lasting a minimum of 2 minutes. Typical trials require 4 to 5 minutes, of which the conversation period takes 2 to 3 minutes and the response period another 2 minutes. This would limit the total number of conditions in a subject's session to about 24 conditions,
which would take about 3 hours including instructions, preliminaries and breaks. Tasks designed to measure some system degradations may require conversations longer than 2 to 3 minutes. Compromises have to be made between the test duration and the choice of conditions. If more conditions are to be tested, the test must be separated into several sessions/experiments and may require different subject panels.

An example is shown in Table 6-1.

Table 6-1 - Timetable for a 24-condition test

Visit    Phase        Number of conversations  Time
Visit 1  Instruction  -                        15 min
Visit 1  Session 1    7 (incl. practice)       35 min
Visit 1  Break        -                        10 min
Visit 1  Session 2    6                        30 min
Visit 2  Session 3    6                        30 min
Visit 2  Break        -                        10 min
Visit 2  Session 4    6                        30 min

Conditions that are identical in both directions and that use the same sensors and same acoustic noise are called symmetric conditions. Any other case is considered asymmetric. For asymmetric conditions, subject pairs should be required to swap location for each condition. This limits the total number to 12 asymmetric conditions.

It is recognized that there is a trade-off between the test resolution and the number of votes per condition. The relationship between these two parameters
is given by the following general equation for minimum significant difference (MSD):

$$\mathrm{MSD} = C_{df,p}\,\sqrt{\frac{EMSq}{n}}$$

where:
$C_{df,p}$ is a t-like value determined by the particular statistical test, the probability level (p), and the degrees of freedom (df)
$EMSq$ is the error mean square derived from the ANOVA
$n$ is the number of votes per condition

Some further information is given in Appendix I.

In order to achieve a sufficient resolution between conditions, it is recommended that the minimum number of subject pairs should in general be 16, but i
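
As a rough illustration of the MSD relationship described in 6.3, the following Python sketch evaluates it numerically. It assumes the square-root form MSD = C * sqrt(EMSq / n). The function names and the numeric inputs are hypothetical and are not taken from the Recommendation; in practice the value of C comes from the chosen statistical test and EMSq from the ANOVA of the actual votes.

```python
import math


def minimum_significant_difference(c_df_p: float, ems_q: float, n_votes: int) -> float:
    """Return MSD = C_{df,p} * sqrt(EMSq / n), assuming the square-root form.

    c_df_p  : t-like critical value (depends on test, p and df)
    ems_q   : error mean square taken from the ANOVA
    n_votes : number of votes per condition
    """
    if n_votes <= 0:
        raise ValueError("n_votes must be positive")
    if ems_q < 0:
        raise ValueError("ems_q must be non-negative")
    return c_df_p * math.sqrt(ems_q / n_votes)


def votes_needed(c_df_p: float, ems_q: float, target_msd: float) -> int:
    """Smallest number of votes per condition giving MSD <= target_msd.

    Obtained by solving the same relationship for n:
    n >= EMSq * (C / MSD)^2.
    """
    if target_msd <= 0:
        raise ValueError("target_msd must be positive")
    return math.ceil(ems_q * (c_df_p / target_msd) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to make the arithmetic easy to follow.
    c, ems, n = 2.0, 0.25, 16
    print(minimum_significant_difference(c, ems, n))
    print(votes_needed(c, ems, target_msd=0.25))
```

Because n sits under the square root, halving the MSD requires roughly four times as many votes per condition, which is the trade-off between test resolution and test size noted in the text.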
