INTERNATIONAL TELECOMMUNICATION UNION

ITU-T P.920 (05/2000)
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU

SERIES P: TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS
Audiovisual quality in multimedia services

Interactive test methods for audiovisual communications

ITU-T Recommendation P.920
(Formerly CCITT Recommendation)

ITU-T P-SERIES RECOMMENDATIONS
TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS

Vocabulary and effects of transmission parameters on customer opinion of transmission quality: Series P.10
Subscribers' lines and sets: Series P.30, P.300
Transmission standards: Series P.40
Objective measuring apparatus: Series P.50, P.500
Objective electro-acoustical measurements: Series P.60
Measurements related to speech loudness: Series P.70
Methods for objective and subjective assessment of quality: Series P.80, P.800
Audiovisual quality in multimedia services: Series P.900

For further details, please refer to the list of ITU-T Recommendations.

ITU-T Recommendation P.920
Interactive test methods for audiovisual communications

Summary
This ITU-T Recommendation is intended to define interactive evaluation methods for quantifying the impact of terminal and communication link performance on point-to-point or multipoint audiovisual communications. This methodology is based upon conversation opinion tests, and can be considered
to be an extension of the methods defined in Annex A/P.800.

Source
ITU-T Recommendation P.920 was prepared by ITU-T Study Group 12 (1997-2000) and approved under the WTSC Resolution 1 procedure on 18 May 2000.

FOREWORD
The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.
The World Telecommunication Standardization Conference (WTSC), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.
The approval of ITU-T Recommendations is covered by the
procedure laid down in WTSC Resolution 1.
In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE
In this Recommendation, the expression “Administration” is used for conciseness to indicate both a
telecommunication administration and a recognized operating agency.

INTELLECTUAL PROPERTY RIGHTS
ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.
As of the date of approval of this Recommendation, ITU had not received notice of intellectual property, protected by patents, which may
be required to implement this Recommendation. However, implementors are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database.

© ITU 2001
All rights reserved. No part of this publication may be reproduced or utilized in any
form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from ITU.

CONTENTS
1 Scope
2 References
3 Experimental design
3.1 Basic approach and factors to be investigated
3.2 Stimuli used in conversational tests
3.3 Test conditions and experimental design
3.4 Subjects
3.5 Subject training and reference connections
3.6 Ambient room and equipment characteristics
4 Solicitation of opinions
Appendix I - Examples of tasks and stimuli for conversation
I.1 Stimuli for conversation
I.2 Tasks to evaluate the effects of speech delay on communication quality
I.3 Tasks to evaluate the effects of audiovisual delay and/or transmission errors on communication quality
I.4 Task to evaluate the synchronization between audio and video signals
Appendix II - Protocols for the stimuli for conversation
II.1 Protocol for the Name-Guessing task
II.2 Protocol for the Story-Comparison task
II.3 Protocol for the Picture-Comparison task
II.4 Protocol for the building blocks task
II.5 Protocol for the object-description task
Appendix III - Test condition questionnaire
Appendix IV - Exit questions
Appendix V - Bibliography

Introduction
The audiovisual interactive test methods described in this ITU-T Recommendation are intended for quantifying the impact of terminal and communication link performance that may affect the ability to conduct an interactive audiovisual communication. The efficacy of these tests strongly depends on the ability to reproduce, in laboratory environments, conditions that are very close to real situations. In this regard, particular care must be taken in choosing the tasks proposed to the subjects. In general, those
tasks used in conversation tests for telephony assessment are not suited for audiovisual assessment because they often distract the subjects' attention from the video screen. Therefore new tasks have been developed following the criteria illustrated in this ITU-T Recommendation. Substantial work has
been done in this area, although not all aspects of audiovisual quality are yet completely understood. This ITU-T Recommendation reflects the current status of research on interactive audiovisual testing. As progress on this work continues, understanding of these interactive test methods will no doubt
improve. As new knowledge is attained, this ITU-T Recommendation will be revised.

ITU-T Recommendation P.920
Interactive test methods for audiovisual communications

1 Scope
This ITU-T Recommendation is intended to define
interactive evaluation methods for quantifying the impact of coding artifacts, transmission delay and transmission impairments (e.g. packet loss, cell loss, digital channel errors) on point-to-point or multipoint audiovisual communications. This methodology is based upon conversation opinion tests,
and can be considered to be an extension of methods defined in Annex A/P.800.
This ITU-T Recommendation does not cover topics that are already described in other P-series ITU-T Recommendations, such as the objective measurements of the link.

2 References
The following ITU-T Recommendations and other references contain provisions which, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated were valid. All Recommendations and other references are subject to revision; all users of this Recommendation are therefore encouraged
to investigate the possibility of applying the most recent edition of the Recommendations and other references listed below. A list of the currently valid ITU-T Recommendations is regularly published.
- ITU-T Recommendation P.800 (1996), Methods for subjective determination of transmission quality.
- ITU-R Recommendation BT.812 (1992), Subjective assessment of the quality of alphanumeric and graphic pictures in Teletext and similar services.
- ITU-T Recommendation P.911 (1998), Subjective audiovisual quality assessment methods for multimedia applications.
- IEC Publication 60651 (1979), Sound level meters.

3 Experimental design

3.1 Basic approach and factors to be investigated
In order to quantify the impact of factors, such as transmission delay, that may affect the ability to conduct an interactive communication, the approach proposed in this ITU-T Recommendation is based on an active talker conversation assessment. Further, since it is necessary to express these opinions using a rating system, several single-stimulus rating scales are proposed.

3.2 Stimuli used in conversational tests
In general, in conversational opinion tests it is desired to minimize the artificiality of the environment. However, at the same time, it is necessary to invoke some method to stimulate interactive communication utilizing the conditions which are being evaluated. In telephony assessments, it is common to use a set of photographs, or some other form of printed material, to achieve this objective. In audiovisual terminal performance assessments, however, such mechanisms are likely to distract a participant's attention from the video screen, thus possibly leading to an unnatural mode of communication for this type of terminal.

For general applications, the following guidelines are provided for designing task-based tests:
- the task should be designed such that, during their conversation, the subjects primarily maintain their attention on the audiovisual terminal;
- the task must have sufficient face value, that is, it must resemble real-life audiovisual communication to a sufficient degree. In particular, it is preferable that the task be performed by two subjects and not by one subject and an experimental leader;
- the task must yield reproducible quantitative results that represent adequate measures of communication efficiency. When time delays are involved, time measures should be among the results.

A wide range of subjects, including elderly and hearing-impaired subjects, should be able to perform the task. It is preferable that the task is, in itself, sufficiently rewarding for the subjects. This has several advantages: the subjects learn the task faster and they are less susceptible to fatigue and loss of motivation.

From past experiments, it has been found that lively audiovisual conversations can be stimulated if the participants in such a test know each other. Subsequently, the provision of written material can be used as a secondary,
rather than primary, source of stimulation. Thus, unlike telephony, familiarity between pairs of conversing participants is highly desirable, if not essential.

It is recognized, however, that for specific applications, the conversational tasks may have to be modified to take into account the services that the system under test is intended to provide. In order to permit meaningful measurements to be made of the factors being investigated, it is recommended that in such cases the conversational tasks be structured so as to represent the applications of interest, particularly as regards:
a) the rate of information exchange; and
b) the degree of audio and video signal utilization.

For example, to account for the attributes in the first category, tasks could range from predominantly one-way communication, to free conversation, to a rapid exchange of information, be it via video, audio or both signals. Similarly, to test attributes in the second category, tasks could range from the subjects working on a hard-copy document in front of them (minimum use of video information) to reading sign language over the video link (maximum use of video information). The actual tasks should combine attributes from both categories.

These guidelines have been applied to develop the tasks illustrated in Appendix I, and the protocols for the tasks are detailed in Appendix II.

3.3 Test conditions and experimental design
In general, at least one transmission impairment factor or test condition is likely to be evaluated in a test, in addition to a baseline (reference) condition where the impact of such a factor is minimal (when the reference condition is used, it should not be identified as such to the participants). However, because conversational tests are time-consuming, the total number of conditions ought to be reasonably constrained in order to minimize participant fatigue and maximize experimental accuracy. This requirement should be balanced against the need to ensure that each conversation/condition lasts at least five minutes.

As with conversational tests (audio communications), a Latin or Greco-Latin square may be found to be a suitable experimental design for this purpose. In such a case, the square's rows may be associated with the test participants and the square's columns with the order in which the conditions in the test are presented. Other treatments may also be appropriate depending on the factors being investigated. For example, past experiments have appeared to indicate that there may be an interaction between the audiovisual communication path quality and the perception of the impact of transmission delay. Consequently, it may be preferable to apply two treatments using a Greco-Latin square design, so that the letters of the first alphabet are associated with different values of transmission delay and the letters of the second alphabet are associated with different image/voice coding rates.

Of course, other experimental designs, including replicated block designs and Youden square designs, may be suitable, and the choice could be left to the experimenter in order to meet specific cost and accuracy objectives in view of the number of conditions of interest. Also, any possible effects related to the order in which the tasks are performed must be taken into account.

3.4 Subjects
At least 16 subjects should participate in a test; the exact number will be dictated by the experimental design and the accuracy required of the results. These subjects should be non-expert, and they should not be directly involved with audio or video technology as part of their normal work. Nevertheless, in the early phases of the development of audiovisual communications systems, and in pilot experiments carried out before a larger test, small groups of experts (4-8) or other critical subjects can provide indicative results with sufficient reliability.

3.5 Subject training and reference connections
Before starting the experiment, a scenario of the intended application of the system under test should be given to the subjects. The range and type of impairments should be shown in a preliminary phase. During this phase, a first level of personal introduction may thus be allowed to take place over the communication link at the worst (or best) experimental condition, while further discussion pertinent to the tasks expected of the participants can be subsequently permitted at the best (or worst) experimental condition. Again, as with the main test, the particulars of the conditions should not be revealed to the test participants.

3.6 Ambient room and equipment characteristics
Table 1 lists typical viewing and listening conditions as used in audiovisual quality assessment. The actual parameter settings used in the asse