Recommendation ITU-R BS.1116-3
(02/2015)

Methods for the subjective assessment of small impairments in audio systems

BS Series
Broadcasting service (sound)

Foreword

The role of the Radiocommunication Sector is to ensure the rational, equitable, efficient and economical use of the radio-frequency spectrum by all radiocommunication services, including satellite services, and carry out studies without limit of frequency range on the basis of which Recommendations are adopted.

The regulatory and policy functions of the Radiocommunication Sector are performed by World and Regional Radiocommunication Conferences and Radiocommunication Assemblies supported by Study Groups.

Policy on Intellectual Property Right (IPR)

ITU-R policy on IPR is described in the Common Patent Policy for ITU-T/ITU-R/ISO/IEC referenced in Annex 1 of Resolution ITU-R 1. Forms to be used for the submission of patent statements and licensing declarations by patent holders are available from http://www.itu.int/ITU-R/go/patents/en where the Guidelines for Implementation of the Common Patent Policy for ITU-T/ITU-R/ISO/IEC and the ITU-R patent information database can also be found.

Series of ITU-R Recommendations
(Also available online at http://www.itu.int/publ/R-REC/en)

Series  Title
BO      Satellite delivery
BR      Recording for production, archival and play-out; film for television
BS      Broadcasting service (sound)
BT      Broadcasting service (television)
F       Fixed service
M       Mobile, radiodetermination, amateur and related satellite services
P       Radiowave propagation
RA      Radio astronomy
RS      Remote sensing systems
S       Fixed-satellite service
SA      Space applications and meteorology
SF      Frequency sharing and coordination between fixed-satellite and fixed service systems
SM      Spectrum management
SNG     Satellite news gathering
TF      Time signals and frequency standards emissions
V       Vocabulary and related subjects

Note: This ITU-R Recommendation was approved in English under the procedure detailed in Resolution ITU-R 1.

Electronic Publication
Geneva, 2015

© ITU 2015
All rights reserved. No part of this publication may
be reproduced, by any means whatsoever, without written permission of ITU.

RECOMMENDATION ITU-R BS.1116-3*

Methods for the subjective assessment of small impairments in audio systems

(Question ITU-R 62/6)

(1994-1997-2015)

Scope

This Recommendation is intended for use in the assessment of systems which introduce impairments so small as to be undetectable without rigorous control of the experimental conditions and appropriate statistical analysis. If used for systems that introduce relatively large and easily detectable impairments, it leads to excessive expenditure of time and effort and may also lead to less reliable results than a simpler test. This Recommendation forms the base reference for the other Recommendations, which may contain additional special conditions or relaxations of the requirements included in this Recommendation.

Keywords

Audio quality; small
impairments; subjective assessment; listening test; audio coding; high-quality audio; listening room

The ITU Radiocommunication Assembly,

considering

a) that Recommendations ITU-R BT.500, ITU-R BS.562, ITU-R BT.710 and ITU-R BT.811 have established some methods for assessing subjective quality of audio and video systems;

b) that subjective listening tests permit assessment of the degree of annoyance caused to the listener by any impairment of the wanted signal during its transmission between the originating source and the listener;

c) that classical objective methods may not be adequate in assessing advanced audio coding schemes and that perceptual objective assessment methods are being developed for testing the sound quality of sound systems;

d) that the use of standardized methods is important for the exchange, compatibility and correct evaluation of the test data;

e) that the introduction of new advanced digital audio systems exploiting psycho-acoustic properties, especially with small impairments, requires advancements in subjective assessment methods;

f) that the introduction of multichannel stereophonic sound systems up to 3/2 channels specified in Recommendation ITU-R BS.775
and the advanced sound system described in Recommendation ITU-R BS.2051, with or without accompanying picture, requires new subjective assessment methods, including the experimental conditions,

recommends

1 that the testing, evaluation and reporting procedures given in Annex 1 be used for the subjective assessment of small impairments in audio systems including multichannel sound systems (with or without picture),

further recommends

1 that further studies of the characteristics of listening rooms and reproduction devices for the advanced sound system are needed and this Recommendation should be updated when those studies are completed.

* This Recommendation should be brought to the attention of the International Organization for Standardization/Moving Picture Experts Group (ISO/MPEG) Audio ad hoc Group.

Annex 1

1 General

1.1 Contents

Annex 1 is divided into 11 sections, giving detailed requirements for various aspects of the tests:
1. General
2. Experimental design
3. Selection of listening panels
4. Test method
5. Attributes
6. Programme material
7. Reproduction devices
8. Listening conditions
9. Statistical analysis
10. Presentation of the results of the statistical analyses
11. Contents of test reports.

Also included are Attachments containing guidance on the selection of expert listeners and an example of the instructions given to the test subjects.

A number of common words are used with technical meanings. A Glossary of these is given in Attachment
4.

2 Experimental design

Many different kinds of research strategies are used in gathering reliable information in a domain of scientific interest. In subjective assessment of small impairments in audio systems, the most formal experimental methods shall be used.

Subjective experiments are characterized firstly by actual control and manipulation of the experimental conditions, and secondly by quantitative data from human observers. Careful experimental design and planning is needed to ensure that uncontrolled factors do not contaminate the listening test so that ambiguities are not caused. As
an example, if the actual sequence of audio items is identical for all the subjects in a listening test, then one could not be sure whether the judgements made by the subjects were due to that sequence rather than to the different levels of impairments that were presented. Accordingly, the test conditions must be arranged in a way that reveals the effects of the independent factors, and only of these factors.

In situations where it can be expected that the potential impairments and other characteristics will be distributed homogeneously throughout the listening test, a true randomization can be applied to the presentation of the test conditions. Where non-homogeneity is expected, this must be taken into account in the presentation of the test conditions. For example, where material to be assessed varies in level of difficulty, the order of presentation of stimuli must be distributed randomly, both within and between sessions.

Similarly, listening tests need to be designed so that subjects are not overloaded to the point of lessened accuracy of judgement. Except in cases where the relationship between sound and vision is important, it is preferred that the assessment of audio systems is carried out without accompanying pictures.

A major consideration is the inclusion of appropriate control conditions. Typically, control conditions include the presentation of unimpaired audio materials, introduced in ways that are unpredictable to the subjects. It is the differences between judgement of these control stimuli and the potentially impaired ones that allow one to conclude that the grades are actual assessments of the impairments.

Some of these considerations will be discussed later in this document. It should be understood that the topics of experimental design, experimental execution, and statistical analysis are complex, and that only the most general guidelines can be given in a Recommendation such as this. It is recommended that professionals with expertise in experimental design and statistics should be consulted or brought in at the beginning of the planning for the listening test.

3 Selection of listening panels

3.1 Expert listeners

It is important that data from listening tests assessing small impairments in audio systems should come exclusively from subjects who have expertise in detecting these small impairments. The higher the quality reached by the systems to be tested, the more important it is to have expert listeners.

3.2 Criteria for selecting subjects

The outcome of subjective tests of sound systems with small impairments utilizing a selected group of listeners is not primarily intended for extrapolation to the general
public. Normally the aim is to investigate whether a group of expert listeners, under certain conditions, are able to perceive relatively subtle degradations but also to produce a quantitative estimate of the introduced impairments. The demanding nature of the test procedure is intended to reveal those problems that may be revealed during the extensive period of exposure under different conditions which occur in real life once a system has been introduced to the consumer.

There is sometimes a reason for introducing a rejection technique either before (pre-screening) or after (post-screening) the real test. In some cases both types of rejection might be used. Here, elimination refers to a process in which all judgements from a particular subject are omitted.

Any type of rejection technique which is not carefully analysed and applied may lead to a biased result. It is therefore extremely important that, whenever elimination of data has been made, the test report clearly describes the applied criterion so that readers can make their own judgement.

3.2.1 Pre-screening of subjects

Pre-screening procedures include methods such as audiometric tests, selection
of subjects based on their previous experience and performance in previous tests and elimination of subjects based on a statistical analysis of pre-tests. The training procedure might be used as a tool for pre-screening.

The major argument for introducing a pre-screening technique is to increase the efficiency of the listening test. This must however be balanced against the risk of limiting the relevance of the result too much.

3.2.2 Post-screening of subjects

Post-screening methods can be roughly separated into at least two classes: one is based on inconsistencies compared with the mean result and another relies on the ability of the subject to make correct identifications. The first class is never justifiable. Whenever a subjective listening test is performed with the test method recommended here, the required information for the second class of post-screening is automatically available. A suggested statistical method for doing this is described in Attachment 1.

The methods are primarily used to eliminate subjects who cannot make the appropriate discriminations. The application of a post-screening method may clarify the tendencies in a test result. However, bearing in mind the variability of subjects' sensitivities to different artefacts, caution should be exercised.

3.3 Size of listening panel

The adequate size for a listening panel can be predicted if the variance can be estimated and the required resolution of the experiment is known.

Where the conditions of a listening
test are tightly controlled on both the technical and behavioural side, experience has shown that data from 20 subjects is often sufficient for drawing appropriate conclusions from the test. If analysis of the data can be carried out as the test proceeds, then no further subjects need be processed when an adequate level of statistical significance for drawing appropriate conclusions from the test has been reached. If some of the systems under test are expected to be nearly transparent, a larger number of subjects will be required to ensure that a sufficiently large number pass the post-screening test.

If, for any reason, tight experimental control cannot be achieved, then larger numbers of subjects might be needed to attain the required resolution.

The size of a listening panel is not solely a consideration of the desired resolution. The result from the type of experiment dealt with in this Recommendation is, in principle, only valid for precisely that group of expert listeners actually involved in the test. Thus, by increasing the size of the listening panel, the result can be claimed to hold for a more general group of expert listeners and may therefore sometimes be considered more convincing. The size of the listening panel may also need to be increased to allow for the probability that subjects vary in their sensitivity to different artefacts.

4 Test method

To conduct subjective assessments in the case of systems generating small impairments, it is
necessary to select an appropriate method. The “double-blind triple-stimulus with hidden reference” method has been found to be especially sensitive and stable, and to permit accurate detection of small impairments. Therefore, it should be used for this kind of test.

In the preferred and most sensitive form of this method, one subject at a time is involved and the selection of one of three stimuli (“A”, “B”, “C”) is at the discretion of this subject. The known reference is always available as stimulus “A”. The hidden reference and the object are simultaneously available but are “randomly” assigned to “B” and “C”, depending on the trial.

The subject is asked to assess the impairments on “B” compared to “A”, and “C” compared to “A”, according to the continuous five-grade impairment scale. One of the stimuli, “B” or “C”, should be indiscernible from stimulus “A”; the other one may reveal impairments. Any perceived differences between the reference and the other stimuli must be interpreted as an impairment.

As soon as the subject, in the preferred method, has completed the grading of a trial, it should be possible to proceed directly on to the next trial. The excerpt may be repeated until
the subject has made an assessment. In this way the test procedure is self-pacing.

The grading scale shall be treated as continuous with “anchors” derived from the ITU-R five-grade impairment scale given in Recommendation ITU-R BS.1284 and in Table 1.

TABLE 1

Impairment                        Grade
Imperceptible                     5.0
Perceptible, but not annoying     4.0
Slightly annoying                 3.0
Annoying                          2.0
Very annoying                     1.0

NOTE 1: It has been shown that the use of pre-defined intermediate anchor points may introduce bias [Poulton, 1992]. It is possible to use the number scales without descriptions of anchor points. In such cases,