Rec. ITU-R BS.1284-1

RECOMMENDATION ITU-R BS.1284-1*

General methods for the subjective assessment of sound quality

(Question ITU-R 55/6)

(1997-2003)

The ITU Radiocommunication Assembly,

considering

a) that the introduction of new kinds of sound signal processing, such as digital coding and bit rate reduction, new kinds of television signals using time multiplexed components and new services such as enhanced television and high definition television (HDTV), may require new or amended methods of subjective sound quality assessment;
b) that these techniques entail their own specific signal impairments;
c) that subjective listening tests permit assessment of the degree of annoyance caused to the listener by any impairment of the wanted signal during its transmission between the source and the listener;
d) that many different methods of subjective testing are possible;
e) that it is highly
desirable to standardise the methods of subjective testing and the interpretation of the results, so that the best possible comparisons may be made between results obtained at different times and/or in different places;
f) that it is highly desirable that the grading scales which are used to describe the subjective quality of sound should permit more consistent statistical processing methods, independent from the language used to express the opinions;
g) that it would be desirable for a single assessment scale to be available for both sound and television programmes;
h) that the geometric and
acoustic properties of control rooms and listening rooms can have a considerable influence on audition, and therefore listening conditions should be closely specified,

recommends

1 that the testing and evaluation procedures given in Annex 1 to this Recommendation should be used for the subjective assessment of the quality of reproduced sound.

* Radiocommunication Study Group 6 made editorial amendments to this Recommendation in 2002 in accordance with Resolution ITU-R 44.

Annex 1

1 General

This Annex is divided into the following sections, giving detailed requirements for various aspects of the tests:
1 General
2 Experimental design
3 Selection of the listening panel
4 Test method
5 Attributes
6 Programme material
7 Reproduction devices
8 Listening conditions
9 Statistical treatment of data
10 Presentation of results
11 Contents of test reports
References

This Recommendation is intended as a guide to the general assessment of sound quality. It is based on Recommendation ITU-R BS.1116, Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems. However, the requirements of Recommendation ITU-R BS.1116 are stringent, being intended for the assessment of small impairments. More general assessments usually involve larger differences and therefore do not usually need such close control of the test parameters.

Recommendation ITU-R BS.1116 contains a glossary of terms, some of which are used in the present
Recommendation. Other ITU Recommendations which may be relevant in some special cases are referred to in Recommendation ITU-R BS.1283, A guide to ITU-R Recommendations for subjective assessment of sound quality.

2 Experimental design

In designing the tests, the considerations of Recommendation ITU-R
BS.1116, § 2 should be taken into account. However, because the impairments being tested may not be small, it is not always essential to use a reference. If a reference is used, it need not be unimpaired in an absolute sense. In general, statistical expertise will be required to design the test. This
would include the determination of the number of observations needed, the statistical methods for analysing the data and the correct interpretation of the outcomes of the statistical analysis, including a check of the validity of the model assumptions.

3 Selection of the listening panel

Expert listeners are always preferred to non-expert listeners. It has been argued that non-experts may be representative of the general population, and that experts may be excessively critical. However, with long-term exposure to artefacts, in time some non-experts become experts. Therefore, tests using experts give a better and quicker indication of the likely results in the long term. In cases of doubt, the relationship between expert and non-expert opinion should be investigated.

The minimum number of expert listeners should normally be ten, whilst the minimum number of non-expert listeners should normally be twenty. Whenever the system is intended for high-quality sound broadcasting or reproduction, expert listeners should be used.

Generally, the listeners should undertake training to familiarise themselves with the test procedure, the test materials and the test environment.

4 Test method

4.1 Grading scales

The following five-grade scales should be used for the subjective assessment of sound quality or impairment. The nature and purpose of the tests will determine which of the two scales is the more appropriate.

    Quality           Impairment
    5  Excellent      5  Imperceptible
    4  Good           4  Perceptible, but not annoying
    3  Fair           3  Slightly annoying
    2  Poor           2  Annoying
    1  Bad            1  Very annoying

For comparison tests, either a method based on the following seven-grade comparison scale or one based on numerical differences using the above five-grade scales may be used. In general, these are not equivalent and may not give the same results. It is essential that the intended direction of the comparison be clearly indicated.

    Comparison
    +3  Much better
    +2  Better
    +1  Slightly better
     0  The same
    -1  Slightly worse
    -2  Worse
    -3  Much worse

NOTE 1 The scales should be treated as continuous, with a recommended resolution of 1 decimal place.

NOTE 2 It has been shown that the use of pre-defined intermediate anchor points may introduce bias. It is possible to use the number scales without descriptions of anchor points. In such cases, the intended orientation of the scales must be indicated. This may help to overcome translation problems when comparing the results of tests written in different languages. If intermediate anchor points are not used, it is essential that the results for individual subjects are normalised with respect to mean and standard deviation. Equation (1) may
be used to achieve such normalisation whilst retaining the original scale:

    Z_i = \frac{x_i - x_{si}}{s_{si}} \cdot s_s + x_s        (1)

where:
    Z_i: normalised result
    x_i: score of subject i
    x_{si}: mean score for subject i in session s
    x_s: mean score of all subjects in session s
    s_s: the standard deviation for all subjects in session s
    s_{si}: the standard deviation for subject i in session s.

4.2 Test procedure

Tests may be of single presentations, paired comparisons (one of which may be the reference) or multiple comparisons, with or without references. The presentations may be repeated as required. These test procedures should be used
in conjunction with the grading scales of § 4.1.

For tests of paired comparisons with references using the five-grade quality or impairment scales, repetition, four times consecutively, of the same programme sequence in the following order can be used:
- reference sequence;
- same sequence, impaired;
- reference sequence (repeated);
- same sequence, impaired (repeated).

Short-term human memory limitations may dictate that each programme excerpt should not last longer than 15 to 20 s; excerpts may be very short (a few seconds) for some tests. In the case where the sequence is a musical item,
the phrase should not appear to be interrupted. The interval between presentations 1 and 2 and between presentations 3 and 4 should be about 0.5 to 1 s, while the interval between presentations 2 and 3 should be somewhat longer, for example 1.5 s. The exact time should depend upon the type of programme. When the test sequence
is not under the control of the subject, it is necessary to provide a clear indication of the current presentation.

The programme sequences and impairments should be presented in random order subject to the condition that the same sequence should never be presented on two successive occasions with the same or different levels of impairment.

For tests of paired comparisons involving two impaired conditions with the seven-grade comparison scale, a set of presentations in the following order can be used: condition 1, condition 2, condition 1 (repeated), condition 2 (repeated). Conditions 1 and 2 should be interchanged on a random basis. In addition, a reference condition may be presented at the beginning of each four presentations and, in this case, a definite indication (such as the use of a light signal) should be given that this item is the reference condition.

No
session with any one listener should last for more than 15 to 20 min without interruption. If the sessions must be consecutive, they should be separated by rest periods of at least the same length. The switching device should not introduce any audible disturbance. In cases where the listeners carry
out the tests individually, it is highly desirable that the listeners control the switching between the stimuli as described in Recommendation ITU-R BS.1116.

5 Attributes

Depending on the objectives of the test, different numbers and types of attributes may be used to describe the perceived quality.
Any attributes used must be clearly defined.

5.1 Basic audio quality

The attribute "basic audio quality" includes all aspects of the sound quality being assessed. It includes, but is not restricted to, such things as timbre, transparency, stereophonic imaging, spatial presentation, reverberance, echoes, harmonic distortions, quantisation noise, pops, clicks and background noise. For the assessment of small impairments, the attribute "basic audio quality" is defined differently in Recommendation ITU-R BS.1116.

5.2 Attributes specifying the quality of two-channel stereophonic and multichannel sound in detail

5.2.1 Two-channel stereophonic image quality

The attribute "stereophonic image quality" is related to differences between the reference and the object in terms of sound image locations and sensations of depth and reality of the audio event.

5.2.2 Multichannel stereophonic image quality

The
attribute "front image quality" is related to the localisation of the frontal sound sources. It includes stereophonic image quality and losses of definition. The attribute "impression of surround quality" is related to spatial impression, ambience, or special directional surround effects.

5.3 Attributes specifying the relationships between sound and accompanying picture

The attribute "correlation between sound and accompanying picture" may include the following characteristics:
- correlation of source positions derived from visual and audible cues (including azimuth, elevation and depth);
- correlation of spatial impressions between sound and picture;
- time relationship between audio and video.

5.4 Main attributes for the absolute assessment of sound quality in detail

A list of attributes is given in Appendix 1 to Annex 1 [EBU, 1997].

5.5 Attributes specifying quality of digital transmitted/coded sound in detail

A list of main attributes is given in Appendix 2 to Annex 1.

6 Programme material

Depending on the precise objective of the tests, and in particular on the category of the sound programme transmission or reproduction system being tested, the test material may be chosen deliberately for its highly critical behaviour with respect to the impairments introduced by the system being tested. In other cases, less critical material may be used. Recommendation ITU-R BS.1116, § 6 contains a detailed presentation of the factors related to critical test programme material
and its selection for different purposes. Whenever the system is intended to carry high quality sound, the critical type of material should be used.

To ensure the comparability of test data obtained in different places and/or at different times, the same programme sequences should be used. In any event, the content of a programme sequence should be neither so interesting nor so disagreeable or boring that the listener is distracted.

7 Reproduction devices

7.1 Tests which do not include the loudspeakers (or headphones) as part of the system under test

The requirements of Recommendation ITU-R BS.1116, § 7 should be followed. It should be noted, however, that the use of "A" weighted sound pressure level measurements with a wideband signal does not necessarily give an accurate assessment of subjective loudness. This is especially true if the reproduction system includes some components with different bandwidths. It may be necessary to use alternative methods to ensure the correct gain settings for all reproduction channels.

Loudspeakers or headphones should be chosen with the aim that all sound-programme signals or other test signals can be reproduced in an optimum way; namely, they should provide neutral sound for any type of reproduction and should be usable for monophonic assessment as well as for two or more channel stereophonic sound systems.

Certain quality shortcomings are more clearly perceptible in the case of headphone reproduction; however, other quality shortcomings are more clearly perceptible in the case of loudspeaker reproduction. Therefore it would be necessary to determine the appropriate kind of reproduction device by subjective pre-tests. Especially in cases when shortcomings will affect the characteristics of the stereophonic sound image, loudspeaker reproduction should be used.

For assessing two-channel stereophonic sound systems, use of both stereo loudspeakers and headphones may be necessary. For assessing monophonic sound systems, one central loudspeaker and/or headphones may be used.

Choice of either loudspeakers or headphones, for individual trials or groups of trials, will enable the audibility of an effect to be correlated with the transducer in use, but the effective number of subjects will be reduced. Alternatively, if the subjects are able to switch at will between loudspeakers and headphones, it will not be possible to correlate the audibility of an effect with the transducer in use.

Where assessments need to be as comparable with one another as possible, headphones may be used. Because headphone reproduction is independent of the geometric and acoustic properties of listening and control rooms, it can, in principle, be defined with great accuracy and can easily be reproduced without systematic error. This does not apply to loudspeaker reproduction. In addition, in the case of headphone reproduction, assessment tests can be carried out with a great number of listeners at the same time and under identical listening conditions.

For assessing multichannel sound systems with or without accompanying pictures, loudspeakers must be used if influences on all reproduction channels played simultaneously are to be assessed. In all cases, each lo