Recommendation ITU-R BS.1534-3
(10/2015)

Method for the subjective assessment of intermediate quality level of audio systems

BS Series
Broadcasting service (sound)

Rec. ITU-R BS.1534-3

Foreword

The role of the Radiocommunication Sector is to ensure the rational, equitable, efficient and economical use of the radio-frequency spectrum by all radiocommunication services, including satellite services, and to carry out studies without limit of frequency range on the basis of which Recommendations are adopted.

The regulatory and policy functions of the Radiocommunication Sector are performed by World and Regional Radiocommunication Conferences and Radiocommunication Assemblies supported by Study Groups.

Policy on Intellectual Property Right (IPR)

ITU-R policy on IPR is described in the Common Patent Policy for ITU-T/ITU-R/ISO/IEC referenced in Annex 1 of Resolution ITU-R 1. Forms to be used for the submission of patent statements and licensing declarations by patent holders are available from http://www.itu.int/ITU-R/go/patents/en where the Guidelines for Implementation of the Common Patent Policy for ITU-T/ITU-R/ISO/IEC and the ITU-R patent information database can also be found.

Series of ITU-R Recommendations
(Also available online at http://www.itu.int/publ/R-REC/en)

Series  Title
BO      Satellite delivery
BR      Recording for production, archival and play-out; film for television
BS      Broadcasting service (sound)
BT      Broadcasting service (television)
F       Fixed service
M       Mobile, radiodetermination, amateur and related satellite services
P       Radiowave propagation
RA      Radio astronomy
RS      Remote sensing systems
S       Fixed-satellite service
SA      Space applications and meteorology
SF      Frequency sharing and coordination between fixed-satellite and fixed service systems
SM      Spectrum management
SNG     Satellite news gathering
TF      Time signals and frequency standards emissions
V       Vocabulary and related subjects

Note: This ITU-R Recommendation was approved in English under the procedure detailed in Resolution ITU-R 1.

Electronic Publication
Geneva, 2015

© ITU 2015
All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without written permission of ITU.

RECOMMENDATION ITU-R BS.1534-3

Method for the subjective assessment of intermediate quality level of audio systems

(Question ITU-R 62/6)
(2001-2003-2014-2015)

Scope

This Recommendation describes a method for the subjective assessment of intermediate audio quality. This method mirrors many aspects of Recommendation ITU-R BS.1116 and uses the same grading scale as is used for the evaluation of picture quality (i.e. Recommendation ITU-R BT.500). The method, called “MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA)”, has been successfully tested. These tests have demonstrated that the MUSHRA method is suitable for evaluation of intermediate audio quality and gives accurate and reliable results.

Keywords

Listening test, artifacts, intermediate audio quality, audio coding, subjective assessment, audio quality

The ITU Radiocommunication Assembly,

considering

a) that Recommendations ITU-R BS.1116, ITU-R BS.1284, ITU-R BT.500, ITU-R BT.710 and ITU-R BT.811 as well as Recommendations ITU-T P.800, ITU-T P.810 and ITU-T P.830, have established methods for assessing subjective quality of audio, video and speech systems;

b) that new kinds of delivery services such as streaming audio on the Internet or solid state players, digital satellite services, digital short and medium wave systems or mobile multimedia applications may operate at intermediate audio quality;

c) that Recommendation ITU-R BS.1116 is intended for the assessment of small impairments and is not suitable for assessing systems with intermediate audio quality;

d) that Recommendation ITU-R BS.1284 gives no absolute scoring for the assessment of intermediate audio quality;

e) that inclusion of appropriate and relevant anchors in testing enables stable use of the subjective rating scale;

f) that Recommendations ITU-T P.800, ITU-T P.810 and ITU-T P.830 are focused on speech signals in a telephone environment and have proved insufficient for the evaluation of audio signals in a broadcasting environment;

g) that the use of standardized subjective test methods is important for the exchange, compatibility and correct evaluation of the test data;

h) that new multimedia services may require combined assessment of audio and video quality;

i) that the name MUSHRA is often misused for tests not using reference and anchors;

j) that anchors can affect the test results and it is desirable that anchors resemble the artefacts of the systems being tested;

k) that the introduction of multichannel stereophonic sound systems up to 3/2 channels specified in Recommendation ITU-R BS.775 and the advanced sound system
described in Recommendation ITU-R BS.2051, with or without accompanying picture, requires new subjective assessment methods, including the experimental conditions,

recommends

1 that the testing and evaluation procedures given in Annex 1 of this Recommendation should be used for the subjective assessment of intermediate audio quality,

further recommends

1 that studies of anchors that have the characteristics of impairments encountered in state-of-the-art audio systems are continued and that this Recommendation be updated to include new anchors as appropriate.

Annex 1

1 Introduction

This Recommendation describes a method for the subjective assessment of intermediate audio quality. This method mirrors many aspects of Recommendation ITU-R BS.1116 and uses the same grading scale as is used for the evaluation of picture quality (i.e. Recommendation ITU-R BT.500). The method, called “MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA)”, has been successfully tested. These tests have demonstrated that the MUSHRA method is suitable for evaluation of intermediate audio quality and gives accurate and reliable results [2; 4; 3].

This Recommendation includes the following sections and attachments:

Section 1: Introduction
Section 2: Scope, test motivation and purpose of new method
Section 3: Experimental design
Section 4: Selection of assessors
Section 5: Test method
Section 6: Attributes
Section 7: Test material
Section 8: Listening conditions
Section 9: Statistical analysis
Section 10: Test report and presentation of results
Attachment 1 (Normative): Instructions to be given to assessors
Attachment 2 (Informative): Guidance notes on user interface design
Attachment 3 (Normative): Description of non-parametric statistical comparison between two samples using re-sampling techniques and Monte-Carlo simulation methods
Attachment 4 (Informative): Guidance notes for parametric statistical analysis
Attachment 5 (Informative): Requirements for optimum anchor behaviours

2 Scope, test motivation and purpose of new method

Subjective listening tests are recognized as still being the most reliable way of measuring the quality of audio systems. There are well described and proven methods for assessing audio quality at the top and the bottom quality range.

Recommendation ITU-R BS.1116, “Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems”, is used for the evaluation of high quality audio systems having small impairments. However, there are applications where lower quality audio is acceptable or unavoidable. Rapid developments in the use of the Internet for distribution and broadcast of audio material, where the data rate is limited, have led to a compromise in audio quality. Other applications that may contain intermediate audio quality are digital AM (i.e. Digital Radio Mondiale (DRM)), digital satellite broadcasting, commentary circuits in radio and TV, audio on-demand services and audio on dial-up lines. The test method defined in Recommendation ITU-R BS.1116 is not entirely suitable for evaluating these lower quality audio systems [4] because it is poor at discriminating between small differences in quality at the bottom of the scale.

Recommendation ITU-R BS.1284 gives only methods which either are dedicated to the high quality audio range or give no absolute scoring of audio quality. Other Recommendations, like Recommendations ITU-T P.800, ITU-T P.810 or ITU-T P.830, are focused on subjective assessment of speech signals in a telephone
environment. The European Broadcasting Union (EBU) Project Group B/AIM has done experiments with typical audio material as used in a broadcasting environment using these ITU-T methods. None of these methods simultaneously fulfils the requirements for an absolute scale, comparison with a reference signal and small confidence intervals with a reasonable number of assessors. Therefore, the evaluation of audio signals in a broadcasting environment cannot be done properly by using one of these methods.

The revised test method described in this Recommendation is intended to give a reliable and repeatable measure of systems having audio quality which would normally fall in the lower half of the impairment scale used by Recommendation ITU-R BS.1116 [2; 4; 3]. In the MUSHRA test method, a high quality reference signal is used and the systems under test are expected to introduce significant impairments. MUSHRA is to be used for assessment of intermediate quality audio systems. If MUSHRA is used with appropriate content, listener scores should ideally range between 20 and 80 MUSHRA points. If scores for the majority of test conditions fall in the range of 80 to 100, the results of the test may be invalid. Likely reasons for the compressed scoring are: use of naïve assessors, use of non-critical content, or inappropriate test choice for the encoding algorithms under test.

3 Experimental design

Many different kinds of research strategies are used in gathering reliable information in a domain of scientific interest. In the subjective assessment of impairments in audio systems, the most formal experimental methods shall be used. Subjective experiments are characterized firstly by actual control and manipulation of the experimental conditions, and secondly by collection and analysis of statistical data from listeners. Careful experimental design and planning is needed to ensure that uncontrolled factors which can cause ambiguity in test results are minimized. As an example, if the actual sequence of audio items were identical for all the assessors in a listening test, then one could not be sure whether the judgements made by the assessors were due to that sequence rather than to the different levels of impairments that were presented. Accordingly, the test conditions must be arranged in a way that reveals the effects of the independent factors, and only of these factors.

In situations where it can be expected that the potential impairments and other characteristics will be distributed homogeneously throughout the listening test, a true randomization can be applied to the presentation of the test conditions. Where non-homogeneity is expected this must be taken into account in the presentation of the test conditions. For example, where material to be assessed varies in level of difficulty, the order of presentation of stimuli must be distributed randomly, both within and between sessions.

Listening tests need to be designed so that assessors are not overloaded to the point of lessened accuracy of judgement. Except in cases where the relationship between sound and vision is important, it is preferred that the assessment of audio systems is carried out without accompanying pictures. A major consideration is the inclusion of
appropriate control conditions. Typically, control conditions include the presentation of unimpaired audio materials, introduced in ways that are unpredictable to the assessors. It is the differences between judgements of these control stimuli and the potentially impaired ones that allow one to conclude that the grades are actual assessments of the impairments.

Some of these considerations will be described later. It should be understood that the topics of experimental design, experimental execution, and statistical analysis are complex, and that not all details can be given in a Recommendation such as this. It is recommended that professionals with expertise in experimental design and statistics should be consulted or brought in at the beginning of the planning for the listening test.

To enable efficient analysis and transfer of data between laboratories, the experimental design shall be reported. Both dependent and independent variables should be defined in detail. The number of independent variables will be defined with their associated levels.

4 Selection of assessors

Data from listening tests assessing small impairments in audio systems, as in Recommendation ITU-R BS.1116, should come from assessors who have experience in detecting these small impairments. The higher the quality reached by the systems to be tested, the more important it is to have experienced listeners.

4.1 Criteria for selecting assessors

Whilst the MUSHRA test method is not intended for assessment
of small impairments, it is still recommended that experienced listeners should be used to ensure the quality of the collected test data. These listeners should have experience in listening to sound in a critical way. Such listeners will give a more reliable result more quickly than non-experienced listeners. It is also important to note that most non-experienced listeners tend to become more sensitive to the various types of artefacts after frequent exposure.

An experienced assessor is chosen for his/her ability to carry out a listening test. This ability is to be qualified and quantified in terms of the assessor's Reliability and Discrimination skills within a test, based upon replicated evaluations, as defined below:

Discrimination: A measure of the ability to perceive differences between test items.
Reliability: A measure of the closeness of repeated ratings of the same test item.

Only assessors categorized as experienced assessors for any given test should be included in final data analysis. A number of techniques for performing this analysis of assessors are available. For more information consult Report ITU-R BS.2300¹. These are based upon at least one replicated rating by each assessor and allow for a qualification and quantification of assessor experience within one experiment.

¹ The expertise gauge (eGauge) method as described in Report ITU-R BS.2300-0 is an example of an implementation of that technique. It is available from http://www.itu.int/oth/R0A07000036.

The methods are to be applied either as a pre-screening of assessors within a pilot experiment or preferably as both pre-screening and part of the main test. A pilot experiment is associated with a series of experiments and comprises a representative set of test sam