AMERICAN NATIONAL STANDARD FOR TELECOMMUNICATIONS

ATIS-0100518.1998(R2013)

Objective Measurement of Telephone Band Speech Quality Using Measuring Normalizing Blocks (MNBs)

As a leading technology and solutions development organization, ATIS brings together the top global ICT companies to advance the industry's most-pressing business priorities. Through ATIS committees and forums, nearly 200 companies address cloud services, device solutions, emergency services, M2M communications, cyber security, ehealth, network evolution, quality of service, billing support, operations, and more. These priorities follow a fast-track development lifecycle from design and innovation through solutions that include standards, specifications, requirements, business use cases, software toolkits, and interoperability testing.

ATIS is accredited by the American National Standards Institute (ANSI). ATIS is the North American Organizational Partner for the 3rd Generation Partnership Project (3GPP), a founding Partner of oneM2M, a member and major U.S. contributor to the International Telecommunication Union (ITU) Radio and Telecommunications sectors, and a member of the Inter-American Telecommunication Commission (CITEL). For more information, visit.

AMERICAN NATIONAL STANDARD

Approval of an American National Standard requires review by ANSI that the requirements for due process, consensus, and other criteria for approval have been met by the standards developer.

Consensus is established when, in the judgment of the ANSI Board of Standards Review, substantial agreement has been reached by directly and materially affected interests. Substantial agreement means much more than a simple majority, but not necessarily unanimity. Consensus requires that all views and objections be considered, and that
a concerted effort be made towards their resolution.

The use of American National Standards is completely voluntary; their existence does not in any respect preclude anyone, whether he has approved the standards or not, from manufacturing, marketing, purchasing, or using products, processes, or procedures not conforming to the standards.

The American National Standards Institute does not develop standards and will in no circumstances give an interpretation of any American National Standard. Moreover, no person shall have the right or authority to issue an interpretation of an American National Standard in the name of the American National Standards Institute. Requests for interpretations should be addressed to the secretariat or sponsor whose name appears on the title page of this standard.

CAUTION NOTICE: This American National Standard may be revised or withdrawn at any time. The procedures of the American National Standards Institute require that action be taken periodically to reaffirm, revise, or withdraw this standard. Purchasers of American National Standards may receive current information on all standards by calling or writing the American National Standards Institute.

Notice of Disclaimer

-3 dB at 3400 Hz), the same as shown in ITU-T Recommendation G.712 for PCM-derived channels between two 2-wire analog interfaces.

Table 1 - Relationship of coding technologies, experimental factors, and applications to this standard

Test factors                                                        Note
  speech input levels to a codec                                    1
  listening levels in subjective experiments                        2
  talker dependencies                                               1
  multiple simultaneous talkers                                     2
  transmission channel errors                                       1
  bit rates if a codec has more than one bit rate mode              1
  transcodings                                                      1
  bit-rate mismatching between an encoder and a decoder
    if a codec has more than one bit rate mode                      2
  environmental noise in the sending side                           2
  network information signals as input to a codec                   2
  music as input to a codec                                         2
  delay                                                             3
  short-term time warping of audio signal                           2
  long-term time warping of audio signal                            2
  temporal clipping of speech                                       2
  amplitude clipping of speech                                      2

Coding technologies                                                 Note
  waveform                                                          1
  CELP and hybrids 4 kbit/s                                         1
  CELP and hybrids 4 kbit/s                                         1
  VOCODERs                                                          1
  speech activity detectors/silence suppression systems             2
  other coders                                                      1

Applications                                                        Note
  coder optimization                                                1
  coder evaluation                                                  1
  coder selection                                                   2
  network planning                                                  4
  live network testing                                              5
  in-service non-intrusive measurement devices                      3

NOTES
1) The objective measure has demonstrated acceptable accuracy in the presence of this variable.
2) Insufficient information is available about the accuracy of the objective measure with regard to this variable.
3) The objective measure is known to provide inaccurate predictions when used in conjunction with this variable, or is otherwise not intended to be used with this variable.
4) With caution, the objective measure might be used for some network planning purposes. The reader should note that there are important factors in network planning to which this ANS is not applicable (see the “Test factors” section of this table).
5) With caution, the objective measure might be used for some live network testing. The reader should note that there may be factors or technologies in a live network connection to which this ANS is not applicable (see the “Test factors” section of this table).

Note that this document contains sufficient information to implement this algorithm in a computer programming language. Implementations can be validated by using information available from ftp://ftp.its.bldrdoc.gov/dist/voice/verify.zip.

2 Normative references

The following standards contain provisions that, through reference in this text, constitute provisions of this American National Standard. At the time of publication, the editions indicated were valid. All references are subject to revision, and parties to agreements based on this American National Standard are encouraged to investigate the possibility of applying the most recent edition of the standards listed below.

ANSI T1.801.04-1997, Telecommunications Multimedia Communications Delay, Synchronization, and Frame Rate
ITU-T Recommendation G.712 (11/96), Transmission performance characteristics of pulse code modulation channels 2)
ITU-T Recommendation P.800 (05/1996), Methods for subjective determination of transmission quality 2)
ITU-T Recommendation P.830 (09/1995), Subjective performance assessment of telephone-band and wide-band digital codecs 2)

2) Available from the American National Standards Institute, 11 West 42nd Street, New York, NY 10036.

3 Abbreviations

For the purpose of this ANS, the following abbreviations are used:

ACR     Absolute Category Rating
AD      Auditory Distance
ANS     American National Standard
ANSI    American National Standards Institute
CELP    Code Excited Linear Prediction
DCR     Degradation Category Rating
DMOS    Degradation Mean Opinion Score
DUT     Device Under Test
FMNB    Frequency Measuring Normalizing Block
GoB     Good or Better
ITU-T   International Telecommunication Union, Telecommunication Standardization Sector
MOS     Mean Opinion Score
MNB     Measuring Normalizing Block
PoW     Poor or Worse
TMNB    Time Measuring Normalizing Block

4 Definitions

For the purpose of this ANS, the following definitions apply:

Bark: a perception-based unit of frequency equivalent to the width of a critical band. As frequency increases, the width of the critical hearing band (in Hertz) increases. On the Bark scale, equal frequency intervals are of equal perceptual importance.

reference vector: audio vector used as input, or source, for a device under test.

test vector: output vector from device under test, compared against the reference vector in the measurement process.

5 Conventions

Subjective evaluation of speech codecs may be conducted using listening-only or conversational methods of subjective testing. For practical reasons, listening-only tests are the only feasible method of subjective testing during the development of speech codecs, when a real-time implementation of the codec is not available. This ANS defines an objective measurement technique for estimating subjective quality obtained in listening-only tests.

6 Summary of objective measurement procedure

Objective quality measurement of speech codecs requires a number of steps:

1) Preparation of reference vectors, i.e., recording of talkers and/or generation of the artificial voices conforming to ITU-T Recommendation P.50;
2) Selection of experimental parameters that will exercise the salient features of the codec and are able to be tested by objective measurement;
3) Production of reference and test speech vectors;
4) Calculation of the objective speech quality based on measuring normalizing blocks (MNBs) using reference and test speech vectors;
5) Transformation from the objective quality scale to the subjective quality scale, if necessary;
6) Analysis of results.

Steps 1, 2, 4,
5, and 6 are discussed below.

7 Source speech material preparation

Reference vectors for objective measurement may be real voices or the artificial voices specified in ITU-T Recommendation P.50, depending on the goals of the experiment. Since the artificial voices defined in ITU-T Recommendation P.50 reproduce the mean characteristics of human speech over various languages, they are useful in objectively estimating the mean subjective quality of a codec over these languages. When the talker dependency of a codec, or its performance for particular languages, is of concern, it is recommended that real voices be used. In either case, no environmental noise should be added.

7.1 Real voices

When real voices are used in objective measurement, they should be produced, recorded, and level-equalized in accordance with section 7 of ITU-T Recommendation P.830. It is recommended that a minimum of two male talkers and two female talkers be used for each testing condition. If talker dependency is to be tested as a factor in its own right, it is recommended that more talkers be used as follows:

- 8 male;
- 8 female;
- 8 children.

7.2 Artificial voices

When the artificial voices conforming to ITU-T Recommendation P.50 are used in objective measurement, it is recommended that both male and female artificial voices be used. These vectors should be passed through a filter with appropriate frequency characteristics to simulate the sending frequency characteristics of a telephone handset, and level-equalized in the same manner as real voices (see ITU-T Recommendation P.830).

8 Selection of experimental parameters

To demonstrate the performance of a codec, the effects of various quality factors on the performance of the codec should be examined. ITU-T Recommendation P.830 provides guidance on subjectively assessing the following quality factors:

1) speech input levels to a codec;
2) listening levels in subjective experiments;
3) talkers (including multiple simultaneous talkers);
4) errors in the transmission channel between an encoder and a decoder;
5) bit rates if a codec has more than one bit-rate mode;
6) transcodings;
7) bit-rate mismatching between an encoder and a decoder if a codec has more than one bit-rate mode;
8) environmental noise in the sending side;
9) network information signals as input to a codec;
10) music as input to a codec.
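As an illustration only, the factors that Table 1 marks with note 1 (for example speech input levels, bit rates, and transmission channel errors) could be crossed with the talker set of 7.1 to enumerate test conditions. The sketch below is not part of this ANS: the function names `meets_talker_minimum` and `build_test_plan`, the talker identifiers, and all factor values are hypothetical and chosen purely for the example.

```python
from itertools import product


def meets_talker_minimum(talkers, min_male=2, min_female=2):
    """Check the 7.1 recommendation of at least two male and two female talkers."""
    males = sum(1 for t in talkers if t["sex"] == "male")
    females = sum(1 for t in talkers if t["sex"] == "female")
    return males >= min_male and females >= min_female


def build_test_plan(talkers, factors):
    """Cross every talker with every combination of factor values."""
    names = list(factors)
    plan = []
    for talker in talkers:
        for values in product(*(factors[name] for name in names)):
            condition = dict(zip(names, values))
            condition["talker"] = talker["id"]
            plan.append(condition)
    return plan


# Hypothetical talkers and factor values, for illustration only.
talkers = [
    {"id": "m1", "sex": "male"},
    {"id": "m2", "sex": "male"},
    {"id": "f1", "sex": "female"},
    {"id": "f2", "sex": "female"},
]
factors = {
    "input_level_db": [-10, 0, 10],     # relative to a nominal level (assumed)
    "bit_rate_kbit_s": [8, 16],         # assumed codec modes
    "channel_error_rate": [0.0, 1e-3],  # assumed error conditions
}

plan = build_test_plan(talkers, factors)
print(meets_talker_minimum(talkers))  # True
print(len(plan))  # 4 talkers x 3 levels x 2 bit rates x 2 error rates = 48
```

A full factorial plan grows quickly, so in practice the number of levels per factor would be chosen with the cost of producing reference and test vectors in mind.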
Note that objective measurement for quality factors other than those specifically noted as applicable in this standard (see Table 1) is still under study. Therefore, these factors should be measured only after the accuracy of an objective measure is verified in conjunction with subjective tests conforming to ITU-T Recommendation P.830.

In addition to the codec conditions, ITU-T Recommendation P.830 recommends the use of reference conditions in subjective tests. These conditions are necessary to facilitate the comparison of subjective test results from different laboratories or from the same laboratory at different times. Also, when expressing the objective test results in terms of equivalent-Q values, reference conditions using the narrow-band Modulated Noise Reference Unit (MNRU) as specified in ITU-T Recommendation P.810 should be tested.

Note that including other standard codecs such as G.711 64-kbit/s PCM, G.726 40-, 32-, 24- and 16-kbit/s ADPCM, G.728 16-kbit/s LD-CELP, and G.729 8-kbit/s CS-ACELP, as well as MNRU, in objective quality measurement tests may help demonstrate the relative performance of the codec under test and standardized codecs. Detailed explanations of these
experimental parameters are found in ITU-T Recommendation P.830.

9 Computation of the objective measure

This clause describes the computation of an objective measure based on measuring normalizing blocks (MNBs).

MNBs were developed in response to the observations that listeners adapt and react differently to spectral deviations that span different time and frequency scales. Thus, for the speech quality estimation application, maximal perceptual consistency over a wide range of distortion types requires a family of analyses at multiple frequency and time scales. As spectral deviations are measured, the deviations at one scale must be removed so that they are not counted again as part of the deviations at other scales. It is also observed that working from larger to smaller scales is most likely to emulate listeners' patterns of adaptation and reaction to spectral deviations. This observation has led to a hierarchical structure of MNBs.

Two types of measuring normalizing blocks are considered here. The first is the time measuring normalizing block (TMNB) and the second is the frequency measuring normalizing block (FMNB). Each of these blocks takes perceptually transformed reference (R(t,f)) and test (T(t,f)) signals as inputs and returns them, together with a set of measurements, as outputs. These two building blocks are defined by Figures 1 and 2, respectively.

The TMNB integrates over some frequency scale, then measures differences and normalizes the test signal at multiple times. Finally, the positive and negative portions of the measurements are integrated over time. In an FMNB the converse is true. An FMNB integrates over some time scale, then measures differences and normalizes the test signal at multiple frequencies. Finally, the positive and negative portions of the measurements are integrated over frequency.

By design, both types of MNBs are idempotent. This important property is illustrated in Figure 3 and simply means that a second pass through a given MNB will not further alter the test signal, and that second pass will result in a measurement vector of zeros. The idempotency of MNBs allows them to be cascaded and yet still measure the deviation at a given time or frequency scale once and only once.

In order to measure spectral deviations at multiple time and frequency scales, a hierarchical structure of TMNBs and FMNBs, operating at decreasing scales, has been formed (Figure 4). When used as a distance measure in conjunction with a perceptual transformation (described below), this structure appears to emulate listeners' patterns of adaptation and reaction to spectral deviations well.
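
The measure-then-normalize idea and the idempotency property can be sketched in a few lines. The normative block definitions are those of Figures 1 and 2, which are not reproduced in this text, so the arithmetic below is an assumption: each block uses a plain mean as its integration, takes test minus reference as the difference, and subtracts that difference from the test signal. The function names `fmnb` and `tmnb`, the array shapes, and the random input data are hypothetical.

```python
import numpy as np


def fmnb(ref, test):
    """Simplified frequency MNB (assumed arithmetic, not the normative figure).

    ref, test: perceptually transformed signals, shape (n_times, n_freqs).
    Returns the normalized test signal and [positive, negative] measurements.
    """
    # Integrate over time, then measure the difference at each frequency.
    diff = test.mean(axis=0) - ref.mean(axis=0)
    # Remove this deviation so later blocks do not count it again.
    normalized = test - diff[np.newaxis, :]
    # Integrate the positive and negative portions over frequency.
    meas = np.array([np.maximum(diff, 0.0).mean(),
                     np.maximum(-diff, 0.0).mean()])
    return normalized, meas


def tmnb(ref, test, band):
    """Simplified time MNB over the frequency band given by the slice `band`."""
    # Integrate over the frequency band, then measure the difference at each time.
    diff = test[:, band].mean(axis=1) - ref[:, band].mean(axis=1)
    normalized = test.copy()
    normalized[:, band] -= diff[:, np.newaxis]
    # Integrate the positive and negative portions over time.
    meas = np.array([np.maximum(diff, 0.0).mean(),
                     np.maximum(-diff, 0.0).mean()])
    return normalized, meas


# Hypothetical data: 50 time frames by 15 frequency bands.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 15))
test = ref + rng.normal(scale=0.5, size=ref.shape)

# A second pass through the same block changes nothing and measures zero.
t1, m1 = fmnb(ref, test)
t2, m2 = fmnb(ref, t1)
fmnb_idempotent = np.allclose(t2, t1) and np.allclose(m2, 0.0)

band = slice(0, 5)
t3, m3 = tmnb(ref, t1, band)   # cascade: TMNB applied after the FMNB
t4, m4 = tmnb(ref, t3, band)
tmnb_idempotent = np.allclose(t4, t3) and np.allclose(m4, 0.0)

print(fmnb_idempotent, tmnb_idempotent)
```

Because each block subtracts exactly the deviation it has just measured, the quantity it would measure on a second pass is zero up to floating-point rounding, which is the property that lets blocks be cascaded without double counting.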