International Telecommunication Union

ITU-T Series P
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
Supplement 27 (01/2017)

SERIES P: TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS

Application of ITU-T P.863 and ITU-T P.863.1 for speech processed by blind bandwidth extension approaches

ITU-T P-series Recommendations: Supplement 27

ITU-T P-SERIES RECOMMENDATIONS
TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS

Vocabulary and effects of transmission parameters on customer opinion of transmission quality: Series P.10
Voice terminal characteristics: Series P.30, P.300
Reference systems: Series P.40
Objective measuring apparatus: Series P.50, P.500
Objective electro-acoustical measurements: Series P.60
Measurements related to speech loudness: Series P.70
Methods for objective and subjective assessment of speech quality: Series P.80
Methods for objective and subjective assessment of speech and video quality: Series P.800
Audiovisual quality in multimedia services: Series P.900
Transmission performance and QoS aspects of IP end-points: Series P.1000
Communications involving vehicles: Series P.1100
Models and tools for quality assessment of streamed media: Series P.1200
Telemeeting assessment: Series P.1300
Statistical analysis, evaluation and reporting guidelines of quality measurements: Series P.1400
Methods for objective and subjective assessment of quality of services other than speech and video: Series P.1500

For further details, please refer to the list of ITU-T Recommendations.

Supplement 27 to ITU-T P-series Recommendations

Application of ITU-T P.863 and ITU-T P.863.1 for speech processed by blind bandwidth extension approaches

Summary

Supplement 27 to the ITU-T
P-series of Recommendations provides a method for the application of ITU-T P.863 to speech signals processed by blind bandwidth extension (BBE), which is complementary to the existing procedures given in Recommendation ITU-T P.863.1. When bandwidth extension techniques are used, not only does the reference bandwidth need to be set, but ITU-T P.863 also has a limited ability to discriminate small bit rate and bandwidth improvements. These quality differences are clearly distinguishable in subjective tests. For ITU-T P.863 tests, a complementary bandwidth requirement check is therefore needed, and is detailed in this Supplement.

History

Edition   Recommendation      Approval     Study Group   Unique ID*
1.0       ITU-T P Suppl. 27   2017-01-19   12            11.1002/1000/13242

Keywords

Bandwidth extension, ITU-T P.863, ITU-T P.863.1.

* To access the Recommendation, type the URL http://handle.itu.int/ in the address field of your web browser, followed by the Recommendation's unique ID. For example, http://handle.itu.int/11.1002/1000/11830-en.

FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications, information and communication technologies
(ICTs). The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE: In this publication, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this publication is voluntary. However, the publication may contain certain mandatory provisions (to ensure, e.g., interoperability or applicability) and compliance with the publication is achieved when all of these mandatory provisions are met. The words "shall" or some other obligatory language such as "must" and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the publication is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this publication may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the publication development process.

As of the date of approval of this publication, ITU had not received notice of intellectual property, protected by patents, which may be required to implement this publication. However, implementers are cautioned that this may not represent the latest information and are therefore strongly urged to consult the
TSB patent database at http://www.itu.int/ITU-T/ipr/.

© ITU 2017

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

Table of Contents

1 Scope
2 References
3 Definitions
  3.1 Terms defined elsewhere
  3.2 Terms defined in this Supplement
4 Abbreviations and acronyms
5 Conventions
6 BBE and objectives
7 BBE quality evaluation
  7.1 Challenge: bandwidth vs quality
  7.2 Defining bandwidth
  7.3 Subjective and objective evaluation methods
  7.4 Proposed BBE objective evaluation methodology
8 BBE algorithm evaluation
  8.1 Algorithms used
  8.2 Objective performance
  8.3 Subjective performance
  8.4 Effect of high-band attenuation on subjective performance
  8.5 Summary
Bibliography

Introduction

Recently the industry has started to move from narrowband (NB) speech coders to wideband (WB) or super-wideband (SWB) coders. However, until complete coverage has been achieved, a significant proportion of calls will still use legacy narrowband. Even then, calls from landlines will likely still be narrowband for some time. Blind bandwidth extension (BBE) technology aims to solve this problem by transforming NB speech into WB or SWB speech. A prerequisite for successful deployment of BBE technology is a good evaluation methodology. In this document, we propose that ITU-T P.863 in conjunction
with a bandwidth requirement is a suitable methodology for BBE performance evaluation.

1 Scope

Supplement 27 to the ITU-T P-series of Recommendations provides a method for the application of ITU-T P.863 to speech signals processed by blind bandwidth extension (BBE), which is complementary to the existing procedures given in ITU-T P.863.1.

This Supplement provides an evaluation of speech quality using ITU-T P.863 for bandwidth extension, when the bandwidth of the speech under evaluation is wider than that of the original speech content and cannot be directly related to the input signal.

2 References

[ITU-T P.501]    Recommendation ITU-T P.501 (2012), Test signals for use in telephonometry.
[ITU-T P.800]    Recommendation ITU-T P.800 (1996), Methods for subjective determination of transmission quality.
[ITU-T P.863]    Recommendation ITU-T P.863 (2014), Perceptual objective listening quality assessment.
[ITU-T P.863.1]  Recommendation ITU-T P.863.1 (2014), Application guide for Recommendation ITU-T P.863.

3 Definitions

3.1 Terms defined elsewhere

None.

3.2 Terms defined in this Supplement

None.

4 Abbreviations and acronyms

This Supplement uses the following abbreviations and acronyms:

ACR   Absolute Category Rating
BBE   Blind Bandwidth Extension
DCR   Degradation Category Rating
HD    High Definition
NB    Narrowband
WB    Wideband
SWB   Superwideband

5 Conventions

None.

6 BBE and objectives

Blind bandwidth extension (BBE) technology aims to transform NB speech into WB or SWB speech. For simplicity, the focus of this Supplement is the WB case only. Typically using some form of either spectral folding or statistical modelling, the 4-8 kHz part of a speech signal is predicted from the 0-4 kHz part, to generate a signal which has the general characteristics of wideband speech [b-Carl], [b-Pulakka]. While perfect prediction cannot be expected, reasonably good quality speech can be obtained.

There are
two ways to view the objectives of BBE. It can either be seen as a way to improve NB, or as a way to make NB closer to WB. While these may seem like very similar objectives, in practice they are quite different, and apply to different scenarios. The first case is that of a network that is currently NB only, while the second case is encountered when a network has a mix of NB and WB calls. Both of these scenarios are encountered across mobile phone networks, but as networks move towards deploying more HD voice codecs, the second scenario will become more common. The user will likely experience a mix of wideband and narrowband calls, or possibly even experience both bandwidths during the same call. The lack of uniformity of experience will be a problem, as some calls will appear muffled or of lower quality, which in turn will lead to user dissatisfaction.

7 BBE quality evaluation

7.1 Challenge: bandwidth vs quality

BBE algorithms are not perfect and the process of predicting a high band introduces artefacts. There is a trade-off between bandwidth of the signal and overall noisiness of the BBE-extended speech, which can be controlled easily by attenuating the overall high-band energy.
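
To make the idea of an adjustable operating point concrete, the following minimal sketch scales everything above 4 kHz by a chosen number of decibels and leaves the low band untouched. It is only an illustration under stated assumptions: the frequency-domain gain, the naive DFT, and the names `attenuate_high_band`, `split_hz` and `atten_db` are hypothetical and are not taken from this Supplement; a real BBE implementation would more likely apply the gain inside the algorithm or through a proper filter.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each output sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def attenuate_high_band(samples, sample_rate_hz, atten_db, split_hz=4000.0):
    """Scale content strictly above split_hz by -atten_db (hypothetical helper)."""
    gain = 10.0 ** (-atten_db / 20.0)
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror negative frequencies, so fold them back.
        freq_hz = min(k, n - k) * sample_rate_hz / n
        if freq_hz > split_hz:
            spectrum[k] *= gain
    return idft(spectrum)


def tone(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine wave of n samples."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz) for t in range(n)]


# Toy stand-in for a BBE output: 1 kHz (low band) plus 6 kHz (high band) at 16 kHz.
fs, n = 16000, 64
signal = [a + b for a, b in zip(tone(1000, fs, n), tone(6000, fs, n))]
quieter = attenuate_high_band(signal, fs, atten_db=20.0)
```

Sweeping `atten_db` from 0 dB upwards moves such a toy output from full bandwidth towards the narrowband input, which is the dimension along which operating points differ.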
This trade-off can lead to confusion during comparative evaluations, where listeners might prefer an algorithm because it shows fewer artefacts when this is in fact due to it having less high-band energy, rather than being intrinsically a better algorithm. Therefore, it is important that different BBE algorithms are compared at the same operating point. This is illustrated in Figure 1.

Figure 1: Bandwidth vs absence of artefacts trade-off

In Figure 1, two BBE algorithms are represented. Algorithm-2 is clearly better than Algorithm-1. This is easily seen when fixing one dimension, either bandwidth or quality: Algorithm-2 is superior in the other dimension. The problem occurs when comparing Algorithm-1 at low bandwidth (the operating point furthest to the left) to Algorithm-2 at high bandwidth (the operating point furthest to the right). In this situation, Algorithm-1 has fewer artefacts than Algorithm-2 even though the algorithm itself is not as good; only the operating points are different. This shows the necessity of considering both dimensions when comparing BBE algorithms.

Additionally, as bandwidth is reduced, all BBE algorithms converge to the input narrowband signal, and are indistinguishable. Therefore, for maximum resolution, it is best to evaluate BBE algorithms at a high bandwidth, even if it might not be the bandwidth at which the algorithm is intended to be used for deployment.

7.2 Defining bandwidth

The frequency response of BBE technologies
is undefined, as the predicted high band is not a function of the original high band. This can be resolved by defining a reference wideband input. The speech material defined in [ITU-T P.501] is a good choice since it is broadly used across the wireless industry for testing compliance for voice services.

Figure 2: Frequency mask for bandwidth estimation

The 3GPP WB Rx mask defined in [b-3GPP TS 26.131] is a good mask to use with WB BBE, as it ensures that the bandwidth of the BBE output is similar to that of a coded, wideband output meeting the same mask. However, to allow for a different operating point at lower bandwidth, a series of masks can be defined as modifications to the 3GPP WB Rx mask wherein its lower limit is relaxed by N dB in the high band. This is illustrated in Figure 2. Note that the 3.3-5 kHz transition band has been left undefined, to allow for classic frequency extension techniques such as spectral folding, which can lead to a frequency dip around 4 kHz without adversely affecting speech quality.

7.3 Subjective and objective evaluation methods

The most commonly used techniques for subjective quality evaluation of vocoders are the
ITU-T P.800 DCR (degradation category rating) and ACR (absolute category rating) tests [ITU-T P.800]. Both are suitable for BBE evaluation, the main difference being that DCR measures degradation from the WB reference input, whereas ACR does not present a reference. Interestingly, these two cases match the two deployment scenarios described in clause 6, with DCR corresponding to the NB/WB mixed network case, and ACR to the NB-only case.

However, subjective tests are costly and time-consuming. An increasingly popular alternative is to use objective evaluation methods, in particular ITU-T P.863, also known as POLQA [ITU-T P.863]. While it is not perfect, ITU-T P.863 claims to handle a wide range of input degradations, and when used appropriately, can give a good indication of subjective speech quality [ITU-T P.863.1]. Additionally, it is already widely used in the industry for speech quality evaluation, often with ITU-T P.501 source material.

For BBE, the source material should be transcoded by an appropriate narrowband vocoder. If cellular wireless transmission is under consideration, this most commonly means the 3GPP AMR codec operating at 12.2 kbps [b-3GPP TS 26.090], as this is the narrowband
speech codec used in the vast majority of today's mobile communication networks.

7.4 Proposed BBE objective evaluation methodology

We propose the following objective evaluation methodology for BBE.

Bandwidth requirement: Measure bandwidth by testing the response to verify whether it passes a frequency mask derived from the 3GPP WB Rx mask, as per Figure 2, and using ITU-T P.501 British English speech material as the input. We recommend using N = 0 dB (i.e., no relaxation of the mask) as the operating point.

Quality requirement: Measure quality using ITU-T P.863 with ITU-T P.501 British English coded by AMR at 12.2 kbps. A good quality reference is the ITU-T P.863 MOS-LQO score of the input NB signal, up-sampled to 16 kHz.

Note that commercial implementations of ITU-T P.863 have a number of options and versions. In this document, the so-called "POLQA v2.4", in high-accuracy mode, and a WB reference are used. Other options change the absolute ITU-T P.863 MOS-LQO scores, but generally have little impact on the relative scores, and do not change the overall conclusions.

8 BBE algorithm evaluation

8.1