ETSI TR 102 506 V1.4.1 (2011-08)
Technical Report

Speech and multimedia Transmission Quality (STQ);
Estimating Speech Quality per Call

Reference
RTR/STQ-00174m

Keywords
quality, speech

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
Individual copies of the present document can be downloaded from: http://www.etsi.org

The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp

Copyright Notification
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 2011. All rights reserved.

DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP™ and LTE are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.

Contents
Intellectual Property Rights
Foreword
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 General
5 Call properties
5.1 Call structure
5.2 Call length
5.2.1 Length of utterance (sample)
5.2.2 Number of utterances (samples)
5.3 Call design
6 Call quality on a per sample basis
6.1 Evaluation of the samples
6.2 Mathematical modelling of the call quality
6.2.1 Impact of bad samples towards the end of a call
6.2.2 Impact of a single very bad sample
6.2.3 Applicability of the mathematical model
6.2.4 Validation of the formula
6.2.5 Calculation Example
7 Conclusion
Annex A: Empirical Study from March 2002 on the perceived call quality: PESQ-mobil
A.1 Test concept and speech recordings
A.1.1 Test description of the overall project
A.2 Design of an auditory test methodology to assess the speech material
A.2.1 Structure of the quality assessment
A.2.2 Simulation of a conversation
A.2.3 Assessment on an individual per-sample basis
A.2.4 Distortion types for the voice transmission
A.2.5 Structure of the speech material
A.2.6 Quality of the speech material
A.2.7 Results
A.3 Modelling the overall quality mathematically on basis of the MOS-values
A.3.1 Modelling of Speech Quality by averaging per-sample scores
A.3.2 Modelling of Speech Quality by consideration of the "recency effect"
A.3.3 Modelling of Speech Quality with consideration of a bad sample
A.4 Assessment of the speech material by ITU-T Recommendation P.862
A.4.1 Assessment of the separated speech parts
A.4.2 Result presentation
A.4.3 Usage of the model with the ITU-T Recommendation P.862 results
A.5 The rating of the samples
A.5.1 Rating of the calls
A.5.2 Rating of the utterances
Annex B: Empirical Study on the perceived call quality with English samples (Ericsson AB, 2007)
B.1 Introduction
B.2 Test design
B.3 Test results
B.3.1 Results for 60 seconds calls
B.3.2 Results for 120 seconds calls
B.3.3 Results for the utterances
B.3.4 Correlation Between MOS and P.862.1 for the individual utterances
B.4 Call profiles
B.4.1 Quality profiles for 120 seconds calls
B.4.2 Quality profiles for 60 seconds calls
Annex C: Study on the perceived call quality with German samples (T-Labs, 2007)
C.1 Introduction
C.2 Test Design
C.2.1 Material
C.2.2 Subjects
C.2.3 Procedure
C.2.4 Results
C.3 Detailed test results 60 seconds calls
C.3.1 Rating of the calls
C.3.2 Rating of the utterances
C.4 Detailed test results 120 seconds calls
C.4.1 Rating of the calls
C.4.2 Rating of the utterances
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (http://ipr.etsi.org).

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword
This Technical Report (TR) has been produced by ETSI Technical Committee Speech and multimedia Transmission Quality (STQ).

1 Scope
The present document proposes a way to model measurement results on a per-sample basis that allows the perceived end-to-end speech quality per call to be estimated for narrowband circuit-switched voice services in mobile networks. It focuses on the speech (listening) quality of a voice call. The speech quality per call is calculated separately for each direction of the call. Conversational properties such as talker quality, round trip and other related metrics are not considered. Speech quality of video telephony is not considered either.

The scenario focuses on test signals between 60 seconds and 120 seconds in duration with alternating speech/silence periods as described in clause 5. The presented model is based on three studies but may not generalize to call scenarios other than those used in the underlying studies.

Throughout the present document, where ITU-T Recommendation P.862.1 [i.2] (or ITU-T Recommendation P.862 [i.1]) is quoted, the same applies to all measurements of listening quality. These can be listening quality scores obtained from auditory tests (MOS-LQS) or objective measurements predicting MOS-LQO according to ITU-T Recommendation P.800.1 [i.3] that cover the relevant network distortions and speech processing components in their scope.

2 References
References are either specific (identified by date of publication and/or edition number or version number) or non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the reference document (including any amendments) applies.

Referenced documents which are not found to be publicly available in the expected location might be found at http://docbox.etsi.org/Reference.

NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee their long term validity.

2.1 Normative references
The following referenced documents are necessary for the application of the present document.

Not applicable.

2.2 Informative references
The following referenced documents are not necessary for the application of the present document but they assist the user with regard to a particular subject area.

[i.1] ITU-T Recommendation P.862: "Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs".
[i.2] ITU-T Recommendation P.862.1: "Mapping function for transforming P.862 raw result scores to MOS-LQO".
[i.3] ITU-T Recommendation P.800.1: "Mean Opinion Score (MOS) terminology".
[i.4] ETSI TS 102 250 (all parts): "Speech and multimedia Transmission Quality (STQ); QoS aspects for popular services in mobile networks".
[i.5] ITU-T Recommendation P.862.3: "Application guide for objective quality measurement based on Recommendations P.862, P.862.1 and P.862.2".
[i.6] ITU-T Recommendation P.800: "Methods for subjective determination of transmission quality".
[i.7] CENELEC EN 60645-2:1997: "Audiometers - Part 2: Equipment for speech audiometry".
[i.8] "Ergebnisbericht (Study) Berkom PESQ-mobil" (in German), J. Berger, T-Systems.
[i.9] ETSI TR 102 506 (V1.1.1): "Speech Processing, Transmission and Quality Aspects (STQ); Estimating Speech Quality per Call".

3 Definitions and abbreviations
3.1 Definitions
For the purposes of the present document, the following
terms and definitions apply:

listening quality: quality as perceived by a user in a listening situation
perceived quality: quality as perceived by a human user
speech quality per call: listening quality as perceived by a user (at the end) of a conversational call

3.2 Abbreviations
For the purposes of the present document, the following abbreviations apply:

ACR Absolute Category Rating
AMR Adaptive Multi Rate
EFR Enhanced Full Rate
FR Full Rate
HR Half Rate
IRS Intermediate Reference System
MOS Mean Opinion Score
NOTE: Commonly used term for quality assessment.
MOS-LQO MOS-Listening Quality from Objective testing
MOS-LQS MOS-Listening Quality from auditory tests (Subjective)
SpQ-C Speech (listening) Quality on Call basis
UMTS Universal Mobile Telecommunications System
VoIP Voice over IP

4 General
The established way of measuring speech quality is measurement on a per-sample basis. Much standardization work has been done by the ITU-T with the P.862 series of documents. Using that established way, and taking advantage of the data acquired in that fashion, one can seek to estimate the perceived speech quality of a call.

Current models that average over a large number of single speech samples do not necessarily paint an accurate picture of customer satisfaction, since a bad sample can be outweighed by a couple of good samples. Averaging over the calls mitigates the problem but still suffers from the shortcoming that a number of good samples may outweigh a very bad sample. On the other hand, threshold models that regard a call as fair or poor on the basis of one or two degraded samples do not take the number of good or excellent samples into account. Models in which a certain percentage of the samples needs to be degraded to rate the call as bad disregard the temporal structure of the call and the relative timing of the degradation towards the end.

It is therefore worthwhile to model the measurement results to obtain a call quality value that allows the impact of varying speech quality during a conversation to be understood.

5 Call properties
To determine call properties such as the call length and the sample specifics, existing specifications such as ITU-T Recommendation P.862 [i.1] and TS 102 250 [i.4] can be drawn upon. On that basis a reference speech-quality-sensitive voice call can be characterized. The standard call length for instrumental voice quality testing is defined in TS 102 250-5 [i.4], and the sample characteristics and evaluation are defined in ITU-T Recommendation P.862 [i.1] and ITU-T Recommendation P.862.3 [i.5]. The structure of the call still needs to be defined.

5.1 Call structure
Calls, be they mobile originated, mobile terminated or mobile to mobile, can be divided into different groups: short calls of a couple of seconds in which there is an announcement, such as pre-paid account statements, voice boxes or a wrong destination, and conversations in which the parties exchange a couple of utterances. Assuming that
the listening-quality-sensitive calls are the group in which meaningful utterances are exchanged over a stretch of time, voicemail and speed dials can be excluded from consideration. The "typical" call is a dialog-like conversation, which is in line with the empirical findings. In an idealized dialog the utterances are exchanged and distributed evenly in length and frequency. On each side a certain period of speech activity is followed by silence for the same length of time. Since the call quality on a sample basis is rated for each side independently, it is sufficient in an instrumental or subjective realization to feed one side with the required sample pattern.

5.2 Call length
The length of the call should give room for a couple of utterances (samples). The call length recommended in TS 102 250-5 [i.4] is 120 seconds, which is sufficient for this requirement. In fact the average call length is well below this time. However, if calls such as those to the mailbox, to pre-paid accounts, to far-end voice boxes or to wrong numbers are excluded from that calculation, the average call duration goes up considerably. For practical purposes, though, it is desirable to use call lengths that are considerably shorter than 120 seconds. The studies in annexes B and C provide results for calls with a length of 60 seconds.

5.2.1 Length of utterance (sample)
The application guide for objective speech measurement and for the construction of samples for objective quality measurements is ITU-T Recommendation P.862.3 [i.5]. The typical sample used by measurement systems has a length of 5 seconds to 12 seconds with a speech activity of at most 80 %. Such a sample typically contains leading and trailing silence and, in the case of multiple sentences, also silence in between. These individual samples and their ratings are the basis of the call quality assessment. Therefore the speech activity part of the call consists of these samples.

5.2.2 Number of utterances (samples)
Depending on the length of the call and the length of the individual utterance, it takes from 5 to 12 utterance and silence pairs to fill the different call lengths. From empirical evidence we know that a typical conversational call contains around 4 utterances from each side, so that 5 recurrences of the speech and silence pair can be recommended. Considering that these values are applicable for short calls, longer calls can accommodate up to 12 speech and silence pairs with an individual sample length of 5 seconds.

5.3 Call design
The conversational call that is to be rated to estimate the call quality should consist of alternating phases of speech activity and silence ...
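
As an illustration of the arithmetic in clause 5.2.2, the following Python sketch counts how many speech and silence pairs fit into a call and lists where the speech phases would start. It assumes the idealized dialog of clause 5.1, in which each utterance is followed by silence of the same length. The function names `pairs_per_call` and `call_schedule` and their parameters are hypothetical and are not part of the present document.

```python
def pairs_per_call(call_length_s, sample_length_s):
    """Number of complete speech/silence pairs that fit into a call.

    Assumption (idealized dialog): every utterance of sample_length_s
    seconds is followed by silence of the same duration.
    """
    if call_length_s <= 0 or sample_length_s <= 0:
        raise ValueError("lengths must be positive")
    pair_length_s = 2 * sample_length_s  # speech phase plus equally long silence
    return int(call_length_s // pair_length_s)


def call_schedule(call_length_s, sample_length_s):
    """Return (start_s, end_s) of each speech phase; silence fills the gaps."""
    n_pairs = pairs_per_call(call_length_s, sample_length_s)
    pair_length_s = 2 * sample_length_s
    return [
        (i * pair_length_s, i * pair_length_s + sample_length_s)
        for i in range(n_pairs)
    ]


if __name__ == "__main__":
    # Call and sample lengths taken from the ranges quoted in clause 5.2.
    for call_s, sample_s in [(120, 5), (120, 12), (60, 6)]:
        print(call_s, sample_s, pairs_per_call(call_s, sample_s))
```

Under these assumptions a 120 second call holds 12 pairs of 5 second samples or 5 pairs of 12 second samples, and a 60 second call holds 5 pairs of 6 second samples, which is consistent with the range of 5 to 12 pairs quoted in clause 5.2.2.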