International Telecommunication Union

ITU-T P.1201.2
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(10/2012)

SERIES P: TERMINALS AND SUBJECTIVE AND OBJECTIVE ASSESSMENT METHODS
Models and tools for quality assessment of streamed media

Parametric non-intrusive assessment of audiovisual media streaming quality – Higher resolution application area

Recommendation ITU-T P.1201.2

ITU-T P-SERIES RECOMMENDATIONS
TERMINALS AND SUBJECTIVE AND OBJECTIVE ASSESSMENT METHODS

Vocabulary and effects of transmission parameters on customer opinion of transmission quality    Series P.10
Voice terminal characteristics    Series P.30, P.300
Reference systems    Series P.40
Objective measuring apparatus    Series P.50, P.500
Objective electro-acoustical measurements    Series P.60
Measurements related to speech loudness    Series P.70
Methods for objective and subjective assessment of speech quality    Series P.80, P.800
Audiovisual quality in multimedia services    Series P.900
Transmission performance and QoS aspects of IP end-points    Series P.1000
Communications involving vehicles    Series P.1100
Models and tools for quality assessment of streamed media    Series P.1200
Telemeeting assessment    Series P.1300
Statistical analysis, evaluation and reporting guidelines of quality measurements    Series P.1400

For further details, please refer to the list of ITU-T Recommendations.

Rec. ITU-T P.1201.2 (10/2012)

Recommendation ITU-T P.1201.2

Parametric non-intrusive assessment of audiovisual media streaming quality – Higher resolution application area

Summary

Recommendation ITU-T P.1201.2 specifies the algorithmic model for the higher resolution (HR) application area of Recommendation ITU-T P.1201. The ITU-T P.1201 series of Recommendations specifies models for monitoring the audio, video and audiovisual quality of IP-based video services based on packet-header information. The higher resolution application area of the ITU-T P.1201.2 part of ITU-T P.1201 can be applied to the monitoring of performance and quality of experience (QoE) of video services such as Internet Protocol television (IPTV). The algorithm for the lower resolution (LR) case is specified in Recommendation ITU-T P.1201.1. See Recommendation ITU-T P.1201 for details and respective application ranges.

This Recommendation contains an electronic attachment containing a set of test vectors and test files for verification of compliance.

History

Edition   Recommendation                 Approval     Study Group
1.0       ITU-T P.1201.2                 2012-10-14   12
1.1       ITU-T P.1201.2 (2012) Amd.1    2013-05-14   12

FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications, information and communication technologies (ICTs). The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In
some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE – In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure, e.g., interoperability or applicability) and compliance with the Recommendation is achieved when all of these mandatory provisions are met. The words “shall” or some other obligatory language such as “must” and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the Recommendation is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementers are cautioned that this may not represent the latest information and are
therefore strongly urged to consult the TSB patent database at http://www.itu.int/ITU-T/ipr/.

© ITU 2013

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

Table of Contents

1 Scope
2 References
3 Definitions
4 Abbreviations and acronyms
5 Conventions
6 ITU-T P.1201.2 (HR) algorithm description
  6.0 Introduction
  6.1 Packet acquisition and side information
  6.2 Frame-based parameter extraction
  6.3 Sequence-based parameter extraction
  6.4 Quality modules
7 Model compliance
  7.1 Test files

Recommendation ITU-T P.1201.2

Parametric non-intrusive assessment of audiovisual media streaming quality – Higher resolution application area

1 Scope

This Recommendation¹ specifies the algorithmic model for the higher resolution (HR) application area of ITU-T P.1201. ITU-T P.1201 specifies models for monitoring the audio, video and audiovisual quality of IP-based video services based on packet-header information. The higher resolution application area of the ITU-T P.1201.2 part of ITU-T P.1201 can be applied to the monitoring of performance and quality of experience (QoE) of video services such as IPTV. The algorithm for the lower resolution case is specified in ITU-T P.1201.1. See ITU-T P.1201 for details and respective application ranges.

2 References

The following ITU-T Recommendations and other references contain provisions which, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated were valid. All Recommendations and other references are subject to revision; users of this Recommendation are therefore encouraged to investigate
the possibility of applying the most recent edition of the Recommendations and other references listed below. A list of the currently valid ITU-T Recommendations is regularly published. The reference to a document within this Recommendation does not give it, as a stand-alone document, the status of
a Recommendation.

[ITU-T H.222.0]    Recommendation ITU-T H.222.0 (2006) | ISO/IEC 13818-1:2007, Information technology – Generic coding of moving pictures and associated audio information: Systems.
[ITU-T H.264]      Recommendation ITU-T H.264 (2011), Advanced video coding for generic audiovisual services.
[ITU-T P.1201]     Recommendation ITU-T P.1201 (2012), Parametric non-intrusive assessment of audiovisual media streaming quality.
[ITU-T P.1201.1]   Recommendation ITU-T P.1201.1 (2012), Parametric non-intrusive assessment of audiovisual media streaming quality – Lower resolution application area.
[IETF RFC 2250]    IETF RFC 2250 (1998), RTP Payload Format for MPEG1/MPEG2 Video.
[IETF RFC 3550]    IETF RFC 3550 (2003), RTP: A Transport Protocol for Real-Time Applications.
[IETF RFC 3640]    IETF RFC 3640 (2003), RTP Payload Format for Transport of MPEG-4 Elementary Streams.
[IETF RFC 4184]    IETF RFC 4184 (2005), RTP Payload Format for AC-3 Audio.
[IETF RFC 4566]    IETF RFC 4566 (2006), SDP: Session Description Protocol.
[IETF RFC 6184]    IETF RFC 6184 (2011), RTP Payload Format for H.264 Video.
[ISO/IEC 13818-3]  ISO/IEC 13818-3:1998, Information technology – Generic coding of moving pictures and associated audio information – Part 3: Audio.

¹ This Recommendation contains an electronic attachment containing a set of test vectors and test files for verification of compliance.

3 Definitions

None.

4 Abbreviations and acronyms

This Recommendation uses the following abbreviations and acronyms:

CTS    Continuous Timestamp
DTS    Decoding Timestamp
ES     Elementary Stream
GOP    Group of Pictures
HD     High Definition (television)
HR     Higher Resolution
HBR    High Bit Rate
IP     Internet Protocol
IPTV   Internet Protocol Television
LR     Lower Resolution
MOS    Mean Opinion Score
MPEG   Moving Picture Experts Group
NCTS   Non-continuous Timestamp
NTSC   National Television System Committee
PAL    Phase Alternating Line
PAT    Program Association Table
PCR    Program Clock Reference
PES    Packetized Elementary Stream
PMT    Program Map Table
PTS    Presentation Timestamp
QoE    Quality of Experience
RTP    Real-time Transport Protocol
TS     Transport Stream
UDP    User Datagram Protocol

5 Conventions

None.

6 ITU-T P.1201.2 (HR) algorithm description

6.0 Introduction

The following description provides text, formulae and pseudocode to explain the ITU-T P.1201.2 algorithm.

The algorithm estimates three mean opinion score (MOS) values, for audio quality, video quality and audiovisual quality, over a measurement window with a defined length of 10 seconds. The algorithm derives the MOS by analysing the bitstream in four stages. The stages are carried out one after the other, each covering the entire 10-second interval. Each stage can be seen as a complete sub-process in the sense that it receives input values and generates output values. The stages are defined as follows.

Packet and global info acquisition: At stage 1, the data for a predefined time window of audio and video is collected, and the relevant basic data
is extracted. Header-based information is pre-shaped for use in subsequent steps. In addition, side information is gathered from the available sources (see ITU-T P.1201 and the detailed text below) or from the session description protocol (SDP).

Frame-based parameter extraction: At stage 2, the sequence of audio and video frames is extracted from the packet headers. The beginning of the data extraction for video is aligned with the appearance of an I-frame. For audio, capturing starts with the appearance of the first packetized elementary stream structure after that I-frame. Quality-relevant features, such as frame types, bit rate and packet loss information, are derived from the extracted information. If the content is encrypted or the sampled sequence contains packet loss, it may not be possible to derive all data elements exactly, and hence the values for missing data elements are estimated. Since the measurement window needs to comprise an integer number of groups of pictures (GOPs), capturing stops at the end of the GOP after the 10-second measurement window has been completed.

Sequence-based parameter extraction: At stage 3, parameters are extracted that relate to the
sequence of frames comprising the measurement interval. The frame types (I-, P- and B-frames) are detected, and the GOP structure is estimated. Depending on the level of encryption and on whether the real-time transport protocol (RTP) timestamp is continuous or not, more of the required information has to be
estimated, since it cannot be read directly from the bitstream. Also, the number of lost video frames is determined and the lost packets are allocated to the corresponding video frames. Similarly, the lost audio packets are allocated to the corresponding audio packetized elementary stream (PES) frames.

MOS estimation: Stage 4 finally combines the extracted quality-relevant features and provides the estimated MOS values.

NOTE – The bitstream may either have a pure RTP format, where the audio and video parts appear in separate RTP bitstreams, or audio and video can be multiplexed in an MPEG-2 TS.
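As an informal illustration (not part of the normative algorithm), the windowing rule described for stage 2 above can be sketched in Python. The sketch assumes that every GOP begins with an I-frame, that frames are given in order with timestamps in seconds, and that "the end of the GOP" refers to the GOP in progress when the 10-second mark is reached. The function name `select_measurement_window` and the `(frame type, timestamp)` tuples are hypothetical and do not appear in this Recommendation.

```python
from typing import List, Tuple

# Hypothetical frame record: (frame type "I"/"P"/"B", timestamp in seconds).
Frame = Tuple[str, float]


def select_measurement_window(frames: List[Frame],
                              window_s: float = 10.0) -> List[Frame]:
    """Return the frames from the first I-frame up to the end of the GOP
    that is in progress when window_s seconds have elapsed.

    Assumption: each GOP starts with an I-frame, so the next I-frame seen
    after the window has elapsed marks the start of the following GOP.
    """
    # Capturing is aligned with the first I-frame; earlier frames are skipped.
    start = next((i for i, (ftype, _) in enumerate(frames) if ftype == "I"), None)
    if start is None:
        return []

    t0 = frames[start][1]
    selected: List[Frame] = []
    window_done = False
    for ftype, ts in frames[start:]:
        if ts - t0 >= window_s:
            window_done = True
        # Once the window has elapsed, stop at the next GOP boundary so that
        # the selection contains an integer number of GOPs.
        if window_done and ftype == "I":
            break
        selected.append((ftype, ts))
    return selected
```

For example, with one frame per second and an I-frame every 4 seconds starting at t = 2 s, the selected window runs from t = 2 s to t = 13 s, i.e., three complete GOPs rather than exactly 10 seconds.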
The MPEG-2 TS may then be packetized into an RTP bitstream, or it may be encapsulated in MPEG-2 TS/UDP (i.e., without RTP headers). The quality assessment algorithms and models described in this Recommendation have only been evaluated for MPEG-2 TS/RTP/UDP bitstreams. In addition, the bitstreams may be encrypted or unencrypted. The MPEG-2 transport streams (TSs) may be encrypted either at the level of packetized elementary streams (PES-Encr) or at the level of the TS (TS-Encr). In addition, the RTP timestamp is handled in varying forms. In the case of TS over RTP, the RTP timestamp may be non-continuous or continuous (referred to below as NCTS or CTS). If it is non-continuous, the RTP timestamp only changes with the video frame. If it is continuous, the timestamp increases in proportion to the elapsed time with every new RTP packet. These different cases, as well as the
case without RTP, need to be considered during the extraction of parameters. In the case of packet loss, and even more so in combination with encryption, some of the needed information cannot be derived exactly and needs to be estimated, as described in detail below.

Figure 1 illustrates the overall algorithm of the ITU-T P.1201.2 model in the form of a block diagram. The following clauses describe the processing in the four stages of this algorithm in more detail.

[Block diagram. Blocks shown: Bitstream; Global info; Packet acquisition (6.1); Global information (6.1); Frame parameter extraction (6.2); Sequence parameter extraction (6.3); Estimate MOS (6.4); outputs MOSA, MOSV and MOSAV.]

Figure 1 – Block diagram of the ITU-T P.1201.2 model

6.1 Packet acquisition and side information

The packet acquisition stage receives as input a bitstream of IP packets, either read from a network interface or read from a file (e.g., network traces in the form of PCAP files). The relevant information is extracted from the packets, with the packets being collected for one measurement window.

Due to network degradations (e.g., packet loss, delay or jitter), the received bitstream may be affected by packet losses or packet drops. Further, the payload of the packets might be encrypted, so that certain payload-related bitstream or header information is not available. In addition, some information may be available from other sources (e.g., transmitted out-of-band). This type of information is referred to as side information below.

For each stage of processing, this Recommendation describes the respective input and output information, as well as the processing component itself.

Input

Side information file
Destination port
Audio frame duration or audio sample rate
Concealment type
Number of slices
SDP
Bitstream of IP packets

Output

"GlobalInfo":
  "Video":
    "Codec" : , // H264/AVC
    "Resolution" : , // SD, HD720, HD1080
    "FrameRate" : ,
    "ConcealmentType" : , // 0=Slicing-Mode, 1=Freezing-Mode