ATIS-0800061

METHODOLOGY FOR SUBJECTIVE OR OBJECTIVE VIDEO QUALITY ASSESSMENT IN MULTIPLE BIT RATE ADAPTIVE STREAMING

As a leading technology and solutions development organization, ATIS brings together the top global ICT companies to advance the industry's most pressing business priorities. Through ATIS committees and forums, nearly 200 companies address cloud services, device solutions, emergency services, M2M communications, cyber security, e-health, network evolution, quality of service, billing support, operations, and more. These priorities follow a fast-track development lifecycle from design and innovation through solutions that include standards, specifications, requirements, business use cases, software toolkits, and interoperability testing. ATIS is accredited by the American National Standards Institute (ANSI). ATIS is the North American Organizational Partner for the 3rd Generation Partnership Project (3GPP), a founding Partner of oneM2M, a member and major U.S. contributor to the International Telecommunication Union (ITU) Radio and Telecommunications sectors, and a member of the Inter-American Telecommunication Commission (CITEL). For more information, visit [...].

Notice of Disclaimer [...]

[...] exchanging non-proprietary information to promote the development of video networking technology and foster resolution of issues common to the video services industry; and promoting interoperability by contributing to and supporting development of standards by national and international standards bodies.

Table of Contents
1 INTRODUCTION, SCOPE, PURPOSE [...]

[...]

3.1.1 Best Video Quality
[...] if these video quality metrics are used in the methodology, the best video quality is defined as the video quality that has the highest value of the video quality metric score. Some other objective video quality metrics define Smin as the highest possible video quality. If these video quality metrics are used in the methodology, the best video quality is defined as the video quality that has the lowest value of the video quality metric score. For subjective video quality assessment, the best video quality is the video quality that has the best subjective equivalent score. Examples of subjective scores are: mean opinion score (MOS), absolute category rating (ACR) score, and scores per Recommendation ITU-R BT.500-12.

3.1.2 Average Video Quality
Average video quality is the video quality averaged over the entire test video clip.

3.1.3 Instantaneous Video Quality
Instantaneous video quality is the quality of each video frame or time tick in a test video clip.
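As an illustration of these definitions, the sketch below aggregates instantaneous (per-frame) scores into an average video quality and selects the best quality under either metric convention (highest score is best, or lowest score such as Smin is best). This is a minimal, non-normative sketch; the function and variable names and the sample values are illustrative only.

    # Illustrative sketch (not part of ATIS-0800061): aggregating instantaneous
    # video quality scores into an average score, for either metric polarity.
    from statistics import mean

    def average_video_quality(instantaneous_scores):
        """Average video quality over the entire test video clip (3.1.2)."""
        return mean(instantaneous_scores)

    def best_video_quality(clip_scores, higher_is_better=True):
        """Pick the best clip-level score under the metric's convention (3.1.1).

        higher_is_better=True  -> metrics whose largest score is the best quality.
        higher_is_better=False -> metrics whose smallest score (e.g., Smin) is best.
        """
        return max(clip_scores) if higher_is_better else min(clip_scores)

    # Example: hypothetical per-frame (instantaneous) scores for one test clip.
    frame_scores = [4.1, 3.9, 4.0, 3.7]
    clip_score = average_video_quality(frame_scores)  # 3.925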

3.2 Acronyms

[...] however, the methodology can be applied to any end-user device, such as tablets, laptops, televisions, and multimedia devices.

Figure 1: Adaptive bit rate streaming to mobile phones

The core component of the system is the video transcoder sub-system. The input source video for the transcoder is high-resolution, high bit rate compressed video content; for example, High Definition video content encoded at 15 Mbps. The compressed video source content is decoded, processed, and then re-encoded by the transcoder into multiple output bit streams, each at a different bit rate. Typically, the transcoder uses different video resolutions and frame rates for each encoded bit rate. Each output video bit stream is packaged by the transcoder into a container format, such as a transport stream or a fragmented MP4 format. The player on the mobile phone dynamically requests appropriate fragments (or chunks) of the transcoder output videos from the streaming server during a streaming session. The streaming server serves the requested chunks from the selected transcoder output video to the mobile phone. In order to adapt to network bandwidth variation during a streaming session, the mobile phone dynamically switches its chunk requests between the different videos encoded at different bit rates.

The important factors that influence the video quality on a mobile phone are the three encoder parameters for each output video, namely the video resolution, frame rate, and bit rate. Furthermore, the encoding resolutions may be smaller than the display resolution of the mobile phone; therefore, the spatial up-scaling performed at the mobile phone to stretch the video to the display screen size is also a major factor that affects the perceived video quality. The aim is to achieve the best VQ on the mobile phone under the combined effect of all these factors for each received video fragment. This aim is achieved by first studying the effect of frame rate, bit rate, and spatial up-scaling on the video quality individually, and then measuring their combined effect on the video quality.
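The chunk-switching behaviour described above can be pictured with a small sketch of a bandwidth-driven rendition selector. This is purely illustrative and not an algorithm defined by this standard; the bit rate ladder, the safety factor, and the selection rule are assumptions.

    # Illustrative sketch (assumed ladder and selection rule): request chunks from the
    # highest encoded bit rate that fits within the currently measured bandwidth.
    from typing import NamedTuple, Sequence

    class Rendition(NamedTuple):
        bitrate_kbps: int
        resolution: tuple   # (width, height)
        frame_rate: float

    def select_rendition(ladder: Sequence[Rendition], measured_kbps: float,
                         safety_factor: float = 0.8) -> Rendition:
        """Return the highest-bit-rate rendition that fits the usable bandwidth."""
        usable = measured_kbps * safety_factor
        candidates = [r for r in ladder if r.bitrate_kbps <= usable]
        if not candidates:                       # fall back to the lowest rung
            return min(ladder, key=lambda r: r.bitrate_kbps)
        return max(candidates, key=lambda r: r.bitrate_kbps)

    # Hypothetical ladder produced by the transcoder (values are illustrative only).
    ladder = [Rendition(400, (320, 180), 15.0),
              Rendition(1200, (640, 360), 30.0),
              Rendition(3000, (1280, 720), 30.0)]
    next_chunk_rendition = select_rendition(ladder, measured_kbps=1500)  # -> 1200 kbps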

5 Methodology for Video Quality Analysis

A methodology for VQ assessment is shown in Figure 2. It employs a full-reference VQ assessment model. The full-reference model measures the video quality by computing the VQ of a test video with respect to the reference video. Therefore, the resolution and frame rate (see [1] and [2]) of the reference and test videos have to be identical for the full-reference VQ assessment.

Figure 2: Objective video quality assessment methodology for adaptive streaming
[Diagram: the transcoder (decode, deinterlace, scale, frame-rate convert, and encode stages) produces encoded multi-bit-rate bitstreams from the compressed source video; frame-rate up-conversion and spatial up-scaling to the display resolution form the reference/test pairs used to measure VQ due to lower frame rate, VQ due to bit rate (compression), VQ due to spatial scaling, and the combined VQ.]
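As a concrete illustration of a full-reference comparison between reference and test videos of identical resolution and frame rate, the sketch below computes a per-frame score. PSNR is used here purely as an example metric and is not necessarily the VQ model intended by this methodology; the array shapes and names are assumptions.

    # Illustrative sketch: per-frame full-reference comparison (PSNR as an example
    # metric). Reference and test sequences must have identical resolution and
    # frame rate, as required for full-reference VQ assessment.
    import numpy as np

    def psnr(ref_frame: np.ndarray, test_frame: np.ndarray, peak: float = 255.0) -> float:
        mse = np.mean((ref_frame.astype(np.float64) - test_frame.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

    def instantaneous_vq(ref_frames, test_frames):
        """Per-frame (instantaneous) scores for aligned reference/test sequences."""
        assert len(ref_frames) == len(test_frames), "frame counts must match"
        return [psnr(r, t) for r, t in zip(ref_frames, test_frames)]

    # Example with synthetic luma frames of identical size and count.
    ref = [np.full((1080, 1920), 128, dtype=np.uint8) for _ in range(4)]
    test = [np.clip(f + np.random.randint(-2, 3, f.shape), 0, 255).astype(np.uint8) for f in ref]
    scores = instantaneous_vq(ref, test)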

Due to the low bit rate constraints of the cellular and wireless networks used for video delivery to mobile phones, the transcoder generally encodes video at a smaller resolution than the phone's display resolution to achieve compression efficiency. The received video is then spatially up-scaled to the display resolution on the mobile phone using an internal image up-scaling module. For measurement of the objective VQ on a mobile phone, the ideal way to obtain the test video is to capture the raw video output from the mobile phone's video port, such as an HDMI output port. However, since not all mobile phones support such a video output port, the current methodology instead simulates the frame rate conversion and spatial up-scaling functions performed on the phone to generate equivalent test videos, as shown in Figure 2. The spatial up-scaling and frame rate up-conversion modules are employed appropriately to make sure that the resolution and frame rate of both the reference video and the test video are identical, so as to satisfy the full-reference VQ evaluation requirements.

As shown in Figure 2, the spatial up-scaling module is used to simulate the up-scaling performed in the mobile phone to stretch the video to the phone's full screen resolution. For example, a bi-linear interpolation technique [3] is used for up-scaling the video in the methodology. The frame rate up-conversion is used to simulate the way the human eye perceives the reduction in frame rate when the video is displayed on the phone. A simple and straightforward frame rate up-conversion is achieved by using the frame duplication algorithm described in Appendix A. However, a more sophisticated prediction-based frame interpolation algorithm that mimics the human visual system's perception of the reduced-frame-rate video can be designed and used in this methodology.
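The two simulation steps named above can be sketched as follows: frame duplication for frame rate up-conversion (a simple scheme in the spirit of the algorithm described in Appendix A, whose exact details are not reproduced here) and bilinear interpolation for spatial up-scaling. This is a minimal, non-normative sketch that assumes OpenCV and NumPy are available.

    # Illustrative sketch: frame duplication for frame rate up-conversion and
    # bilinear interpolation for spatial up-scaling (OpenCV/NumPy assumed).
    import cv2
    import numpy as np

    def duplicate_frames(frames, src_fps: float, dst_fps: float):
        """Up-convert the frame rate by repeating source frames."""
        out = []
        for i in range(int(round(len(frames) * dst_fps / src_fps))):
            src_index = min(int(i * src_fps / dst_fps), len(frames) - 1)
            out.append(frames[src_index])
        return out

    def upscale_bilinear(frame: np.ndarray, display_size) -> np.ndarray:
        """Spatially up-scale one frame to the display resolution (width, height)."""
        return cv2.resize(frame, display_size, interpolation=cv2.INTER_LINEAR)

    # Example: 15 fps, 640x360 frames up-converted to 30 fps and up-scaled to 1280x720.
    low = [np.zeros((360, 640, 3), dtype=np.uint8) for _ in range(15)]
    test = [upscale_bilinear(f, (1280, 720)) for f in duplicate_frames(low, 15.0, 30.0)]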

5.1 Methodology to Obtain Test [...]

The VQ is determined for each of four cases: (a) the lower frame rate; (b) the compression (or bit rate); (c) the spatial up-scaling; and (d) the combined effect of lower frame rate, compression, and spatial up-scaling (see Figure 2). The procedure for obtaining the reference video and a test video as inputs to the VQ evaluation module in each of these four cases is described in this section.

First, consider the determination of VQ due to a lower video frame rate (see Figure 2). In this case, the aim is to measure the video quality when a video having a certain resolution would be played on a mobile phone at a reduced frame rate, without performing any spatial up-scaling of the video to the phone's display resolution. The reference video in this case is the output video of the preprocessing unit of the transcoder, having the full frame rate [1, 2] of, say, 30 fps and a resolution that is smaller than the phone's display resolution. A corresponding test video is obtained in two steps: (1) select an output video of the preprocessing unit of the transcoder having a frame rate less than the full frame rate and the same resolution as the reference video; and (2) frame rate up-convert the selected video to the full frame rate by using a frame rate up-conversion scheme. Thus, both the reference and test video have the same resolution and frame rate, but the test video has reduced temporal information relative to the reference video. Therefore, the VQ between the reference and test video is a measure of the video quality of the lower-frame-rate video due to the temporal loss of information.

Second, consider the determination of VQ due to video compression, as shown in Figure 2. In this case, the aim is to measure the video quality when the video with compression artifacts would be played on a mobile phone without performing any spatial up-scaling of the video to the phone's display resolution. The reference video in this case is the output video of the preprocessing unit of the transcoder, having the full frame rate and a resolution that is smaller than the phone's display resolution. The corresponding test video is obtained in two steps: (1) select a compressed video at a certain bit rate, having the full frame rate and the same resolution as the reference video; and (2) decode the selected compressed video.
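For this case, the decode step can be sketched with OpenCV, which reads a compressed bitstream file into raw frames that can then be compared against the uncompressed reference. This is a non-normative sketch; the file name is hypothetical, and any conformant decoder could be used instead.

    # Illustrative sketch: decode a compressed test bitstream into raw frames so it
    # can be compared against the reference video (file name is hypothetical).
    import cv2

    def decode_to_frames(path: str):
        """Decode a compressed video file and return its frames as a list of arrays."""
        capture = cv2.VideoCapture(path)
        frames = []
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            frames.append(frame)   # BGR frames; convert to YUV if the VQ model requires it
        capture.release()
        return frames

    test_frames = decode_to_frames("output_640x360_1200kbps.mp4")  # hypothetical file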

Third, while studying the effect of spatial up-scaling on video quality, the goal is to analyze what the video quality would be on the mobile phone when the received video is up-scaled to the display resolution of the phone. As shown in Figure 2, the reference video in this case is the output video of the preprocessing unit of the transcoder, having the full frame rate of, say, 30 fps and a resolution that is equal to the display resolution of the mobile phone. A test video is obtained in two steps: (1) select an output video of the preprocessing unit of the transcoder having the full frame rate and a resolution smaller than the reference video resolution; and (2) spatially up-scale the selected video to the reference video resolution, which is the same as the phone's display resolution.

Finally, consider the combined effect of reduced frame rate, compression, and spatial up-scaling on the VQ. As shown in Figure 2, the reference video in this case is the output video of the preprocessing unit of the transcoder, having the full frame rate of, say, 30 fps and a resolution that is equal to the display resolution of the mobile phone. A test video is obtained in four sequential steps: (1) select a compressed video at a certain bit rate, having a frame rate less than the full frame rate and a resolution smaller than the reference video resolution; (2) decode the selected compressed video; (3) frame rate up-convert the decoded video to the full frame rate by using a frame-rate up-conversion scheme; and (4) spatially up-scale the video to the reference video resolution, which is the same as the phone's display resolution. To obtain a test video for the determination of the combined objective VQ due to compression and spatial up-scaling only, a compressed video at a certain bit rate having a frame rate equal to the full frame rate and a resolution smaller than the reference video resolution should be selected in step 1; then step 3 is skipped and the remaining steps are performed.
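The four reference/test constructions above can be summarized in code. In the sketch below, the processing steps (decoding, frame rate up-conversion, spatial up-scaling) are passed in as callables so that any implementation, such as the sketches earlier in this section, can be plugged in; all names are illustrative and none of them are defined by the standard.

    # Illustrative sketch: build the test videos for the four cases described above.
    # The decode/upconvert/upscale arguments are callables supplied by the
    # implementation; for the combined VQ due to compression and up-scaling only,
    # skip the up-conversion step in case (d).
    def build_test_videos(preprocessed_low_fps, preprocessed_small, compressed,
                          decode, upconvert, upscale, full_fps, display_resolution):
        case_a = upconvert(preprocessed_low_fps, full_fps)          # (a) lower frame rate
        case_b = decode(compressed)                                 # (b) compression only
        case_c = upscale(preprocessed_small, display_resolution)    # (c) spatial up-scaling
        case_d = upscale(upconvert(decode(compressed), full_fps),   # (d) combined effect
                         display_resolution)
        return case_a, case_b, case_c, case_d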

5.2 Video Quality Test Set-up

5.2.1 Source Video
The methodology implementation should specify the complete transcoder input source video parameter set; for example: resolution, frame rate, bit rate, Group of Pictures (GOP) structure, codec profile and level, and other relevant tool-set details such as motion vector precision, number of reference frames, entropy coding, etc. The source video content should consist of several clearly identifiable segments of equal duration, with the content of each segment containing different types of motion and texture information.

Example: A transport stream file containing 1080i High Definition video of 240 seconds duration is used as the source video to the transcoder. The source video has the following properties: H.264 Main Profile at Level 4.0, resolution of 1920x1080, interlaced video, 59.94 fields per second, bit rate of 5 Mbps, an IBBrefBPBBrefBP GOP structure with a 32-frame GOP size, Context-Adaptive Binary Arithmetic Coding (CABAC) entropy coding, and four reference frames. The source video is the concatenation of 12 video segments of 20 seconds each, having different motion and texture content, as shown in Figure 3. The description of each video segment is provided in Table 1.

Figure 3: Sample images in each of the 12 video segments, which are concatenated to form the example source video clip

Table 1: Example Source Video Clip Content Description

    Source video clip segment    Content description
    Segment #1                   Large homogeneous static areas, moderate motion.
    Segment #2                   High texture and high motion content.
    Segment #3                   High texture and high motion content, camera zoom.
    Segment #4, #5               Talking head and graphics, low texture and motion.
    Segment #6, #7               High texture and high motion, camera pan and tilt.
    Segment #8                   High texture and high motion.
    Segment #9, #10              Indoor scene, low contrast, low motion.
    Segment #11, #12             Sports scene, high motion.
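For implementations that record the source video parameter set and segment plan programmatically, the example above could be captured in a small configuration structure like the sketch below. The structure and field names are illustrative only; the standard does not prescribe any configuration format.

    # Illustrative sketch: the example source video parameter set and segment plan
    # from 5.2.1 captured as a configuration dictionary (field names are not normative).
    source_video = {
        "container": "transport stream",
        "codec": "H.264", "profile": "Main", "level": "4.0",
        "resolution": (1920, 1080), "scan": "interlaced",
        "field_rate_hz": 59.94, "bitrate_mbps": 5,
        "gop_structure": "IBBrefBPBBrefBP", "gop_size_frames": 32,
        "entropy_coding": "CABAC", "reference_frames": 4,
        "duration_s": 240, "segment_duration_s": 20,
        "segments": [  # 12 segments of 20 s each, per Table 1
            {"ids": [1], "content": "Large homogeneous static areas, moderate motion"},
            {"ids": [2], "content": "High texture and high motion content"},
            {"ids": [3], "content": "High texture and high motion content, camera zoom"},
            {"ids": [4, 5], "content": "Talking head and graphics, low texture and motion"},
            {"ids": [6, 7], "content": "High texture and high motion, camera pan and tilt"},
            {"ids": [8], "content": "High texture and high motion"},
            {"ids": [9, 10], "content": "Indoor scene, low contrast, low motion"},
            {"ids": [11, 12], "content": "Sports scene, high motion"},
        ],
    }
    assert sum(len(s["ids"]) for s in source_video["segments"]) == 12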

5.2.2 Output Video
The source video content is transcoded to multiple output files, with each output file containing a video bit stream encoded at a different bit rate. The methodology implementation specifies the complete output video parameter set; for example, resolution, frame rate, bit rate, GOP structure, codec profile and level, and other relevant tool-set details such as motion vector precision, number of reference frames, entropy coding, etc. A square pixel aspect ratio should be used, with the resolutions chosen such that the aspect ratio of the output is equal to the aspect ratio of the input video.
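A small check of this constraint could look like the following sketch: given the input picture aspect ratio, verify that each candidate output resolution preserves it with square pixels. The candidate resolutions listed are illustrative only and are not taken from the standard.

    # Illustrative sketch: verify that output resolutions preserve the input picture
    # aspect ratio when square pixels are used (candidate resolutions are examples).
    from math import isclose

    def preserves_aspect_ratio(out_w: int, out_h: int, in_w: int = 1920, in_h: int = 1080,
                               tolerance: float = 0.01) -> bool:
        return isclose(out_w / out_h, in_w / in_h, rel_tol=tolerance)

    for w, h in [(1280, 720), (960, 540), (640, 360), (416, 240)]:
        print((w, h), "ok" if preserves_aspect_ratio(w, h) else "aspect ratio mismatch")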

Some output resolutions and formats may have a large impact on VQ relative to other resolutions and formats, whereas others may have a negligible impact.

Example: Fragmented-MP4 output files containing H.264 Baseline profile bitstreams having different resolutions, frame rates, and bit rates are generated by the transcoder. Two sets of files are considered; one [...]
