ITU-R BT.1683-2004 – Objective perceptual video quality measurement techniques for standard definition digital broadcast television in the presence of a full reference


RECOMMENDATION ITU-R BT.1683

Objective perceptual video quality measurement techniques for standard definition digital broadcast television in the presence of a full reference

(Question ITU-R 44/6)

(2004)

The ITU Radiocommunication Assembly,

considering
a) that the ability to measure automatically the quality of broadcast video has long been recognized as a valuable asset to the industry;
b) that conventional objective methods are no longer fully adequate for measuring the perceived video quality of digital video systems using compression;
c) that objective measurement of perceived video quality will complement conventional objective test methods;
d) that current formal subjective assessment methods are time-consuming and expensive and generally not suited for operational conditions;
e) that objective measurement of perceived video quality may usefully complement subjective assessment methods,

recommends
1 that the guidelines, scope and limitations given in Annex 1 be used in the application of the objective video quality models found in Annexes 2-5;
2 that the objective video quality models given in Annexes 2-5 be used for objective measurement of perceived video quality.

Annex 1

Summary

This Recommendation specifies methods for estimating the perceived video quality of a one-way video transmission system. This Recommendation applies to baseband signals. The estimation methods in this Recommendation are applicable to:
– codec evaluation, specification, and acceptance testing;
– potentially real-time, in-service quality monitoring at the source;
– remote destination quality monitoring when a copy of the source is available;
– quality measurement of a storage or transmission system that utilizes video compression and decompression techniques, either a single pass or a concatenation of such techniques.

Introduction

The ability to measure automatically the quality of broadcast video has long been recognized as a valuable asset to the industry. The broadcast industry requires such tools to replace or supplement costly and time-consuming subjective quality testing. Traditionally, objective quality measurement has been obtained by calculating peak signal-to-noise ratios (PSNRs). Although a useful indicator of quality, PSNR has been shown to be a less than satisfactory representation of perceptual quality.
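As a point of reference, the PSNR figure discussed above is computed per frame from the mean squared error between reference and degraded samples. A minimal sketch for 8-bit video (peak value 255) follows; the function and variable names are illustrative, not part of this Recommendation.

```python
import numpy as np

def psnr(ref: np.ndarray, deg: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB between a reference and a degraded 8-bit frame."""
    mse = np.mean((ref.astype(np.float64) - deg.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((486, 720), 128, dtype=np.uint8)  # flat grey 525-line field
deg = ref.copy()
deg[0, 0] = 138                                 # a single distorted pixel
print(round(psnr(ref, deg), 1))
```

Note how a single-pixel error yields a very high PSNR: the metric averages energy over the whole frame, which is one reason it correlates poorly with localized, perceptually salient distortions.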

To overcome the limitations associated with PSNR, research has been directed towards defining algorithms that can measure the perceptual quality of broadcast video. Such objective perceptual quality measurement tools may be applied to testing the performance of a broadcast network, as equipment procurement aids, and in the development of new broadcast video coding techniques.

In recent years, significant work has been dedicated to the development of reliable and accurate tools that can be used to objectively measure the perceptual quality of broadcast video. This Recommendation defines objective computational models that have been shown to be superior to PSNR as automatic measurement tools for assessing the quality of broadcast video. The models were tested on 525-line and 625-line material conforming to Recommendation ITU-R BT.601, which was characteristic of secondary distribution of digitally encoded television quality video.

The performance of the perceptual quality models was assessed through two parallel evaluations of the test video material¹. In the first evaluation, a standard subjective method, the double stimulus continuous quality scale (DSCQS) method, was used to obtain subjective ratings of quality of the video material by panels of human observers (Recommendation ITU-R BT.500 – Methodology for the subjective assessment of the quality of television pictures). In the second evaluation, objective ratings were obtained by the objective computational models. For each model, several metrics were computed to measure the accuracy and consistency with which the objective ratings predicted the subjective ratings.
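One common accuracy metric of this kind is the Pearson linear correlation between the objective scores and the subjective ratings. A minimal sketch follows; the data values are hypothetical illustrations, not results from the test.

```python
import numpy as np

def pearson(objective, subjective) -> float:
    """Pearson linear correlation between model scores and subjective ratings."""
    x = np.asarray(objective, dtype=np.float64)
    y = np.asarray(subjective, dtype=np.float64)
    xm, ym = x - x.mean(), y - y.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2)))

# Hypothetical scores for five test sequences
model_scores = [0.81, 0.62, 0.45, 0.30, 0.12]
dscqs_ratings = [78.0, 64.0, 49.0, 35.0, 15.0]
print(round(pearson(model_scores, dscqs_ratings), 3))
```

A correlation near 1.0 indicates that the model's ordering and spacing of quality scores track the panel ratings closely.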

Three independent laboratories conducted the subjective evaluation portion of the test. Two laboratories, Communications Research Centre (CRC, Canada) and Verizon (United States of America), performed the test with 525/60 Hz sequences and a third laboratory, Fondazione Ugo Bordoni (FUB, Italy), performed the test with 625/50 Hz sequences. Several laboratories ("proponents") produced objective computational models of the video quality of the same video sequences tested with human observers

by CRC, Verizon and FUB. The results of the tests are given in Appendix 1. This Recommendation includes the objective computational models shown in Table 1.

TABLE 1

Model number | Name | Video Quality Experts Group (VQEG) proponent | Country | Annex
1 | British Telecom | D | United Kingdom | 2
2 | Yonsei University/Radio Research Laboratory/SK Telecom | E | Korea (Rep. of) | 3
3 | Center for Telecommunications Research and Development (CPqD) | F | Brazil | 4
4 | National Telecommunications and Information Administration/Institute for Telecommunication Sciences (NTIA/ITS) | H | United States of America | 5

¹ ITU-R Doc. 6Q/14, September 2003 – Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment, Phase II (FR-TV2).

A complete description of the above four objective computational models is provided in Annexes 2-5. Existing video quality test equipment can be used until new test equipment implementing any of the four above models is readily available. For any model to be considered for inclusion in the normative section of this Recommendation in the future, the model must be verified by an open independent body (such as VQEG) which will do the technical evaluation within the guidelines and performance criteria set out by Radiocommunication Study Group 6. The intention of Radiocommunication Study Group 6 is to eventually recommend only one normative full reference method.

1 Scope

This Recommendation specifies methods for estimating the perceived video quality of a one-way video system. This Recommendation applies to baseband signals. The objective video performance estimators are defined for the end-to-end quality between the two points. The estimation methods are based on processing 8-bit digital component video as defined by Recommendation ITU-R BT.601². The encoder can utilize various compression methods (e.g. Moving Picture Experts Group (MPEG), ITU-T Recommendation H.263, etc.). The models proposed in this Recommendation may be used to evaluate a codec (encoder/decoder combination) or a concatenation of various compression

methods and memory storage devices. While the derivation of the objective quality estimators described in this Recommendation might have considered error impairments (e.g. bit errors, dropped packets), independent testing results are not currently available to validate the use of the estimators for

systems with error impairments. The validation test material did not contain channel errors.

1.1 Application

This Recommendation provides video quality estimations for television video classes (TV0-TV3) and multimedia video class (MM4), as defined in ITU-T Recommendation P.911, Annex B. The applications for the estimation models described in this Recommendation include but are not limited to:
– codec evaluation, specification, and acceptance testing, consistent with the limited accuracy as described below;
– potentially real-time, in-service quality monitoring at the source;
– remote destination quality monitoring when a copy of the source is available;
– quality measurement of a storage or transmission system that utilizes video compression and decompression techniques, either a single pass or a concatenation of such techniques.

1.2 Limitations

The estimation models described in this Recommendation cannot be used to replace subjective testing. Correlation values between two carefully designed and executed subjective tests (i.e. in two different laboratories) normally fall within the range 0.92 to 0.97. This Recommendation does not supply a means for quantifying potential estimation errors. Users of this Recommendation should review the comparison of available subjective and objective results to gain an understanding of the range of video quality rating estimation errors. The predicted performance of the estimation models is not currently validated for video systems with transmission channel error impairments.

² This does not preclude implementation of the measurement method for one-way video systems that utilize composite video inputs and outputs. Specification of the conversion between composite and component domains is not part of this Recommendation. For example, the SMPTE 170M Standard specifies one method for performing this conversion for NTSC.

Annex 2

Model 1

CONTENTS

1 Introduction
2 BTFR
3 Detectors
3.1 Input conversion
3.2 Crop and offset
3.3 Matching
3.3.1 Matching statistics
3.3.2 MPSNR
3.3.3 Matching vectors
3.4 Spatial frequency analysis
3.4.1 Pyramid transform
3.4.2 Pyramid SNR
3.5 Texture analysis
3.6 Edge analysis
3.6.1 Edge detection
3.6.2 Edge differencing
3.7 MPSNR analysis
4 Integration
5 Registration
6 References
Annex 2a

1 Introduction

The BT full-reference (BTFR) automatic video quality assessment tool produces predictions of video quality that are representative of human quality judgements. This objective measurement tool digitally simulates features of the human visual system (HVS) to give accurate predictions of video quality and offers a viable alternative to costly and time-consuming formal subjective assessments. A software implementation of the model was entered in the VQEG2 tests and the resulting performance is presented in a test report.

2 BTFR

The BTFR algorithm consists of detection followed by integration, as shown

in Fig. 1. Detection involves the calculation of a set of perceptually meaningful detector parameters from the undistorted (reference) and distorted (degraded) video sequences. These parameters are then input to the integrator, which produces an estimate of the perceived video quality by appropriate weighting. The choice of detectors and weighting factors is founded on knowledge of the spatial and temporal masking properties of the HVS and determined through calibration experiments.

FIGURE 1
Full-reference video quality assessment model: the reference and degraded video are processed by the detectors, whose outputs (TextureDeg, EDif, SegVPSNR, PySNR(3,3), XPerCent, MPSNR) are combined by the integration stage into the predicted video quality.
MPSNR: matched PSNR

Input video of types 625 (720 × 576) interlaced at 50 fields/s and 525 (720 × 486) interlaced at 59.94 fields/s in YUV422 format is supported by the model.

3 Detectors

The detection module of the BTFR algorithm calculates a number of spatial, temporal and frequency-based measures from the input YUV formatted sequences, as shown in Fig. 2.

FIGURE 2
Detection: the degraded field (DegYUVField(x, y)) is cropped with offset (Offset X, Offset Y) and the reference is cropped without offset; matching produces the matched reference (MRef); spatial frequency analysis, YUV PSNR analysis, edge detection and texture analysis then yield the detector outputs TextureDeg, EDif, SegVPSNR, PySNR(3,3), XPerCent and MPSNR.

3.1 Input conversion

First, the input sequences are converted from YUV422 interlaced format to a block YUV444 deinterlaced format, so that each successive field is represented by arrays RefY, RefU and RefV:

RefY(x, y)    0 ≤ x ≤ X − 1, 0 ≤ y ≤ Y − 1    (1)
RefU(x, y)    0 ≤ x ≤ X − 1, 0 ≤ y ≤ Y − 1    (2)
RefV(x, y)    0 ≤ x ≤ X − 1, 0 ≤ y ≤ Y − 1    (3)

where:
X : number of horizontal pixels within a field
Y : number of vertical pixels.

For a YUV422 input, each U and V value must be repeated to give the full resolution arrays (2) and (3).
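The chroma repetition just described can be sketched as follows. Array names mirror the text; the function name and the toy field sizes are illustrative, and arrays are indexed [row = y, column = x].

```python
import numpy as np

def yuv422_field_to_yuv444(y: np.ndarray, u_half: np.ndarray, v_half: np.ndarray):
    """Expand half-horizontal-resolution U and V planes to full resolution
    by repeating each chroma sample, giving full-size arrays (2) and (3)."""
    ref_y = y
    ref_u = np.repeat(u_half, 2, axis=1)  # X/2 -> X columns
    ref_v = np.repeat(v_half, 2, axis=1)
    return ref_y, ref_u, ref_v

y = np.zeros((4, 8), dtype=np.uint8)             # toy field: X = 8, Y = 4
u = np.arange(16, dtype=np.uint8).reshape(4, 4)  # half-resolution chroma
v = u.copy()
ref_y, ref_u, ref_v = yuv422_field_to_yuv444(y, u, v)
print(ref_u.shape)  # (4, 8)
```

Sample repetition (rather than interpolation) keeps the conversion exactly reversible and avoids introducing chroma values that were not present in the input.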

3.2 Crop and offset

This routine crops with offset the degraded input sequence and crops without offset the reference input sequence. The offset parameters XOffset and YOffset are determined externally and define the number of pixels, horizontally and vertically, by which the degraded sequence is offset from the reference. The picture origin is defined as being in the top left-hand corner of the image, with a positive horizontal increment moving right and a positive vertical increment moving down the picture. A value of XOffset = 2 indicates that the degraded fields are offset to the right by 2 pixels and a value of YOffset = 2

indicates an offset down of 2 pixels. For an input field with YUV values stored in YUV444 format (see § 3.1) in arrays InYField, InUField and InVField, the cropped and offset output is calculated according to (4) to (20). The start of the copy region follows the offsets:

XStart = XOffset    (4)
YStart = YOffset    (8)

Equations (5) to (7) and (9) to (11) then clamp the copy region to the cropped picture area, so that Cx ≤ XStart ≤ XEnd ≤ X − 1 − Cx and Cy ≤ YStart ≤ YEnd ≤ Y − 1 − Cy. X and Y give the horizontal and vertical field dimensions respectively, and Cx and Cy the number of pixels to be cropped from left and right and from top and bottom. For 625 sequences:

X = 720, Y = 288, Cx = 30, Cy = 10    (12)

For 525 sequences:

X = 720, Y = 243, Cx = 30, Cy = 10    (13)

XStart, XEnd,

YStart and YEnd now define the region of each field that will be copied. Pixels outside this region are initialized according to equations (14) to (17), where YField, UField and VField are X × Y output pixel arrays containing Y, U and V values respectively. The vertical bars to the left and right of

the field are initialized according to:

YField(x, y) = 0    x = 0, …, XStart − 1 and x = XEnd + 1, …, X − 1; y = 0, …, Y − 1    (14)
UField(x, y) = VField(x, y) = 128    x = 0, …, XStart − 1 and x = XEnd + 1, …, X − 1; y = 0, …, Y − 1    (15)

The horizontal bars at the top and bottom of the field are initialized according to:

YField(x, y) = 0    x = XStart, …, XEnd; y = 0, …, YStart − 1 and y = YEnd + 1, …, Y − 1    (16)
UField(x, y) = VField(x, y) = 128    x = XStart, …, XEnd; y = 0, …, YStart − 1 and y = YEnd + 1, …, Y − 1    (17)

Finally, the pixel values are copied according to:

YField(x, y) = InYField(x + XOffset, y + YOffset)    x = XStart, …, XEnd; y = YStart, …, YEnd    (18)
UField(x, y) = InUField(x + XOffset, y + YOffset)    x = XStart, …, XEnd; y = YStart, …, YEnd    (19)
VField(x, y) = InVField(x + XOffset, y + YOffset)    x = XStart, …, XEnd; y = YStart, …, YEnd    (20)

For the degraded input, cropping and shifting produce the output field arrays DegYField, DegUField and DegVField, whilst cropping without shifting for the reference sequence produces RefYField, RefUField and RefVField. These X × Y two-dimensional arrays are used as inputs to the detection routines described below.

3.3 Matching

The matching process produces signals for use within other detection procedures and also detection parameters for use in the integration procedure. The matching signals are generated from a process of finding the best match for small blocks within each degraded field from a buffer of neighbouring reference fields. This process yields a sequence, the matched reference, for use in place of the reference sequence in some of the detection modules.

The matching analysis is performed on 9 × 9 pixel blocks of the intensity arrays RefYField and DegYField. Adding a field number dimension to the intensity arrays, pixel (Px, Py) of the reference field N can be represented as:

Ref(N, Px, Py) = RefYField(Px, Py) from field N    (21)

A 9 × 9 pixel block with centre pixel (Px, Py) within t
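The best-match search that § 3.3 begins to describe can be sketched as a minimum-error search of a 9 × 9 degraded block against candidate positions in a reference field. Everything below (the sum-of-squared-differences criterion, the spatial-only search within a single reference field, the search radius, and the names) is an illustrative assumption, not the normative procedure, which also searches a buffer of neighbouring reference fields.

```python
import numpy as np

def best_match(ref_field, deg_field, px, py, radius=4, half=4):
    """Find the centre (mx, my) in ref_field whose 9x9 block best matches
    (minimum sum of squared differences) the 9x9 degraded block at (px, py).
    Arrays are indexed [y, x]; 'half' is the half-size of the 9x9 block."""
    deg_blk = deg_field[py - half:py + half + 1,
                        px - half:px + half + 1].astype(np.int64)
    best, best_ssd = None, None
    for my in range(py - radius, py + radius + 1):
        for mx in range(px - radius, px + radius + 1):
            ref_blk = ref_field[my - half:my + half + 1,
                                mx - half:mx + half + 1].astype(np.int64)
            ssd = int(np.sum((ref_blk - deg_blk) ** 2))
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best = ssd, (mx, my)
    return best, best_ssd

# Toy fields: the degraded square sits 2 pixels left of the reference square
ref = np.zeros((32, 32), dtype=np.uint8)
ref[10:19, 12:21] = 200  # bright 9x9 square centred at (x=16, y=14)
deg = np.zeros((32, 32), dtype=np.uint8)
deg[10:19, 10:19] = 200  # same square centred at (x=14, y=14)
print(best_match(ref, deg, 14, 14))
```

Replacing each degraded block's neighbourhood with its best reference match is what makes the "matched reference" insensitive to small spatial misalignments before the later PSNR-style comparisons.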
