ITU-T J 340-2010 Reference algorithm for computing peak signal to noise ratio of a processed video sequence with compensation for constant spatial shifts constant temporal shift anro.pdf

Resource description

International Telecommunication Union

ITU-T J.340
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(06/2010)

SERIES J: CABLE NETWORKS AND TRANSMISSION OF TELEVISION, SOUND PROGRAMME AND OTHER MULTIMEDIA SIGNALS
Measurement of the quality of service

Reference algorithm for computing peak signal to noise ratio of a processed video sequence with compensation for constant spatial shifts, constant temporal shift, and constant luminance gain and offset

Recommendation ITU-T J.340

Summary

Peak signal to noise ratio (PSNR) is a useful benchmark for evaluating performance improvements of new objective perceptual video quality metrics. This PSNR

calculation method in Recommendation ITU-T J.340 has the advantage of automatically determining the highest possible PSNR value for a given video sequence over the range of spatial and temporal shifts. Only one temporal shift is allowed for all frames in the entire processed video sequence (i.e., constant delay).

This Recommendation defines a full reference (FR) algorithm for computing both the calibration and PSNR estimations for a processed video sequence: peak signal to noise ratio with compensation for constant spatial shifts, constant temporal shift, and constant luminance gain and offset (PSNRconst). Since the PSNRconst algorithm only examines the Y luminance channel (as defined by Recommendation ITU-R BT.601-6), distortions in the CB and CR chrominance channels will not be detected by the algorithm of this Recommendation.

The intent of this Recommendation is to define and facilitate a standardized PSNR metric for use by industry and standards organizations. Reference code and test vectors have been included to assure accurate and consistent implementation of this PSNRconst metric.

History

Edition   Recommendation   Approval     Study Group
1.0       ITU-T J.340      2010-06-29   9

FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications, information and communication technologies (ICTs). The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative

basis with ISO and IEC.

NOTE

In this Recommendation, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure, e.g., interoperability or applicability) and compliance with the Recommendation is achieved when all of these mandatory provisions are met. The words "shall" or some other obligatory language such as "must" and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the Recommendation is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementers are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database at http://www.itu.int/ITU-T/ipr/.

© ITU 2011

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

CONTENTS

1 Scope
2 References
3 Definitions
4 Abbreviations and acronyms
5 Conventions
6 PSNRconst algorithm description
6.1 Introduction

6.2 Algorithm
Appendix I Reference code to calculate PSNR
Bibliography

1 Scope

Peak signal to noise ratio (PSNR) is a useful benchmark for evaluating performance improvements of new objective perceptual video quality metrics. For example, PSNR has been used as a benchmark for both the multimedia (MM) and reduced reference television (RRTV) test programs recently completed by the video quality experts group (VQEG). Since the calculation of PSNR is highly dependent upon proper estimation of spatial alignment, temporal alignment, gain, and level offset between the processed video sequence and the original video sequence, the method of measurement for PSNR should ideally include a method for performing these calibration procedures.

The PSNR calculation method in this Recommendation has the advantage of automatically determining the highest possible PSNR value for a given video sequence over the range of spatial and temporal shifts. Only one temporal shift is allowed for all frames in the entire processed video sequence (i.e., constant delay).

This Recommendation defines a full reference (FR) algorithm for computing both the calibration and PSNR estimations for a processed video sequence: peak signal to noise ratio with compensation for constant spatial shifts, constant temporal shift, and constant luminance gain and offset (PSNRconst). Since the PSNRconst algorithm only examines the Y luminance channel (as defined by [b-ITU-R BT.601-6]), distortions in the CB and CR chrominance channels will not be detected by the algorithm of this Recommendation.

The intent of this Recommendation is to define and facilitate a standardized PSNR metric for use by industry and standards organizations. Reference code and test vectors have been included to assure accurate and consistent implementation of this PSNRconst metric.

The intention of this PSNRconst metric is to fully calibrate the video and then calculate PSNR on the luminance plane only. For these purposes, calibration consists of selecting the valid video region spatially (e.g., discarding the overscan region) and then removing from the entire video sequence a constant temporal shift (delay or advance), a constant spatial shift (vertically and horizontally), and a constant luminance gain and offset.

Common applications of the PSNRconst algorithm include:
- A reference benchmark for evaluating the effectiveness of perceptual quality metrics.
- An FR quality of service (QoS) metric for video transmission systems.

It should be noted that the use of this calibration method will result in the best possible PSNRconst value for a video sequence with constant delay. This value may differ from the PSNRconst obtained with perfectly calibrated video. Degradations due to gain are removed by this method. If a higher PSNRconst is obtained using an alternative calibration method, then this alternative calibration method can be used, although differences are expected to be small. It is noted that this Recommendation considers only integer-pixel shifts.

It should be noted that this PSNRconst algorithm will not detect as an impairment a constant spatial shift, a constant luminance gain, a constant luminance offset, or a constant temporal shift.

2 References

None.

3 Definitions

None.

4 Abbreviations and acronyms

This Recommendation uses the

following abbreviations and acronyms:

FR          Full Reference
MSE         Mean Squared Error
PSNR        Peak Signal to Noise Ratio
PSNRconst   Peak Signal to Noise Ratio with Compensation for Constant Spatial Shifts, Constant Temporal Shift, and Constant Luminance Gain and Offset
QoS         Quality of Service
SROI        Spatial Region of Interest
ST          Spatial-Temporal
TROI        Temporal Region of Interest

5 Conventions

None.

6 PSNRconst algorithm description

6.1 Introduction

The PSNRconst algorithm performs an exhaustive search for the maximum Y-channel PSNRconst over plus or minus the horizontal and vertical spatial uncertainties (in pixels) and plus or minus the temporal uncertainty (in frames). The processed video segment is fixed and the original video segment is shifted over the search range. For each spatial-temporal (ST) shift, a linear fit between the Y-channel processed and original pixels is performed such that the mean squared error (MSE) of [original - (gain*processed + offset)] is minimized, hence maximizing PSNR. Thus, this calculation of PSNRconst yields values that are greater than or equal to those of commonly used PSNR implementations, provided the exhaustive search covers enough ST shifts.

The ST search range, spatial region of interest (SROI), and temporal region of interest (TROI) are input parameters to the algorithm, and the user of this Recommendation is advised to carefully consider appropriate input arguments for these quantities. For instance, some video systems truncate the border picture elements, replacing these pixels with black. In these cases, the SROI should be reduced so as to not include these pixels in the PSNRconst calculation.

Since the ST search performed by the algorithm only considers integer shifts in space and time, this method is not appropriate for video systems that contain sub-pixel shifts. Caution should also be observed when analysing video sequences that contain interlaced video frames, to assure that the interlaced framing of the processed video sequence is identical to that of the original video sequence before applying the algorithm (i.e., the algorithm does not account for one-field or half-frame timing shifts between the processed and original).

Since the video delay is assumed to be constant for all video frames in the processed video segment, this algorithm may not be appropriate for video systems that contain variable video delays (i.e., where the video delay of individual frames may vary). The algorithm may be used in these cases if one desires the PSNRconst calculation to include distortions due to variable video delays.

6.2 Algorithm

PSNR is defined as 10*log10 of the ratio of the peak signal energy to the MSE observed between the processed video signal and the original video signal. For the algorithm presented here, the peak signal energy is assumed to be $(2^R - 1)^2$, where R is the pixel bit depth, and the MSE summation is performed over the selected SROI and TROI of the processed video sequence. It is noted that the peak signal energy is $255^2$ (i.e., 255*255) for 8-bit video and $1023^2$ for 10-bit video.

The algorithm performs a linear fit of the processed image pixels to the corresponding original image pixels for each ST shift that is examined before computing the MSE. This is equivalent to removing gain (contrast) and level offset (brightness) calibration errors in the processed video before performing the PSNRconst calculation.

Computation of PSNRconst for an (original, processed) video clip pair involves the following steps:

1) Determine the appropriate ST search range for the processed video clip. This involves estimating the x and y spatial uncertainty (in pixels, denoted here as x_uncert and y_uncert) of spatial registration errors that might be present, as well as estimating the t temporal uncertainty (in frames, denoted as t_uncert) of any temporal registration errors that might be present. Since the algorithm will perform an exhaustive search over plus or minus x_uncert, y_uncert, and t_uncert shifts of the original video sequence with respect to the processed video sequence, these estimates should be as tight as possible while still including the optimal ST registration.

2) Determine the maximum SROI and TROI that can be used for the processed video clip. If the processed video clip contains truncation of border pixels (i.e., black border), the SROI of the processed video clip should be reduced to eliminate these pixels. If the processed video clip contains transition video frames at the beginning or end of the video clip (perhaps due to prior or following scene content), these frames should be eliminated from the TROI. If necessary, reduce the maximum SROI and TROI to allow for the x_uncert, y_uncert, and t_uncert shifts found in step 1. The final SROI and TROI of the processed video clip that are determined by this step will remain fixed for all PSNR calculations. Since the original video clip will be shifted by a maximum of plus or minus x_uncert and y_uncert pixels and plus or minus t_uncert frames with respect to the processed video clip, one must assure that there are valid original video

pixels that align to every processed video pixel within the final SROI and TROI.

3) For each ST shift of the original sequence in step 1 (shifts in the x, y and t directions are denoted here as $x_s$, $y_s$ and $t_s$, respectively), perform a linear fit of the processed pixels to the shifted original pixels. This linear fit is performed for all pixels in the entire ST region encompassed by the processed video SROI and TROI selected in step 2. For a given ST shift, this can be expressed as finding the $\mathrm{Gain}(x_s, y_s, t_s)$ and $\mathrm{Offset}(x_s, y_s, t_s)$ that minimizes the MSE given by:

$$\mathrm{MSE}(x_s, y_s, t_s) = \frac{1}{N} \sum_{x} \sum_{y} \sum_{t} \Big\{ O(x + x_s,\, y + y_s,\, t + t_s) - \big[ \mathrm{Gain}(x_s, y_s, t_s) \cdot P(x, y, t) + \mathrm{Offset}(x_s, y_s, t_s) \big] \Big\}^2, \quad (x, y) \in \mathrm{SROI},\ t \in \mathrm{TROI}$$

where three dimensional matrices O and P represent the original and processed video sequences, respectively, the MSE is computed over all x, y, and t that belong to SROI and TROI, and N is the total number of pixels in the three dimensional processed video segment encompassed by SROI and TROI. It is noted that this algorithm considers only integer-pixel shifts.

4) Compute the MSE in step 3 for all ST shifts within the spatial and temporal uncertainties defined in step 1 (i.e., $-\text{x\_uncert} \le x_s \le \text{x\_uncert}$, $-\text{y\_uncert} \le y_s \le \text{y\_uncert}$, and $-\text{t\_uncert} \le t_s \le \text{t\_uncert}$) and select the minimum MSE (i.e., $\mathrm{MSE}_{\min}$). This is the MSE that will maximize the PSNR, defined by:

$$\mathrm{PSNR}_{const} = 10 \cdot \log_{10}\!\left( \frac{(2^R - 1)^2}{\mathrm{MSE}_{\min}} \right)$$

where R is the pixel bit depth. For example, R is 8 for 8-bit video (thus numer
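
The linear fit in step 3 and the final PSNR in step 4 can be made concrete with a short derivation. The sketch below assumes an ordinary least-squares fit, which is one standard way to find the gain and offset that minimize the MSE; the text above does not state which solver is used. The symbols $\bar{P}$ and $\bar{O}_s$ (means over the SROI and TROI of the processed pixels and of the shifted original pixels) and $O_s$ are introduced here for illustration only, and the MSE value in the worked example is made up.

```latex
% Assumed notation: O_s(x,y,t) = O(x + x_s, y + y_s, t + t_s),
% and all sums run over (x,y) in SROI, t in TROI.
\begin{align*}
\mathrm{Gain}(x_s, y_s, t_s)
  &= \frac{\sum \bigl(P - \bar{P}\bigr)\bigl(O_s - \bar{O}_s\bigr)}
          {\sum \bigl(P - \bar{P}\bigr)^2}, \\
\mathrm{Offset}(x_s, y_s, t_s)
  &= \bar{O}_s - \mathrm{Gain}(x_s, y_s, t_s)\,\bar{P}.
\end{align*}

% Worked example with a hypothetical minimum MSE for 8-bit video (R = 8):
\begin{align*}
(2^8 - 1)^2 &= 255^2 = 65025, \\
\mathrm{MSE}_{\min} &= 65.025 \quad \text{(hypothetical)}, \\
\mathrm{PSNR}_{const} &= 10 \log_{10}\!\left(\frac{65025}{65.025}\right)
  = 10 \log_{10}(1000) = 30\ \text{dB}.
\end{align*}
```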

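
Steps 1 to 4 can be sketched in Python with NumPy as follows. This is a minimal illustration, not the Appendix I reference code: the function name `psnr_const`, the (frames, rows, columns) array layout, the half-open SROI and TROI tuples, the returned dictionary keys, and the use of an ordinary least-squares fit for the gain and offset are all assumptions made for this example.

```python
import numpy as np


def psnr_const(orig, proc, sroi, troi, x_uncert, y_uncert, t_uncert, bit_depth=8):
    """Exhaustive-search PSNR with a gain/offset fit (illustrative sketch).

    orig, proc: Y-channel arrays shaped (frames, rows, columns).
    sroi: (top, left, bottom, right), half-open, in processed-clip pixels.
    troi: (first, last), half-open, in processed-clip frames.
    """
    top, left, bottom, right = sroi
    first, last = troi
    frames, rows, cols = orig.shape

    # Step 2: every shifted original pixel must exist for the fixed ROI.
    if (first - t_uncert < 0 or last + t_uncert > frames
            or top - y_uncert < 0 or bottom + y_uncert > rows
            or left - x_uncert < 0 or right + x_uncert > cols):
        raise ValueError("SROI/TROI too large for the requested search range")

    # The processed segment stays fixed for all shifts.
    p = proc[first:last, top:bottom, left:right].astype(np.float64).ravel()
    p_mean = p.mean()
    p_var = np.mean((p - p_mean) ** 2)

    best = None
    for ts in range(-t_uncert, t_uncert + 1):
        for ys in range(-y_uncert, y_uncert + 1):
            for xs in range(-x_uncert, x_uncert + 1):
                # Step 3: shift the original, then fit gain*P + offset to it
                # by least squares (assumed solver).
                o = orig[first + ts:last + ts,
                         top + ys:bottom + ys,
                         left + xs:right + xs].astype(np.float64).ravel()
                o_mean = o.mean()
                cov = np.mean((p - p_mean) * (o - o_mean))
                gain = cov / p_var if p_var > 0 else 0.0
                offset = o_mean - gain * p_mean
                mse = np.mean((o - (gain * p + offset)) ** 2)
                # Step 4: keep the shift with the minimum MSE.
                if best is None or mse < best["mse"]:
                    best = {"mse": mse, "xs": xs, "ys": ys, "ts": ts,
                            "gain": gain, "offset": offset}

    peak = (2 ** bit_depth - 1) ** 2
    best["psnr"] = (float("inf") if best["mse"] == 0
                    else 10 * np.log10(peak / best["mse"]))
    return best
```

The bounds check reflects the requirement in step 2 that valid original pixels align to every processed pixel for all shifts; without it, negative slice indices in NumPy would silently wrap around. A call such as `psnr_const(orig, proc, (4, 4, 28, 28), (2, 6), 3, 3, 1)` searches plus or minus 3 pixels and plus or minus 1 frame around a 24 x 24 pixel, 4-frame region.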