AMERICAN NATIONAL STANDARD FOR TELECOMMUNICATIONS

ATIS-0100017.2008(R2013)

Video Calibration for Reduced Reference Objective Video Quality

As a leading technology and solutions development organization, ATIS brings together the top global ICT companies to advance the industry's most pressing business priorities. Through ATIS committees and forums, nearly 200 companies address cloud services, device solutions, emergency services, M2M communications, cyber security, ehealth, network evolution, quality of service, billing support, operations, and more. These priorities follow a fast-track development lifecycle from design and innovation through solutions that include standards, specifications, requirements, business use cases, software toolkits, and interoperability testing. ATIS is accredited by the American National Standards Institute (ANSI). ATIS is the North American Organizational Partner for the 3rd Generation Partnership Project (3GPP), a founding Partner of oneM2M, a member and major U.S. contributor to the International Telecommunication Union (ITU) Radio and Telecommunications sectors, and a member of the Inter-American Telecommunication Commission (CITEL). For more information, visit.

AMERICAN NATIONAL STANDARD

Approval of an American National Standard requires review by ANSI that the requirements for due process, consensus, and other criteria for approval have been met by the standards developer. Consensus is established when, in the judgment of the ANSI Board of Standards Review, substantial agreement has been reached by directly and materially affected interests. Substantial agreement means much more than a simple majority, but not necessarily unanimity. Consensus requires that all views and objections be considered, and that a concerted effort be made towards their resolution.

The use of American National Standards is completely voluntary; their existence does not in any respect preclude anyone, whether he has approved the standards or not, from manufacturing, marketing, purchasing, or using products, processes, or procedures not conforming to the standards.

The American National Standards Institute does not develop standards and will in no circumstances give an interpretation of any American National Standard. Moreover, no person shall have the right or authority to issue an interpretation of an American National Standard in the name of the American National Standards Institute. Requests for interpretations should be addressed to the secretariat or sponsor whose name appears on the title page of this standard.

CAUTION NOTICE: This American National Standard may be revised or withdrawn at any time. The procedures of the American National Standards Institute require that action be taken periodically to reaffirm, revise, or withdraw this standard. Purchasers of American National Standards may receive current information on all standards by calling or writing the American National Standards Institute.

One feature may produce good temporal alignments where another one fails. The magnitude and shape of the correlation function are used as indicators for the reliability of that feature's temporal registration. Unreliable features are discarded, and the remaining results are considered jointly to estimate the best average video delay for the clip.

In the following feature definitions, the luminance image is denoted as Y. For interlaced video, Y is a field; for progressive video, Y is a frame. The time when this field or frame occurs is denoted as t. Pixels of Y are further subscripted by row and column,
i and j, respectively, so that an individual pixel is denoted as Y(i,j,t).

5.2.1 TI2 Feature: Two Field Difference Temporal Information

For interlaced video, to compute TI2 at time t, consider field Y(t) and the previous field of the same type, Y(t-2), and compute for each pixel

TI2(i,j,t) = Y(i,j,t) - Y(i,j,t-2)    (1)

For progressive video, consider frame Y(t) and the previous frame, Y(t-1), and compute for each pixel

TI2(i,j,t) = Y(i,j,t) - Y(i,j,t-1)    (2)

Then, using the results from (1) or (2), compute

TI2(t) = rms_space[TI2(i,j,t)] = sqrt( (1/R) * sum_{i,j} TI2(i,j,t)^2 )    (3)

where rms_space is the root mean square function over space defined by the above equation, i and j are within the valid region defined in section 2.1, and R is the total number of pixels in the valid region (i.e., the number of pixels in the double summation). The TI2 calculation for interlaced fields is shown in Figure 1.

[Figure 1 depicts a sequence of interlaced fields (Field 1 at even t, Field 2 at odd t, for t = 0 through t = 9), with an rms two-field difference computed between each field and the preceding field of the same type.]

Figure 1 - Diagram depicting the method of calculating TI2(t).

5.2.2 TI10 Feature: Ten Field Difference Temporal Information

The TI10 feature is based on a
temporal difference spaced ten interlaced fields apart, or five progressive frames apart. This feature smooths the temporal information using a wider filter than TI2 and eliminates the effect of frame repeats in the TI waveform for systems that have four or fewer consecutive frame repeats. To compute TI10 at time t on interlaced sequences, consider field Y(t) and the field of the same type five frames ago, Y(t-10), and compute for each pixel

TI10(i,j,t) = Y(i,j,t) - Y(i,j,t-10)    (4)

For progressive sequences, consider frame Y(t) and the frame five frames ago, Y(t-5), and compute for each pixel

TI10(i,j,t) = Y(i,j,t) - Y(i,j,t-5)    (5)

Then, using the results from (4) or (5), compute

TI10(t) = rms_space[TI10(i,j,t)]    (6)

where i and j are within the valid region defined in section 2.1.

5.2.3 Ymean Feature: Average Luminance Level

Ymean is calculated as the average luminance level of a field. To compute Ymean at time t on interlaced fields or frames, consider Y(t) and compute

Ymean(t) = mean_space[Y(i,j,t)] = (1/R) * sum_{i,j} Y(i,j,t)    (7)

where mean_space is the mean function over space defined by the above equation, i and j are within the valid region defined in section 2.1, and R is the total number of pixels in this valid region (i.e., the number of pixels in the double summation).
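The three feature definitions above can be illustrated with a minimal pure-Python sketch. This is an illustration rather than the normative implementation: the helper names are ours, a field or frame is assumed to be a list of rows of luminance values, and the entire image is treated as the valid region.

```python
import math

def rms_space(image):
    """Root mean square over all pixels of an image (eqs. 3 and 6)."""
    pixels = [p for row in image for p in row]
    return math.sqrt(sum(p * p for p in pixels) / len(pixels))

def mean_space(image):
    """Mean over all pixels of an image (eq. 7)."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def pixel_difference(a, b):
    """Per-pixel difference of two equal-sized luminance images."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def ti_feature(frames, t, spacing):
    """TI feature at time t: spacing 2 gives TI2 on interlaced fields,
    1 gives TI2 on progressive frames; 10 and 5 give TI10 likewise."""
    return rms_space(pixel_difference(frames[t], frames[t - spacing]))

def ymean_feature(frames, t):
    """Ymean feature: average luminance level of field or frame t."""
    return mean_space(frames[t])
```

Computing ti_feature and ymean_feature for every valid t yields the feature waveforms that are cross-correlated in section 5.3.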
5.3 Feature Sequence Cross-Correlation and Validation

The following steps describe the process that is used to cross-correlate original and processed feature streams in order to estimate their best temporal registration. The algorithm includes validation steps, where the features are examined to discard potentially unreliable alignment results. The original feature stream will be referred to as a_o and the processed feature stream as a_p. For example, when using the two field difference temporal information feature, a(t) = TI2(t). Let M be the length of the feature streams, a_o and a_p. The feature stream a_o, given by a_o(0), a_o(1), a_o(2), ..., a_o(M-1), will be abbreviated as {a_o(t)}, t = 0, ..., M-1.

Footnote 5: The correlation function described in this section is based on minimization of the standard deviation of the difference between the original and processed feature streams, not the maximization of the energy in the cross-product of the two feature streams.

We have found that temporal registration estimates based on the above features become unreliable when

std_time{a_o(t)} < threshold    (8)

or

std_time{a_p(t)} < threshold    (9)

where threshold
= Y_THRESHOLD = 0.25 for the Ymean feature, threshold = TI_THRESHOLD = 0.15 for the TI2 and TI10 features, and std_time represents the standard deviation over time of the M samples in the feature stream. When (8) or (9) is satisfied, the Ymean waveform has detected insufficient temporal change in scene brightness to be useful for temporal registration (e.g., a scene with a constant brightness level). Similarly, when (8) or (9) is satisfied, the TI2 and TI10 waveforms have detected insufficient temporal change in scene motion to be useful for temporal registration (e.g., a still scene). Features that fall below the thresholds in (8) or (9) are considered "invalid" and no further calculations are performed using them. Furthermore, if all three features (TI2, TI10, and Ymean) fall below the thresholds in (8) or (9), then the video clip is considered "still" and no further temporal registration calculations are performed.

Feature sequence correlation is performed on the feature streams that pass the above test. The temporal registration uncertainty, U, will be specified in fields for interlaced systems and frames for progressive systems. U represents the maximum temporal shift (plus or
minus) of the processed feature stream with respect to the original feature stream. The feature sequence correlation is performed as follows:

1. Given a sequence of M processed video features {a_p(t)}, t = 0, ..., M-1, we first discard the first and last U samples to form the sequence {a_p(t)}, t = U, ..., M-1-U. Normalize (divide) each element in the resulting sequence by the standard deviation of that sequence to form the normalized sequence {n_p(t)}, t = U, ..., M-1-U.

2. For each alignment delay guess d (for all -U <= d <= U), we compute a corresponding original feature stream {a_o(t)}, t = U+d, ..., M-1-U+d, and normalize (divide) each element in the sequence by the standard deviation of that sequence to form the normalized sequence {n_o(d,t)}, t = U, ..., M-1-U. This original feature normalization is essentially the same as the processed feature normalization in step one, except computed for each delay, d.

3. Take the resulting normalized processed and original feature streams and compute the difference between those sequences:

Diff(d,t) = n_o(d,t) - n_p(t), t = U, ..., M-1-U    (10)

4. Compute the sample standard deviation over time of the difference sequence for each delay offset d, namely

S(d) = std_time(Diff(d,t))    (11)

5. The minimum S(d) (denoted Smin)
and its offset dmin is the best alignment indicated by this feature. Figure 2 gives an example plot of the correlation function S(d). In this plot, the best alignment occurs for the delay d = 0 fields (i.e., dmin = 0 fields).

Figure 2 - Example plot of correlation function S(d).

6. If the normalized original and processed feature streams were identical, then they would cancel at correct alignment (i.e., Smin would be 0.0). On the other hand, Smin can be at most 2.0, since the normalized original and processed waveforms each have a variance of 1.0. If the normalized original and processed waveforms are independent, their variances will add and Smin will be approximately equal to sqrt(2) = 1.414. A value for Smin greater than sqrt(2) indicates that the two waveforms are negatively correlated. We have found that if Smin <= CORRELATION_VALID = 0.25, then the correlation between the processed feature stream and the original feature stream is probably reliable and these features are therefore considered valid. However, if Smin >= CORRELATION_INVALID = 1.40, then the correlation between the processed feature stream and the original feature stream is unreliable for the above reasons and these features are therefore considered invalid. For correlations in between (i.e., CORRELATION_VALID < Smin < CORRELATION_INVALID), Smin yields ambiguous results with respect to accuracy, so other criteria must be used (see step 7).

7. If CORRELATION_VALID < Smin < CORRELATION_INVALID, find the earliest (minimum) delay, d1, where S(d1) <= Smin + DELTA_THRESHOLD, where DELTA_THRESHOLD = 0.04. Also find the latest delay, d2, where S(d2) <= Smin + DELTA_THRESHOLD. Compute the distance between those two delays, w = d2 - d1 + 1. Notice that no restrictions are placed on values of S for delays between d1 and d2. This width w discriminates between correlations with a well-defined minimum and correlations with multiple nearly-identical minimum values (e.g., a sharp correlation function as given in Figure 2 versus a broad correlation function). TI features with w <= TI_WIDTH are reliable, and Ymean features with w <= Y_WIDTH are reliable, where TI_WIDTH = 3 and Y_WIDTH = 4. Features that meet these criteria are considered valid, whereas features that do not meet these criteria are considered invalid.
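Steps 1 through 7 can be sketched in pure Python as follows. This is an illustrative sketch rather than the normative implementation: the function names are ours, the feature streams are assumed to be plain lists of equal length M that have already passed the screening of (8) and (9) (so no stream is constant), and Python's sample standard deviation stands in for std_time.

```python
import statistics

def correlation_function(a_o, a_p, U):
    """Steps 1-4: compute S(d) for every delay guess -U <= d <= U (eqs. 10-11)."""
    M = len(a_p)                                 # a_o is assumed to have length M too
    trimmed = a_p[U:M - U]                       # step 1: discard first/last U samples
    s_p = statistics.stdev(trimmed)              # nonzero for a screened, non-still stream
    n_p = [x / s_p for x in trimmed]             # normalized processed sequence
    S = {}
    for d in range(-U, U + 1):                   # step 2: shifted original segment
        segment = a_o[U + d:M - U + d]
        s_o = statistics.stdev(segment)
        n_o = [x / s_o for x in segment]
        diff = [o - p for o, p in zip(n_o, n_p)] # step 3: Diff(d,t) = n_o(d,t) - n_p(t)
        S[d] = statistics.stdev(diff)            # step 4: std over time of Diff(d,t)
    return S

def best_delay(S, width_limit, delta=0.04, valid_thr=0.25, invalid_thr=1.40):
    """Steps 5-7: best delay dmin plus a valid/invalid decision for this feature."""
    d_min = min(S, key=S.get)                    # step 5: delay minimizing S(d)
    s_min = S[d_min]
    if s_min <= valid_thr:                       # step 6: below CORRELATION_VALID
        return d_min, True
    if s_min >= invalid_thr:                     # step 6: above CORRELATION_INVALID
        return d_min, False
    near = [d for d in S if S[d] <= s_min + delta]
    w = max(near) - min(near) + 1                # step 7: width of near-minimum region
    return d_min, w <= width_limit               # TI_WIDTH = 3, Y_WIDTH = 4
```

For example, a processed stream that is a pure two-sample delayed copy of the original produces Diff(d,t) = 0 for all t at the matching offset, so S at that offset is exactly 0.0 and the feature is classified valid.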
Table 2 provides a summary of the recommended threshold values that are used by the temporal registration algorithm. These recommended threshold values were obtained by minimizing alignment errors between the current algorithm and the frame-based temporal alignment method in section 3.4.2 of [3].

Table 2 - Recommended Values for Thresholds Used by Temporal Registration Algorithm

Threshold              Recommended Value
TI_THRESHOLD           0.15
Y_THRESHOLD            0.25
CORRELATION_VALID      0.25
CORRELATION_INVALID    1.40
DELTA_THRESHOLD        0.04
TI_WIDTH               3
Y_WIDTH                4

5.4 Estimation of Temporal Offset from Features

The following describes how to apply the three features of section 5.2 and the correlation algorithm of section 5.3 to achieve the final estimate of the temporal alignment offset.

1. Compute the original and processed TI2, TI10, and Ymean feature streams as defined in section 5.2. When operating in-service, transmit the original features to the processed video location. At most, single precision will be required (i.e., 4-byte floating point), and further bandwidth savings can be obtained through quantization (e.g., 16 bits per value).

2. For each original and processed feature stream pair, compute the correlation function S(d) according to section 5.3 and record whether that feature is valid or invalid
, and if invalid, whether that feature is still (e.g., motionless).

3. If one or more features are valid, average those valid correlation functions together. For progressive video sequences, find the delay offset dmin that minimizes the averaged correlation function. For interlaced video sequences, this algorithm may be run with either field or frame accuracy. For frame accurate delay, restrict the search to either field one delays or field two delays; for field accurate delay, include both field one and field two delays in the search. Find the delay offset dmin that minimizes the averaged correlation function.

Note: This algorithm cannot be used to determine with 100% reliability whether field one of the processed video sequence best aligns to field one or field two of the original sequence (i.e., indicative of interlaced reframing by the video system under test; see section 3.1.2 of [3]). We have found that such a determination is approximately 90% accurate. Thus, to detect reframing we recommend the use of some other, external algorithm such as the one that will be described in section 3. Field accurate delays from this temporal registration algorithm provide rough estimates, suitable when subsequent steps will improve those estimates.

4. If all of the features are invalid, a delay cannot be computed for this video clip. Furthermore, if all features have been marked still, then the clip contains still video and this extra information may be reported to the user. Temporal alignment or registration is not required to estimate quality for still scenes.

5.5 Observations and Concl