SMPTE ST 2064-1-2015 Audio to Video Synchronization Measurement - Fingerprint Generation.pdf

Uploaded by: fatcommittee260  Document ID: 1046699  Upload date: 2019-03-27  Format: PDF  Pages: 23  Size: 498.37 KB
Resource description

Copyright 2015 by THE SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS
3 Barker Avenue, White Plains, NY 10601
(914) 761-1100
Approved October 9, 2015

Table of Contents

Foreword
Intellectual Property
Introduction
1 Scope
2 Conformance Notation
3 Normative References
4 Definitions and Terminology
5 Fingerprint Generation
  5.1 Audio and Video Fingerprint Generation
  5.2 Video Fingerprint Generation
  5.3 Audio Fingerprint Generation
6 Encapsulation of Fingerprints
  6.1 Container Structure
  6.2 ID Sub-container Structure
  6.3 Video Fingerprint Sub-container Structure
  6.4 Audio Fingerprint Sub-container Structure
7 Sample Fingerprint Transport Packets (Informative)
Annex A Bibliography (Informative)

SMPTE ST 2064-1:2015
SMPTE STANDARD
Audio to Video Synchronization Measurement - Fingerprint Generation

Foreword

The Society of Motion Picture and Television Engineers (SMPTE) is an internationally-recognized standards developing organization. Headquartered and incorporated in the United States of America, SMPTE has members in over 80 countries on six continents. SMPTE’s Engineering Documents, including Standards, Recommended Practices, and Engineering Guidelines, are prepared by SMPTE’s Technology Committees. Participation in these Committees is open to all with a bona fide interest in their work. SMPTE cooperates closely with other standards-developing organizations, including ISO, IEC and ITU. SMPTE Engineering Documents are drafted in accordance with the rules given in the Standards Operations Manual. SMPTE ST 2064-1 was prepared by Technology Committee 24TB.

Intellectual Property

SMPTE draws attention to the fact that it is claimed that compliance with this Standard may involve the use of one or more patents or other intellectual property rights (collectively, “IPR”). The Society takes no position concerning the evidence, validity, or scope of this IPR. Each holder of claimed IPR has assured the Society that it is willing to License all IPR it owns,
and any third party IPR it has the right to sublicense, that is essential to the implementation of this Standard to those (Members and non-Members alike) desiring to implement this Standard under reasonable terms and conditions, demonstrably free of discrimination. Each holder of claimed IPR has filed a statement to such effect with SMPTE. Information may be obtained from the Director, Standards

[...]

or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain possibility or course of action is deprecated but not prohibited.

The keywords “may” and “need not” indicate courses of action permissible within the limits of the document.

The keyword “reserved” indicates a provision that is not defined at this time, shall not be used, and may be defined in the future. The keyword “forbidden” indicates “reserved” and in addition indicates that the provision will never be defined in the future.

A conformant implementation according to this document is one that includes all mandatory provisions (“shall”) and, if implemented, all recommended provisions (“should”) as described. A conformant implementation need not implement optional provisions (“may”) and need not implement them as described.

Unless otherwise specified, the order of precedence of the types of normative information in this document shall be as follows: Normative prose shall be the authoritative definition; Tables shall be next; followed by formal languages; then figures; and then any other language forms.

3 Normative References

Note: All references in this document to other SMPTE documents use the current numbering style (e.g. SMPTE ST 274:2008) although, during a transitional phase, the document as published (printed or PDF) may bear an older designation (such as SMPTE 274M-2008). Documents with the same root number (e.g. 274) and publication year (e.g. 2008) are functionally identical.

The following standards contain provisions which, through reference in this text, constitute provisions of this Standard. At the time of publication, the editions indicated
were valid. All standards are subject to revision, and parties to agreements based on this Standard are encouraged to investigate the possibility of applying the most recent edition of the standards indicated below.

SMPTE ST 125:2013, SDTV Component Video Signal Coding 4:4:4 and 4:2:2 for 13.5 MHz and 18 MHz Systems
SMPTE ST 274:2008, Television 1920 × 1080 Image Sample Structure, Digital Representation and Digital Timing Reference Sequences for Multiple Picture Rates
SMPTE ST 296:2012, 1280 × 720 Progressive Image 4:2:2 and 4:4:4 Sample Structure Analog and Digital Representation and Analog Interface
SMPTE ST 352:2013, Payload Identification Codes for Serial Digital Interfaces
SMPTE ST 2036-1:2014, Ultra High Definition Television Image Parameter Values for Program Production
SMPTE ST 2048-1:2011, 2048 × 1080 and 4096 × 2160 Digital Cinematography Production Image Formats FS/709
Recommendation ITU-R BS.775-3 (08/2012), Multichannel Stereophonic Sound System with and without Accompanying Picture

4 Definitions and Terminology

4.1 Fingerprints

The term “fingerprint”, as applied to audio and video signals, refers generally to a computed representation of key features of the image and sound being transported in the signals, with the computation specified so that the representation uniquely identifies the contents. To be practical, the fingerprint must be many orders of magnitude smaller than the signal it represents, while at the same time maintaining a very high probability of unique identification. Being content oriented, it must be resistant to many of the processes that the content may undergo, such as scaling and format conversion.

For the purposes of this document, the term “Video Fingerprint” refers to the values computed according to the processes described in Section 5.2, Video Fingerprint Generation.

For the purposes of this document, the term “Audio Fingerprint” refers to the values computed according to the processes described in Section 5.3, Audio Fingerprint Generation.

4.2 Reserved Bits

5 Fingerprint Generation

5.1 Audio and Video Fingerprint Generation

This section defines the method of generating audio and video fingerprints of an audio and/or video essence or signal. These fingerprints can be generated at different points in the signal chain in order to enable measurement of the Audio to Video (A/V) lip-sync error at one or more points. An example of the fingerprint generation and analysis is shown in Figure 1.

General signal processing for fingerprint generation comprises multiple functions appropriate for video and audio, respectively. A reference function for each is described in the next sections. Other
methods that achieve identical results at the points of interchange may be used.

[Figure 1: Example of Lip-Sync Error Measurement System. Blocks: Fingerprint Generation (two instances), Fingerprint Analysis. Signal labels: V1, A1, V2, A2, VF1, AF1, VF2, AF2, VD, AD, LS. Legend: VF1 = Video Fingerprint 1; AF1 = Audio Fingerprint 1; VD = Video Delay; AD = Audio Delay; Lip-Sync.]

Note: The method for fingerprint analysis is not defined in this document and can vary depending on the implementation.

5.2 Video Fingerprint Generation

Video fingerprint generation shall be based on evaluation of the amount of video content change (typically due to motion) during the time interval between two video frames or fields. Fields (for interlaced video) or frames (for progressive video) shall be processed sequentially. The current field/frame shall be compared with the second preceding field/frame to calculate a difference used for further processing. Only the 8 most significant bits of the luminance samples shall be used in the calculation.

General video signal processing for fingerprint generation shall comprise three major functions: prefiltering, windowing and sub-sampling, and motion detection, as illustrated in Figure 2. These functions are described in the following subsections.

[Figure 2: Video Fingerprint Generation. Labels: Vi, Prefilter, Fi, Windowing and Sub-sampling, Wi, Motion Detection, VFi.]

5.2.1 Prefilter

Prior to sampling, the video shall be filtered to reduce its bandwidth and to facilitate consistent results with different video formats. To reduce the implementation resources required, a simple mean calculation shall be used only on the horizontal axis and shall be applied only to the luminance samples. Table 1 shows the filter used for different video formats.

Table 1: Video Format Prefilter

Video Format      Video Standard  Prefilter Used
4096 X 2160p      ST 2048-1       1 1 1 1 1 1 / 6
3840 X 2160p      ST 2036-1       1 1 1 1 1 1 / 6
2048 X 1080p      ST 2048-1       1 1 1 / 3
1080i, 1080p      ST 274          1 1 1 / 3
720p              ST 296          1 1 0 / 2
SD 525, SD 625    ST 125          0 1 0 / 1

For 2160-line formats, the filter shall use the three previous, current and two next pixels. For 1080-line formats, the filter shall use the previous, current and next pixels. For 720p formats the previous and current pixels shall be used. For SD formats, no filtering shall be done.

Note: This filtering method results in an image that is not optimized for viewing but is suitable for fingerprint generation. The defined reduction in spatial resolution does not reduce the number of pixels in the image; such a reduction occurs in the windowing process.

5.2.2 Windowing and Sub-Sampling

When comparing fingerprints derived from two video signals, originating from the same picture content, it is possible that one signal has been altered compared to the other. Possible alterations could
include such things as branding, graphic overlays and aspect ratio change. These alterations will decrease the effectiveness of signal matching. Accordingly, a window is defined to focus the fingerprint generation on the central area of the image and reduce the impact of such possible alterations to the video. The number of pixels in the window also is reduced by a sub-sampling process. It is recognized that the effectiveness of the window measurement may vary depending on the nature of the signals to be compared.

A windowing block shall be used to select a part of the image from which the video fingerprint is extracted. The pixel coordinates of the window depend on the video format of the signal in use at the time of fingerprint generation and shall be as shown in Table 2. A subset of pixels inside the defined window that is derived by sub-sampling shall be used for fingerprint generation, and pixels outside the window shall be ignored. The subset of pixels shall consist of 16 sample rows of 60 pixel samples, evenly spaced horizontally and vertically across the selected window. This yields a total of 960 pixels that are used in the motion detection process for fingerprint generation.

The process of determining the pixels to be used is illustrated in the following example for 720p, with a window sampled from a picture of 1280 pixels by 720 lines. Figure 3 shows an example of how to determine the lines and selected pixels to be compared.
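As an informal illustration (not part of the standard), the 720p sampling grid can be derived from the start and step values that Table 2 gives for 1280 X 720p. The function name `sample_grid` and the constant names are hypothetical; only the numbers 256, 13, 117, 32, 60 and 16 come from this document.

```python
from typing import List, Tuple

# Window parameters for 1280 X 720p as listed in Table 2 of this document.
H_START, H_STEP = 256, 13
V_START, V_STEP = 117, 32
SAMPLES_PER_ROW = 60  # pixel samples per sample row
SAMPLE_ROWS = 16      # sample rows per frame


def sample_grid(h_start: int, h_step: int, v_start: int, v_step: int,
                cols: int = SAMPLES_PER_ROW,
                rows: int = SAMPLE_ROWS) -> Tuple[List[int], List[int]]:
    """Return (pixel positions, line numbers) of the sub-sampled window."""
    pixels = [h_start + h_step * i for i in range(cols)]
    lines = [v_start + v_step * j for j in range(rows)]
    return pixels, lines


pixels_720p, lines_720p = sample_grid(H_START, H_STEP, V_START, V_STEP)
print(pixels_720p[:3], pixels_720p[-1])  # first three and last pixel positions
print(lines_720p[:3], lines_720p[-1])    # first three and last line numbers
```

Under these assumptions the last pixel position is 256 + 13 × 59 = 1023 and the last line is 117 + 32 × 15 = 597, so the grid holds 60 × 16 = 960 samples. The same function applies to the other formats by passing their Table 2 start and step values.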

According to Table 2, the first horizontal pixel position to use is pixel 256. Since the pixel step is 13, the next pixels will be 269, 282, up to pixel 1023, which is the 60th selected pixel on the line. Vertically, the first row of pixels to compare will be on line 117. Since the line step is 32, the next lines will be 149, 181, up to line 597, which is the 16th selected line. This is depicted in Figure 3.

[Figure 3: Pixels used for comparing video frames in 720p]

Table 2: Window Coordinates per Video Format

Video Format    HStart  HStep  HStop  VStart (f1)  VStart (f2)  VStep  VStop (f1)  VStop (f2)
720 X 485i      123     8      595    60           323          10     210         473
720 X 576i      123     8      595    68           381          12     248         561
1280 X 720p     256     13     1023   117          -            32     597         -
1920 X 1080i    399     19     1520   89           652          24     449         1012
1920 X 1080p    399     19     1520   178          -            48     898         -
3840 X 2160p    798     38     3040   412          -            92     1792        -
2048 X 1080p    463     19     1584   206          -            46     896         -
4096 X 2160p    926     38     3168   412          -            92     1792        -

Note: Values for field 1 (f1) and field 2 (f2) are shown separately.

5.2.3 Motion Detection

The motion detection block shall calculate the amount of change in the video content between the current field/frame and the content of a prior field/frame. The difference in the video content between the current and a prior video field/frame shall be used to calculate a video fingerprint.

The motion detection process, illustrated in Figure 4, shall compare pixels within the current field/frame C_k with pixels within a prior field/frame P_k to determine the amount of change between them. Only the 8 most significant bits of the luminance samples shall be used in the calculation.

$$VS_i(f) = \frac{1}{4}\sum_{k=1}^{N}\begin{cases}1 & \text{if } \operatorname{abs}(P_k - C_k) \ge 32\\ 0 & \text{otherwise}\end{cases}$$

Figure 4: Formula showing how the pixels are compared and counted

5.2.3.1 Pixel Compare

In order to make fingerprints compatible between interlaced and progressive forms of the same content, the following comparisons are made:

For progressive video, comparison shall be made between the current frame and the frame two frames before the current frame. For example: Assuming a sequence of frames F1 F2 F3 F4 F5. If F4 is the current frame, the pixel comparison would be with frame F2. If F5 is the current frame, the pixel comparison would be with frame F3.

For interlaced video, comparison shall be made between the current field and the same field from the immediately preceding frame. For example: Assuming a sequence of fields: f1 f2 f3 f4 f5. If f4 is the current field, the pixel comparison would be with field f2.
If f5 is the current field, the pixel comparison would be with field f3.

[Figure 5: Inter-Frame/Field Comparisons in Progressive and Interlaced Video]

5.2.3.2 Pixel Counting

Pixel counting shall be done to establish the number of pixels, within the 960 pixels being monitored, that have changed between fields for interlaced video or between frames for progressive video. As shown in Figure 4, a pixel shall be considered changed if the difference between the current pixel and the same pixel in the previous [...]
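The comparison and counting steps of Sections 5.2.3.1 and 5.2.3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the normative algorithm. The function names are hypothetical, the 10-bit default sample depth is an assumption, and the "greater than or equal to 32" comparison is an assumed reading of Figure 4, where only the value 32 is legible in this copy. The sketch stops at the raw count and does not apply any scaling shown in Figure 4.

```python
from typing import List, Optional, Sequence

THRESHOLD = 32  # change threshold; the value 32 is the one legible in Figure 4
SAMPLES = 960   # 16 sample rows x 60 pixel samples per field/frame


def to_8_msb(sample: int, bit_depth: int = 10) -> int:
    """Keep only the 8 most significant bits of a luminance sample."""
    if bit_depth < 8:
        raise ValueError("bit_depth must be at least 8")
    return sample >> (bit_depth - 8)


def count_changed(current: Sequence[int], prior: Sequence[int],
                  bit_depth: int = 10) -> int:
    """Count sub-sampled pixels whose 8-bit luminance differs by >= THRESHOLD.

    The >= comparison is an assumption (see the paragraph above this block).
    """
    if len(current) != len(prior):
        raise ValueError("both fields/frames must supply the same samples")
    return sum(
        1
        for c, p in zip(current, prior)
        if abs(to_8_msb(p, bit_depth) - to_8_msb(c, bit_depth)) >= THRESHOLD
    )


def changed_counts(frames: Sequence[Sequence[int]],
                   bit_depth: int = 10) -> List[Optional[int]]:
    """Compare each field/frame k with field/frame k-2.

    The first two entries have no reference two steps back, so they are None.
    """
    head: List[Optional[int]] = [None, None][:len(frames)]
    tail: List[Optional[int]] = [
        count_changed(frames[k], frames[k - 2], bit_depth)
        for k in range(2, len(frames))
    ]
    return head + tail
```

Each inner sequence is expected to hold the 960 sub-sampled luminance values of one field or frame, in the same order for every field or frame. Indexing two steps back covers both cases described in Section 5.2.3.1: frame F4 against F2 for progressive video, and field f4 against f2 for interlaced video.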
